LOW-MOLECULAR-WEIGHT PROTEINS - A CHALLENGE FOR POST-GENOMIC RESEARCH

Citation
K. E. Rudd et al., LOW-MOLECULAR-WEIGHT PROTEINS - A CHALLENGE FOR POST-GENOMIC RESEARCH, Electrophoresis, 19(4), 1998, pp. 536-544
Number of citations
30
Subject Categories
Biochemical Research Methods; Chemistry, Analytical
Journal title
Electrophoresis
ISSN journal
01730835
Volume
19
Issue
4
Year of publication
1998
Pages
536 - 544
Database
ISI
SICI code
0173-0835(1998)19:4<536:LP-ACF>2.0.ZU;2-4
Abstract
The EcoGene project involves the examination of Escherichia coli K-12 DNA sequences and accompanying annotation in the public databases in order to refine the representation and prediction of the entire set of E. coli K-12 chromosomally encoded protein sequences. The results of this ongoing effort have been deposited in the SWISSPROT protein sequence database as sequencing of the E. coli genome has progressed to completion in recent years. Through this continuing research, we have discovered that the prediction of low molecular weight (small) proteins, arbitrarily defined as protein sequences less than or equal to 150 amino acids (aa) in length, is problematic and requires special attention. We describe the small protein subset of EcoGene and the approach used to derive this subset from the complete E. coli genome sequence and database annotations. These E. coli proteins have helped to identify new small genes in other organisms and to identify conserved residues (motifs) using database searches and multiple alignments. Two thirds of the E. coli small proteins have not been characterized experimentally. The careful application of computer and laboratory methods to the analysis of small proteins is needed for accurate prediction, verification and characterization. The problem of accurate protein sequence identification is not limited to small proteins or to E. coli; these problems are encountered to varying degrees throughout all sequence databases.
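The size criterion stated in the abstract is a plain length cutoff (150 aa or fewer). As a hedged illustration only, and not the EcoGene derivation procedure itself, the Python sketch below parses FASTA-formatted protein sequences and keeps those at or below that cutoff. The function names `parse_fasta` and `small_proteins`, and the records `protA`/`protB`, are hypothetical examples, not real E. coli entries.

```python
from typing import Dict, Optional

# Cutoff taken from the abstract: "small" proteins are <= 150 amino acids.
SMALL_PROTEIN_MAX_AA = 150


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[current] += line
    return records


def small_proteins(records: Dict[str, str],
                   max_len: int = SMALL_PROTEIN_MAX_AA) -> Dict[str, str]:
    """Keep sequences whose length (ignoring a trailing '*' stop) is <= max_len."""
    return {pid: seq for pid, seq in records.items()
            if len(seq.rstrip("*")) <= max_len}


# Hypothetical records: protA has 150 residues, protB has 151.
example = (
    ">protA hypothetical\n" + "M" + "A" * 99 + "\n" + "A" * 50 + "\n"
    ">protB hypothetical\n" + "M" + "K" * 150 + "\n"
)

if __name__ == "__main__":
    recs = parse_fasta(example)
    print(sorted(small_proteins(recs)))
```

Because the cutoff is inclusive, a 150-residue protein counts as small while a 151-residue one does not; as the abstract notes, the hard part in practice is the accuracy of the predicted sequences fed into such a filter, not the filter itself.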