LOW-MOLECULAR-WEIGHT PROTEINS - A CHALLENGE FOR POST-GENOMIC RESEARCH

Authors

Rudd KE; Humphery-Smith I; Wasinger VC; Bairoch A

Citation

K.E. Rudd et al., LOW-MOLECULAR-WEIGHT PROTEINS - A CHALLENGE FOR POST-GENOMIC RESEARCH, Electrophoresis, 19(4), 1998, pp. 536-544

Subject categories

Biochemical Research Methods; Chemistry, Analytical

Journal title

Electrophoresis

ISSN journal

0173-0835

Volume

19

Issue

4

Year of publication

1998

Pages

536 - 544

Database

ISI

SICI code

0173-0835(1998)19:4<536:LP-ACF>2.0.ZU;2-4

Abstract

The EcoGene project involves the examination of Escherichia coli K-12 DNA sequences and accompanying annotation in the public databases in order to refine the representation and prediction of the entire set of E. coli K-12 chromosomally encoded protein sequences. The results of this ongoing effort have been deposited in the SWISSPROT protein sequence database as sequencing of the E. coli genome has progressed to completion in recent years. Through this continuing research, we have discovered that the prediction of low molecular weight (small) proteins, arbitrarily defined as protein sequences less than or equal to 150 amino acids (aa) in length, is problematic and requires special attention. We describe the small protein subset of EcoGene and the approach used to derive this subset from the complete E. coli genome sequence and database annotations. These E. coli proteins have helped to identify new small genes in other organisms and to identify conserved residues (motifs) using database searches and multiple alignments. Two thirds of the E. coli small proteins have not been characterized experimentally. The careful application of computer and laboratory methods to the analysis of small proteins is needed for accurate prediction, verification and characterization. The problem of accurate protein sequence identification is not limited to small proteins or to E. coli; these problems are encountered to varying degrees throughout all sequence databases.