DETECTION OF GENES IN ESCHERICHIA COLI SEQUENCES DETERMINED BY GENOME PROJECTS AND PREDICTION OF PROTEIN-PRODUCTION LEVELS, BASED ON MULTIVARIATE DIVERSITY IN CODON USAGE

Citation
S. Kanaya et al., DETECTION OF GENES IN ESCHERICHIA COLI SEQUENCES DETERMINED BY GENOME PROJECTS AND PREDICTION OF PROTEIN-PRODUCTION LEVELS, BASED ON MULTIVARIATE DIVERSITY IN CODON USAGE, Computer Applications in the Biosciences, 12(3), 1996, pp. 213-225
Number of citations
32
Subject Categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
Journal ISSN
0266-7061
Volume
12
Issue
3
Year of publication
1996
Pages
213-225
Database
ISI
SICI code
0266-7061(1996)12:3<213:DOGIES>2.0.ZU;2-4
Abstract
We used principal component analysis to develop measures (called Z-parameters in this study) that reflect the diversity of codon usage in Escherichia coli genes. Protein production levels for 1500 CDSs (protein-coding sequences) identified by the E. coli genome projects in Japan and the US were estimated from a correlation equation between Z(1) and cellular protein content, obtained by analyzing experimentally characterized genes. Through profile analysis of Z(1) for E. coli sequences obtained by the Japanese project, we predicted an additional 36 CDSs that had not been annotated in the International DNA Database. Thirty-one of the 36 CDSs could be assigned to presumptive protein genes through a BLASTX search against recent protein databases in GenomeNet in Japan. Detailed examination of the Z(1)-parameter profile also allowed us to assess sequencing errors that cause frameshifts.
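The abstract does not spell out how the Z-parameters are computed. As a rough illustration only, the sketch below assumes that a Z(1)-like score is the first principal component of per-gene codon frequencies (all 64 codons, reading frame 0), and that a Z(1) profile is obtained by projecting the codon usage of sliding windows onto that component. The function names (codon_frequencies, first_pc, z1_profile), the 300 nt window and 30 nt step, and the use of power iteration are hypothetical choices, not the authors' procedure.

```python
import math
from itertools import product

# Hypothetical sketch: a Z(1)-like score taken as the first principal
# component of per-gene codon frequencies. The exact Z-parameter definition
# used in the study is not given in the abstract and may differ.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]  # all 64 codons
INDEX = {c: i for i, c in enumerate(CODONS)}


def codon_frequencies(seq):
    """Relative frequency of each of the 64 codons in reading frame 0."""
    counts = [0.0] * len(CODONS)
    total = 0
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:  # skip codons containing N or other ambiguity codes
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def first_pc(rows, iterations=500):
    """Return (mean, loading, scores) for the first principal component.

    Uses power iteration on X^T X, where X holds the mean-centered rows,
    so the 64 x 64 covariance matrix is never built explicitly.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    x = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Non-constant start vector: each centered row of frequencies sums to 0,
    # so a constant vector would be orthogonal to every row.
    v = [math.sin(k + 1.0) for k in range(d)]
    s = math.sqrt(sum(t * t for t in v))
    v = [t / s for t in v]
    for _ in range(iterations):
        xv = [sum(row[k] * v[k] for k in range(d)) for row in x]
        w = [sum(x[i][k] * xv[i] for i in range(n)) for k in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:  # all rows identical: no variance to explain
            break
        v = [t / norm for t in w]
    scores = [sum(row[k] * v[k] for k in range(d)) for row in x]
    return mean, v, scores  # the sign of a principal component is arbitrary


def z1_profile(seq, mean, loading, window=300, step=30):
    """Project sliding-window codon usage onto the first-PC loading.

    window and step are in nucleotides and must be multiples of 3 so that
    every window stays in the same reading frame (values are illustrative).
    """
    if window % 3 or step % 3:
        raise ValueError("window and step must be multiples of 3")
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        f = codon_frequencies(seq[start:start + window])
        profile.append(sum((f[k] - mean[k]) * loading[k] for k in range(len(f))))
    return profile


# Toy usage with synthetic sequences (no biological meaning).
genes_a = ["AAA" * (30 + i) + "GCTGAA" * 10 for i in range(3)]
genes_b = ["GGG" * (30 + i) + "GCTGAA" * 10 for i in range(3)]
mean, loading, scores = first_pc([codon_frequencies(g) for g in genes_a + genes_b])
profile = z1_profile("AAA" * 200 + "GGG" * 200, mean, loading)
```

In this toy example the two synthetic gene groups receive first-component scores of opposite sign, and the window profile changes sign where the test sequence switches codon composition. How the authors interpreted Z(1) profiles of real sequences, and the form of their correlation equation with protein content, is not described in the abstract.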