DETECTION OF GENES IN ESCHERICHIA-COLI SEQUENCES DETERMINED BY GENOME PROJECTS AND PREDICTION OF PROTEIN-PRODUCTION LEVELS, BASED ON MULTIVARIATE DIVERSITY IN CODON USAGE
S. Kanaya et al., DETECTION OF GENES IN ESCHERICHIA-COLI SEQUENCES DETERMINED BY GENOME PROJECTS AND PREDICTION OF PROTEIN-PRODUCTION LEVELS, BASED ON MULTIVARIATE DIVERSITY IN CODON USAGE, Computer applications in the biosciences, 12(3), 1996, pp. 213-225
We used principal component analysis to develop measures (called Z-parameters in this study) that reflect the diversity of codon usage in Escherichia coli genes. Protein production levels for 1500 CDSs (protein-coding sequences) identified by E. coli genome projects in Japan and the US were estimated from a correlation equation between Z(1) and cellular protein content, obtained through analysis of experimentally characterized genes. Through profile analysis of Z(1) for E. coli sequences obtained by the Japanese project, we predicted an additional 36 CDSs that had not been annotated in the International DNA Database. Thirty-one of the 36 CDSs could be assigned to presumptive protein genes through a BLASTX search against recent protein databases in the Genome Net in Japan. Detailed examination of the Z(1)-parameter profile led us to assess sequencing errors that cause frameshifts.
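
The general idea of deriving a codon-usage measure by principal component analysis can be illustrated with a minimal sketch. It assumes each CDS is summarized by its 64 relative codon frequencies and that the score on the first principal component plays a role loosely analogous to Z(1). The exact definition of the Z-parameters is not given in this abstract, so the function names, the frequency normalization, and the use of plain SVD-based PCA are all illustrative assumptions rather than the authors' procedure.

```python
from itertools import product

import numpy as np

# All 64 codons in a fixed order (hypothetical layout chosen for this sketch).
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_frequencies(cds):
    """Return the 64-element relative codon frequency vector of one CDS."""
    counts = np.zeros(len(CODONS))
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        idx = CODON_INDEX.get(cds[i:i + 3].upper())
        if idx is not None:  # skip codons containing ambiguous bases
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts


def first_component_scores(sequences):
    """Project each CDS onto the first principal component of codon usage."""
    X = np.array([codon_frequencies(s) for s in sequences])
    Xc = X - X.mean(axis=0)  # center each codon column
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]


if __name__ == "__main__":
    toy = ["ATGAAAAAA", "ATGGCTGCT", "ATGAAAGCT"]  # toy sequences, not real genes
    print(first_component_scores(toy))
```

Scores obtained this way could then be regressed against measured protein content for characterized genes, in the spirit of the correlation equation described above. The sign of a principal component is arbitrary, so the direction of any such relationship would have to be fixed against known highly expressed genes.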