A computational approach to identify genes for functional RNAs in genomic sequences

Citation
R.J. Carter et al., A computational approach to identify genes for functional RNAs in genomic sequences, NUCL ACID R, 29(19), 2001, pp. 3928-3938
Number of citations
45
Subject Categories
Biochemistry & Biophysics
Journal title
NUCLEIC ACIDS RESEARCH
ISSN journal
0305-1048
Volume
29
Issue
19
Year of publication
2001
Pages
3928 - 3938
Database
ISI
SICI code
0305-1048(20011001)29:19<3928:ACATIG>2.0.ZU;2-J
Abstract
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.
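The abstract does not say which composition features were used or how the networks were configured, so the following is only a minimal sketch of the general idea of jackknife (leave-one-out) testing of a composition-based classifier. It assumes mono- plus dinucleotide frequencies as the "nucleotide composition" features and substitutes a simple nearest-centroid classifier for the paper's neural networks and support vector machines. The function names (composition_features, predict, jackknife_accuracy) and the toy sequences are hypothetical illustrations, not code or data from the paper.

```python
from itertools import product
from math import dist

# Assumption: "nucleotide composition" is modelled here as mono- plus
# dinucleotide frequencies; the paper's exact feature set is not given.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def composition_features(seq):
    """Return 4 mononucleotide + 16 dinucleotide frequencies for seq (len >= 2)."""
    seq = seq.upper()
    mono = [seq.count(b) / len(seq) for b in "ACGT"]
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]  # overlapping pairs
    di = [pairs.count(d) / len(pairs) for d in DINUCS]
    return mono + di


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(train, x):
    """Hypothetical stand-in for the paper's NN/SVM: nearest class centroid.

    Each class in train must have at least one example.
    """
    labels = sorted({y for _, y in train})
    cents = {y: centroid([v for v, lab in train if lab == y]) for y in labels}
    return min(labels, key=lambda y: dist(cents[y], x))


def jackknife_accuracy(data):
    """Leave-one-out: hold out each example, train on the rest, score it."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += predict(train, x) == y
    return correct / len(data)


# Hypothetical toy sequences (not from the paper): 1 = RNA-like, 0 = background.
rna_like = ["GGCGCCGGCGCA", "GCCGGCGCGGAC", "CGGCGCCGCGGT"]
background = ["ATTATAATTAAT", "TAATATTTAAGA", "AATTTATATTAC"]
data = ([(composition_features(s), 1) for s in rna_like]
        + [(composition_features(s), 0) for s in background])

print(jackknife_accuracy(data))  # 1.0 on this deliberately easy toy set
```

Holding each example out in turn gives an accuracy estimate that does not reuse the scored sequence for training, which matters when the set of known fRNAs is small. The toy sets above are trivially separable by GC content, so their perfect score says nothing about performance on real genomes.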