Currently there is no successful computational approach for identification
of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We
have developed a machine learning approach using neural networks and support
vector machines to extract common features among known RNAs for prediction
of new RNA genes in the unannotated regions of bacterial and archaeal
genomes. The Escherichia coli genome was used for development, but we have
applied this method to several other bacterial and archaeal genomes. Networks
based on nucleotide composition were 80-90% accurate in jackknife testing
experiments for bacteria and 90-99% for hyperthermophilic archaea. We also
achieved a significant improvement in accuracy by combining these predictions
with those obtained using a second set of parameters consisting of known
RNA sequence motifs and the calculated free energy of folding. Several known
fRNAs not included in the training datasets were identified, as well as
several hundred predicted novel RNAs. These studies indicate that there are
many unidentified RNAs in simple genomes that can be predicted
computationally as a precursor to experimental study. Our RNA gene
predictions and an interface for user predictions are publicly available via
the web.
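
As a rough illustration of the two ideas named above, nucleotide composition
as an input representation and jackknife (leave-one-out) testing, the sketch
below may help. It is not the method used in this work: a nearest-centroid
rule stands in for the neural networks and support vector machines, and the
function names, the 20-value feature vector (4 mononucleotide plus 16
dinucleotide frequencies) and the toy sequences are all assumptions made for
the example.

```python
from itertools import product

ALPHABET = "ACGU"
DINUCS = ["".join(p) for p in product(ALPHABET, repeat=2)]

def composition(seq):
    """Return 4 mononucleotide + 16 dinucleotide frequencies (assumed features)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    mono = [seq.count(b) / n for b in ALPHABET]
    # Overlapping dinucleotides, counted by position.
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    di = [pairs.count(d) / max(len(pairs), 1) for d in DINUCS]
    return mono + di

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train, x):
    """Nearest-centroid stand-in classifier; train is a list of (vector, label)."""
    labels = sorted({lab for _, lab in train})
    cents = {lab: centroid([v for v, l in train if l == lab]) for lab in labels}
    return min(labels, key=lambda lab: sq_dist(cents[lab], x))

def jackknife_accuracy(examples):
    """Hold out each example in turn, train on the rest, score the held-out one."""
    hits = 0
    for i, (x, y) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        hits += predict(rest, x) == y
    return hits / len(examples)

# Hypothetical toy data: label 1 = "RNA-like", label 0 = "background".
TOY = [
    (composition("GGCCGCGGCC"), 1),
    (composition("GCGCGGCCGG"), 1),
    (composition("CCGGGCGCGC"), 1),
    (composition("AAUUAUAUAA"), 0),
    (composition("UAUAAAUUAU"), 0),
    (composition("AUAUUUAAUA"), 0),
]

if __name__ == "__main__":
    print(jackknife_accuracy(TOY))
```

The toy classes are separated purely by GC content, so they say nothing about
real accuracy; the point is only that each sequence is scored by a model that
never saw it during training.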