Computational identification of noncoding RNAs in E. coli by comparative genomics

Citation
E. Rivas et al., Computational identification of noncoding RNAs in E. coli by comparative genomics, CURR BIOL, 11(17), 2001, pp. 1369-1373
Number of citations
35
Subject categories
Experimental Biology
Journal title
CURRENT BIOLOGY
ISSN journal
09609822
Volume
11
Issue
17
Year of publication
2001
Pages
1369 - 1373
Database
ISI
SICI code
0960-9822(20010904)11:17<1369:CIONRI>2.0.ZU;2-E
Abstract
Some genes produce noncoding transcripts that function directly as structural, regulatory, or even catalytic RNAs [1, 2]. Unlike protein-coding genes, which can be detected as open reading frames with distinctive statistical biases, noncoding RNA (ncRNA) gene sequences have no obvious inherent statistical biases [3]. Thus, genome sequence analyses reveal novel protein-coding genes, but any novel ncRNA genes remain invisible. Here, we describe a computational comparative genomic screen for ncRNA genes. The key idea is to distinguish conserved RNA secondary structures from a background of other conserved sequences using probabilistic models of expected mutational patterns in pairwise sequence alignments. We report the first whole-genome screen for ncRNA genes done with this method, in which we applied it to the "intergenic" spacers of Escherichia coli using comparative sequence data from four related bacteria. Starting from >23,000 conserved interspecies pairwise alignments, the screen predicted 275 candidate structural RNA loci. A sample of 49 candidate loci was assayed experimentally. At least 11 loci expressed small, apparently noncoding RNA transcripts of unknown function. Our computational approach may be used to discover structural ncRNA genes in any genome for which appropriate comparative genome sequence data are available.