Some genes produce noncoding transcripts that function directly as
structural, regulatory, or even catalytic RNAs [1, 2]. Unlike protein-coding
genes, which can be detected as open reading frames with distinctive
statistical biases, noncoding RNA (ncRNA) gene sequences have no obvious
inherent statistical biases [3]. Thus, genome sequence analyses reveal novel
protein-coding genes, but any novel ncRNA genes remain invisible. Here, we
describe a computational comparative genomic screen for ncRNA genes. The key
idea is to distinguish conserved RNA secondary structures from a background
of other conserved sequences using probabilistic models of expected
mutational patterns in pairwise sequence alignments. We report the first
whole-genome screen for ncRNA genes done with this method, in which we
applied it to the "intergenic" spacers of Escherichia coli using comparative
sequence data from four related bacteria. Starting from >23,000 conserved
interspecies pairwise alignments, the screen predicted 275 candidate
structural RNA loci. A sample of 49 candidate loci was assayed
experimentally. At least 11 loci expressed small, apparently noncoding RNA
transcripts of unknown function. Our computational approach may be used to
discover structural ncRNA genes in any genome for which appropriate
comparative genome sequence data are available.
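
To make the notion of a structure-specific mutational pattern concrete, the following minimal Python sketch tallies how a set of hypothesized base pairs changes between two aligned sequences. It is a toy illustration only and not the probabilistic models used in the screen: the function names, the category labels, the example sequences, and the assumption that the paired columns are already known are all hypothetical. It shows only the underlying intuition that a conserved RNA secondary structure tends to tolerate substitutions that preserve base pairing.

```python
from collections import Counter

# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(x_pair, y_pair):
    """Label one hypothesized base pair from its residues in each sequence."""
    if "-" in x_pair or "-" in y_pair:
        return "gapped"        # an alignment gap falls on the pair
    if x_pair not in PAIRING or y_pair not in PAIRING:
        return "unpaired"      # at least one sequence cannot form the pair
    if x_pair == y_pair:
        return "identical"     # pair present and unchanged
    return "compensatory"      # residues differ, pairing is preserved


def count_patterns(seq_x, seq_y, pairs):
    """Count pair labels over (i, j) column indices of a pairwise alignment.

    The list of paired columns is an assumed input here; it is not
    inferred from the alignment.
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("aligned sequences must have equal length")
    # Treat DNA and RNA alphabets alike.
    x = seq_x.upper().replace("T", "U")
    y = seq_y.upper().replace("T", "U")
    counts = Counter()
    for i, j in pairs:
        counts[classify_pair((x[i], x[j]), (y[i], y[j]))] += 1
    return counts


# Hypothetical 11-column alignment of a hairpin: 4-bp stem, 3-nt loop.
seq_x = "GCAGAAACUGC"
seq_y = "GUGGAAAAUAC"
stem = [(0, 10), (1, 9), (2, 8), (3, 7)]
counts = count_patterns(seq_x, seq_y, stem)
print(dict(counts))  # {'identical': 1, 'compensatory': 2, 'unpaired': 1}
```

In this made-up example, two of the four stem positions differ between the sequences yet still pair in both, which is the kind of pattern that favors a structural RNA interpretation over other kinds of sequence conservation.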