Both ribosomal DNA (rDNA) and ribosomal RNA (rRNA) are over-represented
in the starting material for genomic and cDNA libraries; thus, their
sequences have the potential to enter the various databases repeatedly.
When rDNA (both transcribed and intergenic spacer regions) is used as a
query sequence, a great number of matches that are not identified as
rDNA are found in the databases, particularly in the EST database and,
to a lesser extent, among genomic sequences and STSs. We discuss the
following explanations for the widespread occurrence of rDNA in cDNA
and genomic DNA libraries: pseudogenes of rRNA in other genomic
locations, mRNA-derived pseudogenes that reside in rDNA, cDNAs derived
from rRNA [either by self-priming or by internal oligo(dT) priming],
cDNAs derived from actual transcripts of the rDNA intergenic spacer,
and genomic DNA contamination of RNA preparations. Because so many
database entries contain unidentified rDNA, we recommend that all
sequence submissions be checked (by the submitters) for the presence of
structural RNAs in addition to repetitive sequences.
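
As a rough illustration of the recommended pre-submission check, the sketch below flags a sequence that shares many short subsequences (k-mers) with a reference rDNA sequence on either strand. It is not the method used in this work, which relies on database searches; the function names, the k-mer length, the threshold, and the toy sequences are all hypothetical and chosen only for illustration. A real screen would more likely run an alignment search against curated rRNA and rDNA references.

```python
# Minimal k-mer screen for possible rDNA content in a sequence.
# ASSUMPTION: all names, parameters, and sequences here are illustrative only.

# Made-up placeholder sequence, NOT a real rDNA sequence.
TOY_REFERENCE = "ACGTTGCAAGGCTTAACCGGTTACGATCGATTGCATGCCTAGGATCCA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; unknown symbols are mapped to N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query: str, reference: str, k: int = 8) -> float:
    """Fraction of distinct query k-mers present in the reference (either strand)."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: nothing to compare.
        return 0.0
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    return len(query_kmers & ref_kmers) / len(query_kmers)


def flag_possible_rdna(query: str, reference: str,
                       k: int = 8, threshold: float = 0.5) -> bool:
    """Flag the query when at least `threshold` of its k-mers match the reference."""
    return shared_kmer_fraction(query, reference, k) >= threshold


if __name__ == "__main__":
    candidate = TOY_REFERENCE[5:30]  # fragment copied from the toy reference
    print(flag_possible_rdna(candidate, TOY_REFERENCE))
    print(flag_possible_rdna(reverse_complement(candidate), TOY_REFERENCE))
    print(flag_possible_rdna("A" * 20, TOY_REFERENCE))
```

Exact k-mer matching is used here only because it is simple and strand-aware; it misses diverged copies such as pseudogenes, which is one reason an alignment-based search would be preferable in practice.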