A statistical reference for RNA secondary structures with minimum free
energies is computed by folding large ensembles of random RNA
sequences. Four nucleotide alphabets are used: two binary alphabets
(AU and GC), the biophysical AUGC alphabet, and the synthetic GCXK
alphabet. RNA secondary
structures are made of structural elements, such as stacks, loops,
joints, and free ends. Statistical properties of these elements are
computed for small RNA molecules of chain lengths up to 100. The
resulting structure statistics depend strongly on the particular alphabet
chosen. The statistical reference is compared with data derived
from natural RNA molecules with similar base frequencies. Secondary
structures are represented as trees. Tree editing provides a quantitative
measure for the distance d_t between two structures. We compute a
structure density surface as the conditional probability of two
structures having distance t given that their sequences have distance h.
This
surface indicates that the vast majority of possible minimum free
energy secondary structures occur within a fairly small neighborhood of
any typical (random) sequence. Correlation lengths for secondary
structures in their tree representations are computed from probability
densities. They are appropriate measures for the complexity of the
sequence-structure relation. The correlation length also provides a
quantitative estimate for the mean sensitivity of structures to point
mutations.
(C) 1993 John Wiley & Sons, Inc.
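
The structure density surface described above, the conditional
probability of structure distance t given sequence distance h, can be
estimated by sampling. The following is a minimal Python sketch under
stated assumptions, not the procedure used in the paper. All function
names are hypothetical. The folding routine `fold` and the structure
distance `struct_dist` are supplied by the caller; the sketch does not
implement tree editing or free energy minimization. The autocorrelation
rho(h) = 1 - <t^2 | h> / <t^2>_random is one common convention and is
assumed here, since the abstract only states that correlation lengths
are computed from probability densities.

```python
import random
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def random_sequence(length, alphabet, rng):
    """Uniformly random sequence over the given alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def mutate(seq, h, alphabet, rng):
    """Copy of seq changed at exactly h distinct positions (h <= len(seq))."""
    chars = list(seq)
    for i in rng.sample(range(len(seq)), h):
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)


def density_surface(fold, struct_dist, length, alphabet, h_values, n_refs, rng):
    """Estimate P(t | h) as {h: {t: probability}} from sampled sequence pairs."""
    counts = defaultdict(Counter)
    for _ in range(n_refs):
        ref = random_sequence(length, alphabet, rng)
        ref_struct = fold(ref)
        for h in h_values:
            mutant_struct = fold(mutate(ref, h, alphabet, rng))
            counts[h][struct_dist(ref_struct, mutant_struct)] += 1
    # Each h receives exactly n_refs samples, so counts normalize by n_refs.
    return {h: {t: c / n_refs for t, c in ts.items()} for h, ts in counts.items()}


def random_mean_square(fold, struct_dist, length, alphabet, n_pairs, rng):
    """Mean squared structure distance between independent random sequences."""
    total = 0.0
    for _ in range(n_pairs):
        a = fold(random_sequence(length, alphabet, rng))
        b = fold(random_sequence(length, alphabet, rng))
        total += struct_dist(a, b) ** 2
    return total / n_pairs


def autocorrelation(surface, mean_sq_random):
    """Assumed definition: rho(h) = 1 - <t^2 | h> / <t^2>_random."""
    return {
        h: 1.0 - sum(t * t * p for t, p in surface[h].items()) / mean_sq_random
        for h in sorted(surface)
    }
```

In practice `fold` would wrap a real minimum free energy folding
routine, for instance ViennaRNA's `RNA.fold`, if that package is
installed, and `struct_dist` would be a tree edit distance. The
`hamming` helper applied to dot-bracket strings is only a crude
stand-in for it. Under the assumed definition, a correlation length
could be read off as the sequence distance h at which rho(h) falls to
1/e.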