STATISTICS OF RNA SECONDARY STRUCTURES

Citation
W. Fontana et al., STATISTICS OF RNA SECONDARY STRUCTURES, Biopolymers, 33(9), 1993, pp. 1389-1404
Number of citations
44
Subject categories
Biology
Journal title
Biopolymers
ISSN journal
0006-3525
Volume
33
Issue
9
Year of publication
1993
Pages
1389 - 1404
Database
ISI
SICI code
0006-3525(1993)33:9<1389:SORSS>2.0.ZU;2-E
Abstract
A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC, the biophysical AUGC and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements, such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules of chain lengths up to 100. The results of RNA structure statistics depend strongly on the particular alphabet chosen. The statistical reference is compared with the data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees. Tree editing provides a quantitative measure for the distance d(t) between two structures. We compute a structure density surface as the conditional probability of two structures having distance t given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures for the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate for the mean sensitivity of structures to point mutations. (C) 1993 John Wiley & Sons, Inc.
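The structure density surface described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: sequences are compared by Hamming distance (taken here as the sequence distance h), structures are supplied in dot-bracket notation by some external folding step, and the structure distance t is a simple base-pair distance used as a stand-in for the tree-edit distance named in the abstract. The function names hamming, base_pairs, structure_distance and density_surface are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def base_pairs(structure):
    """Parse dot-bracket notation into a set of (i, j) index pairs."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def structure_distance(s1, s2):
    """Assumed stand-in for tree-edit distance: the number of base pairs
    that occur in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))


def density_surface(folded):
    """Estimate P(t | h) over all pairs of (sequence, structure) tuples.

    Returns a dict mapping each observed sequence distance h to a dict
    that maps structure distance t to its relative frequency.
    """
    counts = defaultdict(Counter)
    for (seq1, st1), (seq2, st2) in combinations(folded, 2):
        counts[hamming(seq1, seq2)][structure_distance(st1, st2)] += 1
    return {
        h: {t: n / sum(c.values()) for t, n in c.items()}
        for h, c in counts.items()
    }
```

In practice the input list would hold a large ensemble of random sequences together with their minimum free energy structures; the folding step itself is outside the scope of this sketch.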