D. Gautheret and A. Lambert, Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles, J MOL BIOL, 313(5), 2001, pp. 1003-1011
We present here a new approach to the problem of defining RNA signatures
and finding their occurrences in sequence databases. The proposed method is
based on "secondary structure profiles". An RNA sequence alignment with
secondary structure information is used as input. Two types of weight
matrices/profiles are constructed from this alignment: single strands are
represented by a classical lod-score profile, while helical regions are
represented by an extended "helical profile" comprising 16 lod-scores per
position, one for each of the 16 possible base-pairs. Database searches are
then conducted using a simultaneous search for helical profiles and dynamic
programming alignment of single-strand profiles. The algorithm has been
implemented in a new software program, ERPIN, that performs both profile
construction and database search. Applications are presented for several
RNA motifs. The automated use of sequence information in both
single-stranded and helical regions yields better sensitivity/specificity
ratios than descriptor-based programs. Furthermore, since the translation of
alignments into profiles is straightforward with ERPIN, iterative searches
can easily be conducted to enrich collections of homologous RNAs.
(C) 2001 Academic Press.
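The two profile types described in the abstract can be illustrated with a minimal Python sketch. It is not ERPIN's implementation: the function names, the uniform background frequencies, the additive pseudocount, the base-2 logarithm, and the gap-free input are all assumptions made for illustration, and the database search step (helix search combined with dynamic programming over single strands) is not covered.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"
# The 16 possible base-pairs, written as two-letter strings (5' base, 3' base).
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def lod_profile(columns, symbols, background, pseudocount=1.0):
    """Return one {symbol: lod-score} dict per column.

    columns: list of lists, each inner list holding the symbols observed
    in one alignment column. The pseudocount is an assumption of this sketch.
    """
    profile = []
    for observed in columns:
        counts = Counter(observed)
        total = len(observed) + pseudocount * len(symbols)
        profile.append({
            s: math.log2(((counts[s] + pseudocount) / total) / background[s])
            for s in symbols
        })
    return profile


def strand_profile(aligned_seqs):
    """Single-strand profile: 4 lod-scores per position (uniform background assumed)."""
    columns = [list(col) for col in zip(*aligned_seqs)]
    return lod_profile(columns, BASES, {b: 0.25 for b in BASES})


def helix_profile(left_strands, right_strands):
    """Helical profile: 16 lod-scores per base-paired position.

    Position i of the 5' strand is paired with position (n - 1 - i)
    of the 3' strand, where n is the helix length.
    """
    n = len(left_strands[0])
    columns = [
        [left[i] + right[n - 1 - i] for left, right in zip(left_strands, right_strands)]
        for i in range(n)
    ]
    return lod_profile(columns, PAIRS, {p: 1 / 16 for p in PAIRS})


def score_helix(profile, left, right):
    """Sum the lod-scores of the base-pairs formed by a candidate helix."""
    n = len(profile)
    return sum(profile[i][left[i] + right[n - 1 - i]] for i in range(n))
```

For example, `helix_profile(["GGC", "GAC"], ["GCC", "GUC"])` yields one 16-entry dictionary per paired position, and `score_helix` then rates a candidate pair of strands against it. Scoring whole base-pairs rather than individual bases is what lets the profile capture covariation within helices.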