Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles

Citation
D. Gautheret and A. Lambert, Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles, J MOL BIOL, 313(5), 2001, pp. 1003-1011
Number of citations
24
Subject Categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
Journal ISSN
0022-2836
Volume
313
Issue
5
Year of publication
2001
Pages
1003 - 1011
Database
ISI
SICI code
0022-2836(20011109)313:5<1003:DRMDAI>2.0.ZU;2-P
Abstract
We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs. (C) 2001 Academic Press.
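
Illustrative sketch
The two profile types described in the abstract can be illustrated with a minimal Python sketch. This is not ERPIN's code: the function names, the toy alignment, the uniform background frequencies and the pseudocount of 1 are all assumptions made for illustration, and the dynamic programming search over single-strand profiles is not shown. The sketch only builds a per-column lod-score profile for a single-stranded region and a 16-entry "helical profile" for paired columns, then scores a candidate helix.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs


def lod_column(symbols, alphabet, background, pseudocount=1.0):
    """Log-odds (base 2) scores for one column; symbols outside the alphabet (gaps) are ignored."""
    valid = set(alphabet)
    counts = Counter(s for s in symbols if s in valid)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {
        a: math.log2(((counts[a] + pseudocount) / total) / background[a])
        for a in alphabet
    }


def single_strand_profile(seqs, columns):
    """One lod-score vector (4 entries) per single-stranded alignment column."""
    background = {b: 1 / 4 for b in BASES}  # assumed uniform background
    return [lod_column([s[c] for s in seqs], BASES, background) for c in columns]


def helical_profile(seqs, paired_columns):
    """One lod-score vector (16 entries) per base-paired position (i, j)."""
    background = {p: 1 / 16 for p in PAIRS}  # assumed uniform background
    return [
        lod_column([s[i] + s[j] for s in seqs], PAIRS, background)
        for i, j in paired_columns
    ]


def score_helix(profile, five_prime, three_prime):
    """Sum pair scores; three_prime[k] is the partner of five_prime[k]."""
    return sum(col[a + b] for col, a, b in zip(profile, five_prime, three_prime))


# Hypothetical toy alignment: a 3-bp helix (columns 0-2 paired with 8-6) and a 3-nt loop.
seqs = ["GGCAAAGCC", "GCCGAAGGC", "GGGUAACCC"]
paired = [(0, 8), (1, 7), (2, 6)]
loop = [3, 4, 5]

hp = helical_profile(seqs, paired)
sp = single_strand_profile(seqs, loop)

good = score_helix(hp, "GGC", "CCG")  # pairs G-C, G-C, C-G, all seen in the alignment
bad = score_helix(hp, "AAA", "AAA")   # pairs A-A, never seen in the alignment
print(f"helix scores: observed-like={good:.2f}, unobserved={bad:.2f}")
```

Scoring pairs rather than individual bases lets covarying positions (for example G-C replaced by C-G) contribute positively even when neither strand is conserved on its own, which is the motivation for keeping 16 scores per helical position.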