COMPUTER-ASSISTED PREDICTION, CLASSIFICATION, AND DELIMITATION OF PROTEIN-BINDING SITES IN NUCLEIC-ACIDS

Citation
K. Frech et al., COMPUTER-ASSISTED PREDICTION, CLASSIFICATION, AND DELIMITATION OF PROTEIN-BINDING SITES IN NUCLEIC-ACIDS, Nucleic acids research, 21(7), 1993, pp. 1655-1664
Citations number
38
Journal title
Nucleic acids research
ISSN journal
0305-1048
Volume
21
Issue
7
Year of publication
1993
Pages
1655 - 1664
Database
ISI
SICI code
0305-1048(1993)21:7<1655:CPCADO>2.0.ZU;2-B
Abstract
We present a method to determine the location and extent of protein binding regions in nucleic acids by computer-assisted analysis of sequence data. The program ConsIndex establishes a library of consensus descriptions based on sequence sets containing known regulatory elements. These defined consensus descriptions are used by the program ConsInspector to predict binding sites in new sequences. We show the programs to correctly determine the significant regions involved in transcriptional control of seven sequence elements. The internal profile of relative variability of individual nucleotide positions within these regions paralleled experimental profiles of biological significance. Consensus descriptions are determined by employing an anchored alignment scheme, the results of which are then evaluated by a novel method which is superior to cluster algorithms. The alignment procedure is able to include several closely related sequences without biasing the consensus description. Moreover, the algorithm detects additional elements on the basis of a moderate distance correlation and is capable of discriminating between real binding sites and false positive matches. The software is well suited to cope with the frequent phenomenon of optional elements present in a subset of functionally similar sequences, while taking maximal advantage of the existing sequence data base. Since it requires only a minimum of seven sequences for a single element, it is applicable to a wide range of binding sites.
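
The general idea of deriving a consensus description from a small set of known sites and using it to scan new sequences can be illustrated with a minimal sketch. This is not the ConsIndex or ConsInspector algorithm, whose details are not given in the abstract: it assumes the sites are already aligned to equal length (no anchored alignment step), uses a plain position frequency profile, and scores a window by the mean frequency of its bases. The function names, the toy sequences, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(aligned):
    """Per-column base frequencies for pre-aligned, equal-length sites."""
    length = len(aligned[0])
    if any(len(site) != length for site in aligned):
        raise ValueError("sites must be pre-aligned to equal length")
    profile = []
    for col in range(length):
        counts = Counter(site[col] for site in aligned)
        profile.append({b: counts.get(b, 0) / len(aligned) for b in ALPHABET})
    return profile


def conservation(profile):
    """Frequency of the most common base per position (1.0 = invariant)."""
    return [max(column.values()) for column in profile]


def score_window(profile, window):
    """Mean profile frequency of the bases observed in the window (0..1)."""
    total = sum(column.get(base, 0.0) for column, base in zip(profile, window))
    return total / len(profile)


def scan(profile, sequence, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(profile, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy data (assumption): seven made-up, pre-aligned sites, matching the
# minimum set size mentioned in the abstract. Not taken from the paper.
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA", "TATAAG", "TATAAA", "TTTAAA"]
profile = position_frequencies(sites)
variability = [1.0 - c for c in conservation(profile)]  # per-position profile
hits = scan(profile, "GGCTATAAAGG", threshold=0.9)
```

Here `variability` plays the role of a per-position variability profile in the loosest sense, and `hits` lists candidate matches in the scanned sequence. A real tool would also need an alignment step and a principled way to separate true sites from false positive matches, neither of which this sketch attempts.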