K. Frech et al., COMPUTER-ASSISTED PREDICTION, CLASSIFICATION, AND DELIMITATION OF PROTEIN-BINDING SITES IN NUCLEIC-ACIDS, Nucleic acids research, 21(7), 1993, pp. 1655-1664
We present a method to determine the location and extent of protein
binding regions in nucleic acids by computer-assisted analysis of
sequence data. The program ConsIndex establishes a library of consensus
descriptions based on sequence sets containing known regulatory
elements. These defined consensus descriptions are used by the program
ConsInspector to predict binding sites in new sequences. We show the
programs to correctly determine the significant regions involved in
transcriptional control of seven sequence elements. The internal
profile of relative variability of individual nucleotide positions
within these regions paralleled experimental profiles of biological
significance. Consensus descriptions are determined by employing an
anchored alignment scheme, the results of which are then evaluated by a
novel method which is superior to cluster algorithms. The alignment
procedure is able to include several closely related sequences without
biasing the consensus description. Moreover, the algorithm detects
additional elements on the basis of a moderate distance correlation and
is capable of discriminating between real binding sites and false
positive matches. The software is well suited to cope with the frequent
phenomenon of optional elements present in a subset of functionally
similar sequences, while taking maximal advantage of the existing
sequence data base. Since it requires only a minimum of seven sequences
for a single element, it is applicable to a wide range of binding
sites.
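
The two-step idea described above (derive a consensus description from
known sites, then use it to inspect new sequences) can be illustrated
with a minimal Python sketch. This is not the published ConsIndex or
ConsInspector algorithm: the abstract does not give the scoring scheme,
so the sketch assumes pre-aligned sites of equal length and substitutes
a simple entropy-based conservation weight per position. All function
names, the example sites, and the threshold are hypothetical. Only the
minimum of seven sequences per element comes from the abstract.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
MIN_SITES = 7  # the abstract states a minimum of seven sequences per element


def position_frequencies(sites):
    """Return one {nucleotide: relative frequency} dict per aligned column."""
    if len(sites) < MIN_SITES:
        raise ValueError(f"need at least {MIN_SITES} aligned sites")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be pre-aligned to equal length")
    columns = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        columns.append({n: counts.get(n, 0) / len(sites) for n in ALPHABET})
    return columns


def conservation(column):
    """Assumed conservation score in [0, 1]: 1 - (Shannon entropy / 2 bits)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 1.0 - entropy / math.log2(len(ALPHABET))


def scan(sequence, columns, threshold):
    """Return (offset, score) for windows whose normalised score >= threshold."""
    weights = [conservation(c) for c in columns]
    # Best achievable raw score: the most frequent nucleotide at every position.
    best = sum(w * max(c.values()) for w, c in zip(weights, columns))
    width = len(columns)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        raw = sum(w * c.get(n, 0.0) for w, c, n in zip(weights, columns, window))
        score = raw / best if best else 0.0
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical training set: seven aligned sites with two variable positions.
sites = ["TATAAA"] * 5 + ["TATATA", "TATAAT"]
columns = position_frequencies(sites)
hits = scan("GGGTATAAAGGG", columns, threshold=0.9)
```

Weighting each position by its conservation means that mismatches at
invariant positions cost more than mismatches at variable ones, which
loosely mirrors the abstract's point that per-position variability
carries biological meaning. The anchored alignment, the handling of
optional elements, and the distance correlation between elements are
outside the scope of this sketch.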