Mc. Macleod et al., THE PROBABILITY OF OCCURRENCE OF OLIGOMER MOTIFS IN THE HUMAN GENOME AND GENOMIC MICROHETEROGENEITY, Journal of theoretical biology, 181(4), 1996, pp. 311-318
A previously published method for predicting the frequency of random
occurrence of a completely specified DNA oligomer in a longer sequence
dataset has been generalized to allow degeneracy in the oligomer
sequence. With this enhancement, several datasets consisting of
sequences from the human genome were searched for the occurrence of
consensus binding sites for a set of 13 transcription factors. Although
the biological significance of these sequences might lead one to
predict that they would occur more often than expected at random, many
of the consensus oligomers were found at lower-than-expected
frequencies. Several (G + C)-rich oligomers were found to be moderately
over-represented, but this could be accounted for, in part, by the
occurrence of (G + C)-rich tracts in the human sequences. Regions very
high in (G + C) were found to occur at much higher frequencies than
expected in the human genome, and this severely limits the usefulness
of this approach for predicting the frequency of (G + C)-rich oligomers.
Unexpectedly, more than 1% of the human genome consists of tracts at
least 28 bp in length with a (G + C) content greater than 85%.
(C) 1996 Academic Press Limited
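
The abstract does not reproduce the prediction method itself, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes the simplest possible null model (independent bases with fixed frequencies, one strand only) and uses IUPAC codes to express degeneracy. The function names `site_probability`, `expected_count` and `gc_rich_coverage` are hypothetical, as is the fixed-window reading of "tracts at least 28 bp in length with a (G + C) content greater than 85%".

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_probability(oligo, base_freq):
    """Probability that a random position matches a degenerate oligomer.

    Assumption: bases are independent, so the probability is the product,
    over positions, of the summed frequencies of the allowed bases.
    """
    p = 1.0
    for code in oligo.upper():
        p *= sum(base_freq[b] for b in IUPAC[code])
    return p


def expected_count(oligo, seq_len, base_freq):
    """Expected number of matches on one strand of a sequence of seq_len bases."""
    positions = max(seq_len - len(oligo) + 1, 0)
    return positions * site_probability(oligo, base_freq)


def gc_rich_coverage(seq, window=28, threshold=0.85):
    """Fraction of bases lying in at least one window whose G+C fraction
    exceeds the threshold (a fixed-window simplification)."""
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return 0.0
    covered = [False] * n
    gc = sum(c in "GC" for c in seq[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window: add the entering base, drop the leaving one.
            gc += (seq[start + window - 1] in "GC") - (seq[start - 1] in "GC")
        if gc / window > threshold:
            for i in range(start, start + window):
                covered[i] = True
    return sum(covered) / n


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # A degenerate hexamer with one unconstrained position (illustrative only).
    print(expected_count("GGNCGG", 1_000_005, uniform))
    print(gc_rich_coverage("G" * 28 + "A" * 72))
```

Comparing `expected_count` with an observed count gives the kind of over- or under-representation ratio the abstract discusses. Because the null model uses a single base composition for the whole dataset, it would be expected to mispredict (G + C)-rich oligomers wherever (G + C)-rich tracts are common, which is the limitation the abstract reports.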