S. Scherer et al., ATYPICAL REGIONS IN LARGE GENOMIC DNA-SEQUENCES, Proceedings of the National Academy of Sciences of the United States of America, 91(15), 1994, pp. 7134-7138
Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. We describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Data bases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. We consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
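The general idea of scoring a sequence with a seventh-order Markov chain built from octanucleotide counts can be illustrated with a minimal Python sketch. It assumes details the abstract does not give: add-one (pseudocount) smoothing of the counts, fixed-length sliding windows scored by mean log2 probability per nucleotide, and a simple "mean minus k standard deviations" cutoff for flagging windows. The function names (count_octamers, log_prob, window_scores, flag_atypical), the window and step sizes, and the cutoff are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

ORDER = 7          # seventh-order chain: each base conditioned on the 7 preceding bases
ALPHABET = "ACGT"

def count_octamers(training_seqs):
    """Count every octanucleotide (8-mer) and its 7-mer prefix in the training set."""
    oct_counts, ctx_counts = Counter(), Counter()
    for seq in training_seqs:
        seq = seq.upper()
        for i in range(len(seq) - ORDER):
            word = seq[i:i + ORDER + 1]
            if set(word) <= set(ALPHABET):      # skip words containing N or other symbols
                oct_counts[word] += 1
                ctx_counts[word[:ORDER]] += 1
    return oct_counts, ctx_counts

def log_prob(seq, oct_counts, ctx_counts, pseudo=1.0):
    """Sum of log2 P(x_i | x_{i-7..i-1}); the pseudocount smoothing is an assumption."""
    seq = seq.upper()
    total, n = 0.0, 0
    for i in range(ORDER, len(seq)):
        word = seq[i - ORDER:i + 1]
        if not set(word) <= set(ALPHABET):
            continue
        num = oct_counts[word] + pseudo
        den = ctx_counts[word[:ORDER]] + pseudo * len(ALPHABET)
        total += math.log2(num / den)
        n += 1
    return total, n

def window_scores(seq, oct_counts, ctx_counts, window=1000, step=500):
    """Mean log2 probability per scored nucleotide in each sliding window (hypothetical sizes)."""
    scores = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        lp, n = log_prob(seq[start:start + window], oct_counts, ctx_counts)
        scores.append((start, lp / n if n else float("nan")))
    return scores

def flag_atypical(scores, k=2.0):
    """Return window starts scoring below mean - k*sd (a hypothetical significance cutoff)."""
    vals = [s for _, s in scores if not math.isnan(s)]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return [start for start, s in scores if s < mean - k * sd]
```

With this smoothing, a context never seen in training gives each base probability 1/4, i.e. -2 bits per nucleotide, so windows whose mean score falls well below the values typical of the trained genome are the ones that resemble the model poorly.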