M.Y. Leung et al., Overrepresentation and underrepresentation of short DNA words in herpesvirus genomes, Journal of Computational Biology, 3(3), 1996, pp. 345-360
Number of citations
47
Subject Categories
Biology,"Biochemical Research Methods",Mathematics
The relative abundance and rarity of DNA words have been recognized in
previous biological studies to have implications for the regulation,
repair, and evolutionary mechanisms of a genome. In this paper, we
review several different measures of abundance and rarity of DNA words,
including z-scores, representation ratios, and cross-ratios, that have
appeared in the recent literature, and examine the concordance among
them using the human cytomegalovirus genome sequence. We then rank all
words of length k = 2, ..., 5 of seven herpesvirus genomes according to
their abundance, as measured by one of the z-scores based upon a
stationary Markov model of order k - 2. Using a simple metric on the
ranks of 2-words of the seven herpesvirus sequences, we construct an
evolutionary tree. Several 3-words are observed to be consistently
over- or underrepresented in all seven herpesviruses. Furthermore,
clusters of some of the most over- and underrepresented 4- and 5-words
in the genomes are identified with functional sites such as the origins
of replication and regulatory signals of individual viruses.
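
As a rough illustration of the kind of measure the abstract describes, the sketch below computes a representation ratio (observed count divided by expected count) for every k-word in a sequence. It assumes the commonly used order k - 2 Markov estimate of the expected count, N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1)), with the sequence length standing in for the denominator when k = 2. The function names are hypothetical, and the variance term needed to turn the ratio into a z-score is not reproduced here, so this should not be read as the exact statistic used in the paper.

```python
from collections import Counter


def count_words(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def representation_ratios(seq, k):
    """Observed/expected ratio for each k-word that occurs in seq.

    Assumption: the expected count uses the order k-2 Markov estimate
    N(prefix) * N(suffix) / N(middle). For k = 2 the middle is empty,
    and the sequence length is used as its count.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    seq = seq.upper()
    observed = count_words(seq, k)
    sub = count_words(seq, k - 1)            # counts of (k-1)-words
    mid = count_words(seq, k - 2) if k > 2 else None
    ratios = {}
    for word, n_obs in observed.items():
        denom = mid[word[1:-1]] if k > 2 else len(seq)
        expected = sub[word[:-1]] * sub[word[1:]] / denom
        ratios[word] = n_obs / expected
    return ratios


def rank_words(ratios):
    """Words sorted from most over- to most underrepresented."""
    return sorted(ratios, key=ratios.get, reverse=True)
```

A ratio well above 1 suggests overrepresentation and a ratio well below 1 suggests underrepresentation. Only words that actually occur are reported, so a word that is entirely absent from the sequence would have to be handled separately.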