OVERREPRESENTATION AND UNDERREPRESENTATION OF SHORT DNA WORDS IN HERPESVIRUS GENOMES

Citation
M.Y. Leung et al., OVERREPRESENTATION AND UNDERREPRESENTATION OF SHORT DNA WORDS IN HERPESVIRUS GENOMES, Journal of Computational Biology, 3(3), 1996, pp. 345-360
Number of citations
47
Subject Categories
Biology,"Biochemical Research Methods",Mathematics
Journal ISSN
1066-5277
Volume
3
Issue
3
Year of publication
1996
Pages
345 - 360
Database
ISI
SICI code
1066-5277(1996)3:3<345:OAUOSD>2.0.ZU;2-3
Abstract
The relative abundance and rarity of DNA words have been recognized in previous biological studies to have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several different measures of abundance and rarity of DNA words, including z-scores, representation ratios, and cross-ratios, that have appeared in the recent literature, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 of seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based upon a stationary Markov model of order k - 2. Using a simple metric on the ranks of 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are observed to be consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
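
As a rough illustration of the kind of measure the abstract mentions, the sketch below computes a representation ratio (observed count divided by expected count) for each k-word, with the expected count estimated from a Markov model of order k - 2. The abstract does not give the paper's exact formulas, so the estimator used here (product of the counts of the two overlapping (k-1)-words divided by the count of their shared (k-2)-word) is an assumption, and the names `count_words` and `representation_ratios` are hypothetical. A z-score would also need a variance term, which is not attempted here.

```python
from collections import Counter


def count_words(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def representation_ratios(seq, k):
    """Observed/expected ratio for every k-word that occurs in seq.

    Assumed estimator (not taken from the paper): under a Markov model
    of order k - 2, the expected count of a word w is
        N(w[:-1]) * N(w[1:]) / N(w[1:-1]),
    where N counts overlapping occurrences. For k = 2 the shared core
    is empty, so the sequence length is used as the denominator.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    observed = count_words(seq, k)
    sub = count_words(seq, k - 1)
    core = count_words(seq, k - 2) if k > 2 else None
    ratios = {}
    for word, n_obs in observed.items():
        denom = core[word[1:-1]] if k > 2 else len(seq)
        expected = sub[word[:-1]] * sub[word[1:]] / denom
        ratios[word] = n_obs / expected
    return ratios


def rank_words(ratios):
    """Words sorted from most to least overrepresented (ties by word)."""
    return sorted(ratios, key=lambda w: (-ratios[w], w))


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # toy sequence, not real genome data
    for word in rank_words(representation_ratios(toy, 2)):
        print(word)
```

Ratios above 1 suggest overrepresentation relative to the model and ratios below 1 suggest underrepresentation. On real genomes, words that never occur get no entry in this sketch, so a fuller version would enumerate all 4^k words explicitly.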