A measure of discrepancy of multiple sequences

Citation
Ww. Fang et al., A measure of discrepancy of multiple sequences, INF SCI, 137(1-4), 2001, pp. 75-102
Number of citations
28
Subject Categories
Information Technology & Communication Systems
Journal title
INFORMATION SCIENCES
Journal ISSN
0020-0255
Volume
137
Issue
1-4
Year of publication
2001
Pages
75 - 102
Database
ISI
SICI code
0020-0255(200109)137:1-4<75:AMODOM>2.0.ZU;2-5
Abstract
Multiple sequence comparison is a basic problem for molecular biology and other sciences. In this paper, we introduce the concept of complete information set and some measurement principles for measuring discrepancy among multiple sequences. Based on them, we present a new measurement method satisfying the principles for comparing multiple sequences. We illustrate that this method can effectively distinguish different random sequences or DNA sequences of length 8000 by comparisons of 6-8 symbol (base) strings or protein sequences of length 8000 by comparisons of 3-4 symbol (amino acid) strings. It can also measure slight changes of a sequence, e.g., insertion or deletion of a symbol (a base or an amino acid) in a sequence. It is applied in the study of molecular evolution, and the elementary result shows a hierarchic relationship among the cytochrome C protein sequences of different species, much as that in taxonomy. (C) 2001 Elsevier Science Inc. All rights reserved.
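
The abstract does not state the measure itself, so the following is only a rough illustration of the general idea of comparing several sequences through their short symbol strings. It is not the authors' method: it swaps in a generalized Jensen-Shannon divergence over k-symbol substring frequencies as a stand-in, and the function names `kmer_distribution`, `entropy` and `js_discrepancy` are hypothetical. Each sequence is assumed to be at least `k` symbols long.

```python
from collections import Counter
from math import log2


def kmer_distribution(seq, k):
    """Relative frequencies of all overlapping length-k substrings of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(dist):
    """Shannon entropy (in bits) of a frequency dictionary."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def js_discrepancy(seqs, k):
    """Equal-weight generalized Jensen-Shannon divergence of the
    k-symbol substring distributions of several sequences.

    Stand-in measure for illustration only; NOT the measure of the paper.
    """
    dists = [kmer_distribution(s, k) for s in seqs]
    # Average the distributions to obtain the mixture distribution.
    mixture = Counter()
    for d in dists:
        for kmer, p in d.items():
            mixture[kmer] += p / len(dists)
    # Entropy of the mixture minus the mean entropy of the parts.
    return entropy(mixture) - sum(entropy(d) for d in dists) / len(dists)


if __name__ == "__main__":
    original = "ACGTACGTACGT"
    with_insertion = "ACGTACGGTACGT"  # one base inserted
    print(js_discrepancy([original, original], 2))
    print(js_discrepancy([original, with_insertion], 2))
```

Under this stand-in, identical sequences give a discrepancy of zero, and a single inserted symbol gives a small positive value, which mirrors the sensitivity to insertions and deletions that the abstract describes. The substring length `k` plays the role of the 6-8 base or 3-4 amino acid strings mentioned above.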