New techniques for DNA sequence classification

Citation
J.T.L. Wang et al., New techniques for DNA sequence classification, J COMPUT BI, 6(2), 1999, pp. 209-218
Number of citations
42
Subject categories
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
Journal ISSN
1066-5277
Volume
6
Issue
2
Year of publication
1999
Pages
209 - 218
Database
ISI
SICI code
1066-5277(199922)6:2<209:NTFDSC>2.0.ZU;2-V
Abstract
DNA sequence classification is the task of determining whether an unlabeled sequence S belongs to an existing class C. This paper proposes two new techniques for DNA sequence classification. The first technique compares S with a group of active motifs, discovered from the elements of C and by contrasting C with elements outside it. The second technique generates gapped fingerprints of S and matches them against the elements of C. Experiments running these algorithms on long, well-conserved Alu sequences show that the proposed methods perform well compared with FASTA. On less conserved and relatively short functional sites, such as splice junctions, a variation of the second technique that combines fingerprinting with consensus sequence analysis gives better results than current classifiers based on text compression and machine learning algorithms.
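The abstract describes both techniques only at a high level, so the following is a toy sketch under stated assumptions, not the authors' algorithms. "Active motifs" are approximated as 4-mers that occur in most members of C and in few non-members (`k`, `min_support`, `max_neg` are hypothetical parameters). "Gapped fingerprints" are approximated with spaced words read through a hypothetical mask `"110011"` (the `0` positions are gaps), compared by Jaccard similarity. The function names, the 0.5 decision threshold, and the sequences are all made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Distinct k-mers of seq (a set, so each sequence counts once per k-mer)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discover_motifs(positives, negatives, k=4, min_support=0.6, max_neg=0.2):
    """Hypothetical 'active motif' rule: a k-mer present in at least min_support
    of the class members and in at most max_neg of the non-members."""
    pos_counts = Counter(m for s in positives for m in kmers(s, k))
    neg_counts = Counter(m for s in negatives for m in kmers(s, k))
    n_neg = max(len(negatives), 1)  # avoid division by zero when no negatives
    return {
        m for m, c in pos_counts.items()
        if c / len(positives) >= min_support and neg_counts[m] / n_neg <= max_neg
    }


def motif_score(seq, motifs, k=4):
    """Fraction of the active motifs that occur in seq."""
    if not motifs:
        return 0.0
    return len(kmers(seq, k) & motifs) / len(motifs)


def gapped_fingerprint(seq, mask="110011"):
    """Spaced words: for each window of len(mask), keep the characters at the
    positions where mask has '1' (the '0' positions are skipped as gaps)."""
    keep = [i for i, c in enumerate(mask) if c == "1"]
    return {
        "".join(seq[j + i] for i in keep)
        for j in range(len(seq) - len(mask) + 1)
    }


def fingerprint_score(seq, members, mask="110011"):
    """Best Jaccard similarity between seq's fingerprint and any member's."""
    fp = gapped_fingerprint(seq, mask)
    best = 0.0
    for member in members:
        other = gapped_fingerprint(member, mask)
        union = fp | other
        if union:
            best = max(best, len(fp & other) / len(union))
    return best


def classify(seq, positives, negatives, threshold=0.5):
    """Decide membership in C with each technique; threshold is illustrative."""
    motifs = discover_motifs(positives, negatives)
    return {
        "motif": motif_score(seq, motifs) >= threshold,
        "fingerprint": fingerprint_score(seq, positives) >= threshold,
    }


positives = ["ACGTACGTGGCC", "TTACGTACGTAA", "ACGTACGTCCCC"]  # toy class C
negatives = ["TTTTGGGGTTTT", "GGGGCCCCAAAA"]                  # toy non-members
print(classify("GGACGTACGTGG", positives, negatives))  # {'motif': True, 'fingerprint': True}
print(classify("TTTTGGGGCCCC", positives, negatives))  # {'motif': False, 'fingerprint': False}
```

Here the motif rule keeps only ACGT, CGTA, GTAC and TACG, which every toy member shares and no non-member contains; the first query contains all four and reaches a fingerprint similarity of 5/9 with the first member. The consensus-sequence variation for splice junctions is not modeled.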