Self-organizing tree-growing network for the classification of protein sequences

Citation
Hc. Wang et al., Self-organizing tree-growing network for the classification of protein sequences, PROTEIN SCI, 7(12), 1998, pp. 2613-2622
Number of citations
27
Subject categories
Biochemistry & Biophysics
Journal title
PROTEIN SCIENCE
ISSN journal
09618368
Volume
7
Issue
12
Year of publication
1998
Pages
2613 - 2622
Database
ISI
SICI code
0961-8368(199812)7:12<2613:STNFTC>2.0.ZU;2-S
Abstract
The self-organizing tree algorithm (SOTA) was recently introduced to construct phylogenetic trees from biological sequences, based on the principles of Kohonen's self-organizing maps and Fritzke's growing cell structures. SOTA is designed so that the generation of new nodes can be stopped when the sequences assigned to a node are already above a certain similarity threshold. In this way, a phylogenetic tree resolved at a high taxonomic level can be obtained. This capability is especially useful for classifying sets of diversified sequences. SOTA was originally designed to analyze pre-aligned sequences. It has now been adapted to analyze patterns associated with the frequency of residues along a sequence, such as protein dipeptide composition and other n-gram compositions. In this work we show that the algorithm applied to these data not only successfully constructs phylogenetic trees of protein families, such as cytochrome c, triosephosphate isomerase, and hemoglobin alpha chains, but also classifies very diversified sequence data sets, such as a mixture of interleukins and their receptors.
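The two ideas in the abstract, n-gram composition vectors and a tree that stops growing once its leaves are homogeneous enough, can be illustrated with a minimal sketch. It is not the authors' implementation. The names `ngram_composition`, `sota`, `Cell`, `assign`, and the parameters `eta_winner`, `eta_sister`, `epochs`, and `max_cells` are hypothetical. Several choices are assumptions: normalizing counts by the number of n-grams, using Euclidean distance, using the mean distance from a leaf to the profiles it wins (its "resource") as a stand-in for the paper's similarity threshold, and running a fixed number of training epochs. The sketch also simplifies the growth rule: only the winning leaf and a leaf sister are updated (no ancestor-node update), and the most heterogeneous leaf is split until every leaf falls at or below the threshold.

```python
import itertools
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_composition(seq, n=2):
    """Relative frequency of every n-gram of standard residues in seq.

    With n=2 this yields the 400-dimensional dipeptide composition.
    N-grams containing non-standard letters are skipped.
    """
    grams = ("".join(p) for p in itertools.product(AMINO_ACIDS, repeat=n))
    index = {g: i for i, g in enumerate(grams)}
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def dist(a, b):
    """Euclidean distance (an assumption; the paper's metric may differ)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Cell:
    """A tree node; leaves (no children) are the competing cells."""
    def __init__(self, vector, parent=None):
        self.vector = list(vector)
        self.parent = parent
        self.children = []


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def assign(cells, data):
    """Map each cell's id to the input vectors it wins."""
    groups = {id(c): [] for c in cells}
    for v in data:
        winner = min(cells, key=lambda c: dist(c.vector, v))
        groups[id(winner)].append(v)
    return groups


def sota(data, threshold, eta_winner=0.1, eta_sister=0.01, epochs=20, max_cells=50):
    """Simplified SOTA-style growth: train the leaves, then split the most
    heterogeneous leaf until every leaf's mean distance <= threshold."""
    dim = len(data[0])
    centroid = [sum(v[k] for v in data) / len(data) for k in range(dim)]
    root = Cell(centroid)
    root.children = [Cell(centroid, root), Cell(centroid, root)]
    while True:
        cells = leaves(root)
        for _ in range(epochs):
            for v in data:
                winner = min(cells, key=lambda c: dist(c.vector, v))
                sister = next(c for c in winner.parent.children if c is not winner)
                updates = [(winner, eta_winner)]
                if not sister.children:  # sister is moved only if it is a leaf
                    updates.append((sister, eta_sister))
                for cell, eta in updates:
                    cell.vector = [x + eta * (y - x) for x, y in zip(cell.vector, v)]
        groups = assign(cells, data)
        # "resource": mean distance between a leaf and the inputs it wins
        resource = {
            id(c): sum(dist(c.vector, v) for v in groups[id(c)]) / len(groups[id(c)])
            if groups[id(c)] else 0.0
            for c in cells
        }
        worst = max(cells, key=lambda c: resource[id(c)])
        if resource[id(worst)] <= threshold or len(cells) >= max_cells:
            return root, cells, groups
        # Split the most heterogeneous leaf into two daughter cells.
        worst.children = [Cell(worst.vector, worst), Cell(worst.vector, worst)]
```

For example, `sota([ngram_composition(s) for s in seqs], threshold=0.5)` on two toy poly-A and two toy poly-W sequences stops at two leaves, one per composition pattern. A smaller threshold lets the tree keep splitting, up to `max_cells`, which mirrors the abstract's point that the threshold controls the taxonomic level at which the tree is resolved.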