Self-organizing tree-growing network for the classification of protein sequences

Authors

Wang, HC Dopazo, J De la Fraga, LG Zhu, YP Carazo, JM

Citation

Hc. Wang et al., Self-organizing tree-growing network for the classification of protein sequences, PROTEIN SCI, 7(12), 1998, pp. 2613-2622

Citations number

Subject categories

Biochemistry & Biophysics

Journal title

PROTEIN SCIENCE

ISSN journal

0961-8368

Volume

7

Issue

12

Year of publication

1998

Pages

2613 - 2622

Database

ISI

SICI code

0961-8368(199812)7:12<2613:STNFTC>2.0.ZU;2-S

Abstract

The self-organizing tree algorithm (SOTA) was recently introduced to construct phylogenetic trees from biological sequences, based on the principles of Kohonen's self-organizing maps and on Fritzke's growing cell structures. SOTA is designed in such a way that the generation of new nodes can be stopped when the sequences assigned to a node are already above a certain similarity threshold. In this way a phylogenetic tree resolved at a high taxonomic level can be obtained. This capability is especially useful to classify sets of diversified sequences. SOTA was originally designed to analyze pre-aligned sequences. It is now adapted to be able to analyze patterns associated to the frequency of residues along a sequence, such as protein dipeptide composition and other n-gram compositions. In this work we show that the algorithm applied to these data is able to not only successfully construct phylogenetic trees of protein families, such as cytochrome c, triosephosphate isomerase, and hemoglobin alpha chains, but also classify very diversified sequence data sets, such as a mixture of interleukins and their receptors.
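
As an illustration of the kind of input the abstract describes, the following minimal sketch computes an n-gram (by default dipeptide) composition vector for a protein sequence. It is not taken from the paper: the function name `ngram_composition`, the 20-letter alphabet constant, and the choice to skip n-grams containing non-standard residues are all assumptions made for this example.

```python
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_composition(sequence, n=2):
    """Return the relative frequency of every possible n-gram.

    The result is a list of length 20**n, ordered as generated by
    itertools.product over AMINO_ACIDS ("AA", "AC", ..., "YY" for n=2).
    n-grams containing a character outside AMINO_ACIDS are skipped
    (an assumption of this sketch, not something stated in the abstract).
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    seq = sequence.upper()
    total = 0
    # Slide a window of width n along the sequence.
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:
            counts[gram] += 1
            total += 1
    if total == 0:
        # Sequence too short or no valid n-grams: return an all-zero vector.
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]
```

A vector of this kind has a fixed length regardless of sequence length, which is what allows unaligned sequences to be compared; for dipeptides it has 400 components.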