
SELF-ORGANIZING HIERARCHICAL NETWORKS FOR PATTERN-RECOGNITION IN PROTEIN-SEQUENCE

Authors

HANKE J BECKMANN G BORK P REICH JG

Citation

J. Hanke et al., SELF-ORGANIZING HIERARCHICAL NETWORKS FOR PATTERN-RECOGNITION IN PROTEIN-SEQUENCE, Protein science, 5(1), 1996, pp. 72-82

Citations number

Subject categories

Biology

Journal title

Protein science

ISSN journal

09618368

Volume

5

Issue

1

Year of publication

1996

Pages

72 - 82

Database

ISI

SICI code

0961-8368(1996)5:1<72:SHNFPI>2.0.ZU;2-K

Abstract

We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.
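The idea behind the first training stage (an SOM grouping ungapped segments cut from unaligned sequences) can be illustrated with a generic Kohonen SOM. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It cuts every ungapped window of a fixed width out of the unaligned sequences, encodes each window as a one-hot vector over the 20 standard amino acids, and trains a one-dimensional chain of nodes using Euclidean distance. The original method scores segments with amino acid similarity matrices and adds the matrix-refinement and positional stages, all of which are omitted here. All function names (`one_hot_window`, `extract_windows`, `best_matching_node`, `train_som`, `node_coverage`) and parameters (window width, node count, learning-rate and radius schedules) are hypothetical. A node whose windows come from most of the learning sequences plays the role of a candidate "feature".

```python
import math
import random

# Assumption: one-hot encoding over the 20 standard amino acids,
# standing in for the paper's similarity-matrix-based segment scoring.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(segment):
    """Encode an ungapped segment as a flat one-hot vector of length 20 * len(segment)."""
    vec = [0.0] * (len(AMINO_ACIDS) * len(segment))
    for pos, aa in enumerate(segment):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def extract_windows(sequences, width):
    """Yield (sequence_index, segment) for every ungapped window of the given width."""
    for s_idx, seq in enumerate(sequences):
        for start in range(len(seq) - width + 1):
            yield s_idx, seq[start:start + width]


def best_matching_node(nodes, vec):
    """Index of the node closest to vec in squared Euclidean distance."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - x) ** 2 for w, x in zip(nodes[k], vec)))


def train_som(vectors, n_nodes, epochs=20, lr0=0.5, seed=0):
    """Train a 1-D chain of SOM nodes with a linearly decaying learning rate
    and a Gaussian neighborhood (hypothetical schedule, not the paper's)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = n_nodes / 2  # hypothetical initial neighborhood radius
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_matching_node(nodes, vectors[i])
            for k, w in enumerate(nodes):
                # Neighborhood strength depends on distance along the 1-D chain.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * radius ** 2))
                for j in range(dim):
                    w[j] += lr * h * (vectors[i][j] - w[j])
            step += 1
    return nodes


def node_coverage(nodes, sequences, width):
    """For each node, count the distinct sequences with at least one window mapped to it."""
    hits = [set() for _ in nodes]
    for s_idx, seg in extract_windows(sequences, width):
        hits[best_matching_node(nodes, one_hot_window(seg))].add(s_idx)
    return [len(h) for h in hits]


if __name__ == "__main__":
    # Toy, hypothetical sequences that share the segment GAHTRQ.
    seqs = ["MKLVGAHTRQ", "PPGAHTRQSD", "GAHTRQLLMK"]
    width = 6
    vectors = [one_hot_window(seg) for _, seg in extract_windows(seqs, width)]
    nodes = train_som(vectors, n_nodes=4, epochs=20, seed=1)
    print(node_coverage(nodes, seqs, width))
```

Because identical windows always map to the same node, the shared segment GAHTRQ in the toy sequences lands on a node covered by all three sequences, however training converges. On real data, a coverage threshold (for example, "present in most sequences") would be needed to decide which nodes count as features.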