RECURRING LOCAL SEQUENCE MOTIFS IN PROTEINS

Authors
Citation
K.F. Han and D. Baker, RECURRING LOCAL SEQUENCE MOTIFS IN PROTEINS, Journal of Molecular Biology, 251(1), 1995, pp. 176-187
Number of citations
21
Subject Categories
Biology
Journal ISSN
0022-2836
Volume
251
Issue
1
Year of publication
1995
Pages
176 - 187
Database
ISI
SICI code
0022-2836(1995)251:1<176:RLSMIP>2.0.ZU;2-E
Abstract
We describe a completely automated approach to identifying local sequence motifs that transcend protein family boundaries. Cluster analysis is used to identify recurring patterns of variation at single positions and in short segments of contiguous positions in multiple sequence alignments for a non-redundant set of protein families. Parallel experiments on simulated data sets constructed with the overall residue frequencies of proteins but not the inter-residue correlations show that naturally occurring protein sequences are significantly more clustered than the corresponding random sequences for window lengths ranging from one to 13 contiguous positions. The patterns of variation at single positions are not in general surprising: chemically similar amino acids tend to be grouped together. More interesting patterns emerge as the window length increases. The patterns of variation for longer window lengths are in part recognizable patterns of hydrophobic and hydrophilic residues, and in part less obvious combinations. A particularly interesting class of patterns features highly conserved glycine residues. The patterns provide a means to abstract the information contained in multiple sequence alignments and may be useful for comparison of distantly related sequences or sequence families and for protein structure prediction. (C) 1995 Academic Press Limited
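The general idea in the abstract (describe each alignment position by its pattern of residue variation, join contiguous positions into windows, then cluster the windows) can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not state the clustering algorithm or distance measure, so plain k-means with squared Euclidean distance is swapped in here as an assumption. The function names, the toy alignment, and the seeding rule are likewise hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(column):
    """Frequency of each of the 20 amino acids in one alignment column (gaps ignored)."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]

def window_vectors(alignment, width):
    """Concatenate the profiles of `width` contiguous columns into one vector per window."""
    length = len(alignment[0])
    profiles = [column_profile([seq[i] for seq in alignment]) for i in range(length)]
    return [
        [x for p in profiles[start:start + width] for x in p]
        for start in range(length - width + 1)
    ]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Plain k-means, seeded with the first k vectors (an arbitrary, assumed choice)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every window to its nearest cluster centre.
        labels = [min(range(k), key=lambda c: squared_distance(v, centers[c]))
                  for v in vectors]
        # Move each centre to the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical toy alignment: three aligned sequences, five columns.
toy_alignment = ["GAVLG", "GAILG", "GSVMG"]
vectors = window_vectors(toy_alignment, width=1)
labels, centers = kmeans(vectors, k=2)
```

In this toy case the two fully conserved glycine columns have identical profiles and therefore always receive the same cluster label. Passing a larger `width` (the abstract considers window lengths from one to 13) yields one 20 x `width` vector per window.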