A GRAPH-THEORETIC APPROACH TO THE IDENTIFICATION OF 3-DIMENSIONAL PATTERNS OF AMINO-ACID SIDE-CHAINS IN PROTEIN STRUCTURES

Citation
P.J. Artymiuk et al., A GRAPH-THEORETIC APPROACH TO THE IDENTIFICATION OF 3-DIMENSIONAL PATTERNS OF AMINO-ACID SIDE-CHAINS IN PROTEIN STRUCTURES, Journal of Molecular Biology, 243(2), 1994, pp. 327-344
Number of citations
80
Subject categories
Biology
Journal ISSN
0022-2836
Volume
243
Issue
2
Year of publication
1994
Pages
327 - 344
Database
ISI
SICI code
0022-2836(1994)243:2<327:AGATTI>2.0.ZU;2-T
Abstract
This paper discusses the use of graph-theoretic methods for the representation and searching of three-dimensional patterns of side-chains in protein structures. The position of a side-chain is represented by pseudo-atoms, and the relative positions of pairs of side-chains by the distances between them. This description of the geometry can be represented by a labelled graph in which the nodes and the edges of the graph represent the pseudo-atoms and the sets of inter-pseudo-atomic distances, respectively. Given such a representation, a protein can be searched for the presence of a user-defined query pattern of side-chains by means of a subgraph-isomorphism algorithm, which is implemented in the program ASSAM. Experiments with one such algorithm, that due to Ullmann, show that it provides both an effective and a highly efficient way of searching for patterns of side-chains. The method is illustrated by searches for the serine protease catalytic triad, for residues involved in the catalytic activity of staphylococcal nuclease, and for the zinc-binding side-chains of thermolysin. The catalytic triad pattern search revealed the existence of a second Asp-His-Ser triad-like arrangement of residues in trypsinogen and chymotrypsinogen, in addition to the catalytic residues. In addition, the program can be used to search for hypothetical patterns, as is shown for a pattern of three tryptophan side-chains. These searches demonstrate that the search algorithm can successfully retrieve the great majority of the expected proteins, as well as other, previously unreported proteins that contain the pattern of interest.
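
The representation and search described in the abstract can be illustrated with a minimal sketch. It is not the ASSAM program and not Ullmann's algorithm as such: it uses plain backtracking without Ullmann's refinement step, and it assumes one pseudo-atom per side-chain, whereas the abstract speaks of sets of inter-pseudo-atomic distances. The function names (build_graph, find_matches), the distance tolerance tol and the coordinates are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def build_graph(residues):
    """Turn [(label, (x, y, z)), ...] into node labels and pairwise distances.

    Assumption: each side-chain is reduced to a single pseudo-atom.
    """
    labels = [label for label, _ in residues]
    edges = {}
    for i, j in combinations(range(len(residues)), 2):
        d = dist(residues[i][1], residues[j][1])
        edges[(i, j)] = edges[(j, i)] = d  # store both directions
    return labels, edges


def find_matches(query, target, tol=1.0):
    """Return every tuple of target indices matching the query pattern.

    A match must have the same residue labels as the query and all
    pairwise distances within `tol` of the query distances.
    Simple backtracking search, not Ullmann's refinement procedure.
    """
    q_labels, q_edges = build_graph(query)
    t_labels, t_edges = build_graph(target)
    matches = []

    def extend(mapping):
        k = len(mapping)  # index of the next query node to place
        if k == len(q_labels):
            matches.append(tuple(mapping))
            return
        for t in range(len(t_labels)):
            if t in mapping or t_labels[t] != q_labels[k]:
                continue
            # Check distances to every query node already placed.
            if all(abs(q_edges[(i, k)] - t_edges[(mapping[i], t)]) <= tol
                   for i in range(k)):
                mapping.append(t)
                extend(mapping)
                mapping.pop()

    extend([])
    return matches


# Synthetic coordinates, not taken from any real structure.
query = [("ASP", (0, 0, 0)), ("HIS", (3, 0, 0)), ("SER", (3, 4, 0))]
target = [
    ("GLY", (10, 10, 10)),
    ("ASP", (1, 1, 1)),
    ("HIS", (4, 1, 1)),
    ("TRP", (20, 0, 0)),
    ("SER", (4, 5, 1)),
]
print(find_matches(query, target))
```

In this toy case the target contains a translated copy of the query triad at indices 1, 2 and 4, so a single match is reported. Comparing distances rather than coordinates makes the search independent of how the protein is positioned or oriented, which is the point of the graph representation.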