A.S. Siddiqui and G.J. Barton, Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions, Protein Science, 4(5), 1995, pp. 872-884
An algorithm is presented for the fast and accurate definition of protein
structural domains from coordinate data without prior knowledge of the
number or type of domains. The algorithm explicitly locates domains that
comprise one or two continuous segments of protein chain. Domains that
include more than two segments are also located. The algorithm was applied
to a nonredundant database of 230 protein structures and the results
compared to domain definitions obtained from the literature, or by
inspection of the coordinates on molecular graphics. For 70% of the
proteins, the derived domains agree with the reference definitions; 18%
show minor differences and only 12% (28 proteins) show very different
definitions. Three screens were applied to identify the derived domains
least likely to agree with the subjective definition set. These screens
revealed a set of 173 proteins, 97% of which agree well with the
subjective definitions. The algorithm represents a practical domain
identification tool that can be run routinely on the entire structural
database. Adjustment of parameters also allows smaller compact units to be
identified in proteins.
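
The idea of searching for domains made of one or two continuous chain segments can be illustrated with a minimal brute-force sketch. The abstract does not give the scoring function or the parameters, so everything named here is an assumption made for illustration: the 8.0 distance cutoff, the `min_len` minimum segment length, and in particular `split_score`, a hypothetical contact-ratio score that rewards many contacts inside each part and few between them. It is not the published method, and it makes no attempt to be fast.

```python
from itertools import combinations


def contacts_from_coords(coords, cutoff=8.0):
    """Residue pairs (i, j), i < j, whose points lie within `cutoff` (assumed value)."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) <= cutoff ** 2
    }


def candidate_domains(n, min_len=2):
    """Yield candidate domains as lists of inclusive (start, end) segments."""
    segs = [(s, e) for s in range(n) for e in range(s + min_len - 1, n)]
    # One continuous segment, leaving at least min_len residues for the rest.
    for s, e in segs:
        if n - (e - s + 1) >= min_len:
            yield [(s, e)]
    # Two continuous segments separated by a gap of at least min_len residues.
    for s1, e1 in segs:
        for s2, e2 in segs:
            if s2 - e1 - 1 >= min_len:
                yield [(s1, e1), (s2, e2)]


def split_score(domain, contacts):
    """Hypothetical score: (internal_A / external) * (internal_B / external)."""
    in_a = {r for s, e in domain for r in range(s, e + 1)}
    int_a = int_b = ext = 0
    for i, j in contacts:
        if i in in_a and j in in_a:
            int_a += 1
        elif i not in in_a and j not in in_a:
            int_b += 1
        else:
            ext += 1
    ext = max(ext, 1)  # avoid division by zero when the parts never touch
    return (int_a / ext) * (int_b / ext)


def best_split(n, contacts, min_len=2):
    """Return the highest-scoring candidate domain and its score."""
    best, best_score = None, float("-inf")
    for domain in candidate_domains(n, min_len):
        score = split_score(domain, contacts)
        if score > best_score:
            best, best_score = domain, score
    return best, best_score


# Toy example: residues 0-3 and 4-7 form two tightly packed groups
# joined by a single contact between residues 3 and 4.
toy = set(combinations(range(0, 4), 2)) | set(combinations(range(4, 8), 2)) | {(3, 4)}
print(best_split(8, toy))
```

Enumerating one-segment candidates before two-segment ones means that, when two candidates tie, the simpler continuous definition is kept. Raising or lowering `min_len` in this sketch changes how small a unit can be reported, loosely mirroring the abstract's remark that parameter adjustment allows smaller compact units to be identified.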