R.D. King et al., ON THE USE OF MACHINE LEARNING TO IDENTIFY TOPOLOGICAL RULES IN THE PACKING OF BETA-STRANDS, Protein Engineering, 7(11), 1994, pp. 1295-1303
The machine learning program GOLEM was applied to discover topological
rules in the packing of beta-sheets in alpha/beta-domain proteins. Rules
(constraints) were determined for four features of beta-sheet packing:
(i) whether a beta-strand is at an edge; (ii) whether two consecutive
beta-strands pack parallel or anti-parallel; (iii) whether two
beta-strands pack adjacently; and (iv) the winding direction of two
consecutive beta-strands. Rules were found with high predictive accuracy
and coverage. The errors were generally associated with complications in
domain folds, especially in doubly wound domains. Investigation
of the rules revealed interesting patterns, some of which were known
previously, others that are novel. Novel features include (i) the
relationship between pairs of sequential strands is in general one of
decreasing size; (ii) more sequential pairs of strands wind in the
direction out than in; and (iii) it takes a larger alteration in
hydrophobicity to change a strand from winding in the direction out
than in. These
patterns in the data may be the result of folding pathways in the
domains. The rules found are of predictive value and could be used in the
combinatorial prediction of protein structure, or as a general test of
model structures, e.g. those produced by threading. We conclude that
machine learning has a useful role in the analysis of protein
structures.
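
The abstract reports rules with "high predictive accuracy and coverage"
but does not define either measure. The Python sketch below shows one
common way such figures could be computed for a single rule. It is an
illustration only, under stated assumptions: the StrandPair record, the
toy_rule function and the exact definitions of accuracy and coverage
are hypothetical and are not taken from the paper or from GOLEM.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class StrandPair:
    """Hypothetical record for two consecutive beta-strands."""
    length_first: int    # residues in the first strand (assumed feature)
    length_second: int   # residues in the second strand (assumed feature)
    parallel: bool       # observed label: True if the pair packs parallel


# A rule returns a predicted label, or None when it does not apply.
Rule = Callable[[StrandPair], Optional[bool]]


def toy_rule(pair: StrandPair) -> Optional[bool]:
    """Invented rule for illustration, NOT one of the published rules:
    predict 'parallel' when the second strand is shorter than the first,
    and abstain otherwise."""
    if pair.length_second < pair.length_first:
        return True
    return None


def evaluate(rule: Rule, examples: Sequence[StrandPair]) -> Tuple[float, float]:
    """Return (accuracy, coverage) under these assumed definitions:
    coverage = fraction of examples on which the rule makes a prediction;
    accuracy = fraction of those predictions that match the observed label.
    """
    fired = []
    for example in examples:
        prediction = rule(example)
        if prediction is not None:
            fired.append(prediction == example.parallel)
    coverage = len(fired) / len(examples) if examples else 0.0
    accuracy = sum(fired) / len(fired) if fired else 0.0
    return accuracy, coverage
```

Calling evaluate(toy_rule, examples) on a list of labelled StrandPair
records returns the pair (accuracy, coverage). Letting a rule abstain
keeps the two measures separate, so a rule that is right whenever it
applies but applies rarely is not mistaken for a generally useful one.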