E.R. Goldman et al., Estimating Protein Function from Combinatorial Sequence Data Using Decision Algorithms and Neural Networks, Drug Development Research, 33(2), 1994, pp. 125-132
Correlations between protein sequences and phenotypes were explored using databases of combinatorial cassette mutants of pigment-protein complexes. Heuristically formulated decision algorithms and computer-implemented neural networks were compared to determine how accurately they classify mutants into phenotypic categories. For the databases examined, decision algorithms employing very simple rules correctly classified mutants 80-84% of the time, based only on the amino acid sequence of the mutagenized region. These decision algorithms did not require any rules involving site-to-site interactions; instead, they performed well because of the stringency of specific critical sites in the protein that accept only a restricted set of amino acids. In some cases, neural networks scored almost 10 percentage points higher than the decision algorithms on the same databases, reaching 94%. However, the success of the primitive decision algorithms and perceptrons at sorting sequences into categories suggests that linear effects predominate in the classification of a mutant's phenotype. Such methods should be generally applicable to the broad spectrum of databases currently being generated in combinatorial chemistry and biology experiments. (C) 1994 Wiley-Liss, Inc.
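The abstract contrasts a decision algorithm that checks whether each critical site holds one of a small set of accepted residues with perceptron and neural-network classifiers. The Python sketch below illustrates both ideas under stated assumptions. The critical positions and allowed residue sets in `CRITICAL_SITES`, the four-residue toy library, and every function name are hypothetical and are not taken from the paper. The "phenotypes" are generated by the rule itself, not measured. The sketch also suggests why such a rule fits the "linear effects" interpretation: a conjunction of independent per-site membership tests is linearly separable in a one-hot encoding, so a single-layer perceptron can learn it exactly.

```python
from itertools import product

# Standard one-letter codes for the 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL rules: 0-based position in the mutagenized cassette ->
# restricted set of residues that site accepts. Not taken from the paper.
CRITICAL_SITES = {1: frozenset("HN"), 3: frozenset("G")}


def decision_rule(seq, critical_sites=CRITICAL_SITES):
    """Predict 1 (e.g. 'functional') only if every critical site holds an allowed residue.

    Each site is checked independently; there are no site-to-site interaction terms.
    """
    return int(all(seq[pos] in allowed for pos, allowed in critical_sites.items()))


def one_hot(seq):
    """Flat 0/1 feature vector with one block of 20 features per sequence position."""
    return [1.0 if aa == ref else 0.0 for aa in seq for ref in AMINO_ACIDS]


def train_perceptron(seqs, labels, epochs=100):
    """Rosenblatt perceptron learning rule on one-hot features (labels are 0/1)."""
    w = [0.0] * len(one_hot(seqs[0]))
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for seq, y in zip(seqs, labels):
            x = one_hot(seq)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                step = y - pred  # +1 for a missed positive, -1 for a false positive
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
                mistakes += 1
        if mistakes == 0:  # every training sequence is now classified correctly
            break
    return w, b


def perceptron_predict(model, seq):
    """Linear threshold unit: weighted sum of one-hot features plus bias."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, one_hot(seq))) + b > 0 else 0


def accuracy(predict, seqs, labels):
    """Fraction of sequences whose predicted category matches the label."""
    return sum(predict(s) == y for s, y in zip(seqs, labels)) / len(labels)


# HYPOTHETICAL toy library: every length-4 cassette over a reduced alphabet
# (256 variants), labelled by the rule above rather than by real phenotypes.
library = ["".join(p) for p in product("AGHN", repeat=4)]
phenotypes = [decision_rule(s) for s in library]

model = train_perceptron(library, phenotypes)
print(sum(phenotypes))  # 32 of 256 variants satisfy both critical sites
print(accuracy(lambda s: perceptron_predict(model, s), library, phenotypes))  # 1.0
```

Because the toy labels come from the rule itself, the perceptron reaching 100% training accuracy only demonstrates linear separability. It says nothing about the 80-84% or 94% accuracies reported for the real mutant databases.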