We have investigated two amino acid features that determine secondary
structure: (1) the solvent accessibility of each side chain, and (2) the
interaction of each side chain with others one to four residues apart.
Solvent accessibility is a simple model that distinguishes residue
environment. The pairwise interactions represent a simple model of local
side chain to side chain interactions. To test the importance of these
features, we developed an algorithm to separate α-helices, β-strands, and
"other" structure. Single-residue and pairwise probabilities were
determined for 25,141 samples from proteins with <30% homology. Combining
solvent accessibility with pairwise probabilities allows us to distinguish
the three structures with 82.0% accuracy after cross-validation. We gain
1.4% to 2.0% accuracy by optimizing the propensities, demonstrating that
probabilities do not necessarily reflect propensities. Optimizing residue
exposures, the weights of all probabilities, and the propensities
increased accuracy to 84.0%.
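
As an illustration only, the idea of combining single-residue and pairwise probabilities can be sketched as a small count-based scorer. This is not the algorithm used in this work: the class labels (H, E, O), the two-state exposure alphabet (b for buried, e for exposed), the add-one smoothing, the forward-only pairing, and all function names are assumptions made for the example, and no propensity or weight optimization is attempted.

```python
import math
from collections import Counter

# Hypothetical labels: helix, strand, other.
CLASSES = ("H", "E", "O")
MAX_SEP = 4        # pair each residue with partners 1 to 4 positions ahead
N_AA = 20          # size of the amino acid alphabet
N_EXPOSURE = 2     # assumed two-state exposure: 'b' buried, 'e' exposed


def train(samples):
    """Count single-residue and pairwise features per structure class.

    samples: iterable of (sequence, exposure, structure) strings of equal length.
    """
    single = {c: Counter() for c in CLASSES}
    pair = {c: Counter() for c in CLASSES}
    for seq, exposure, structure in samples:
        for i, cls in enumerate(structure):
            single[cls][(seq[i], exposure[i])] += 1
            for k in range(1, MAX_SEP + 1):
                if i + k < len(seq):
                    pair[cls][(seq[i], seq[i + k], k)] += 1
    return single, pair


def log_prob(counter, key, vocab_size):
    """Add-one smoothed log probability of a feature within one class."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + vocab_size))


def predict(model, seq, exposure):
    """Assign each residue the class with the highest summed log probability."""
    single, pair = model
    labels = []
    for i in range(len(seq)):
        scores = {}
        for cls in CLASSES:
            score = log_prob(single[cls], (seq[i], exposure[i]),
                             N_AA * N_EXPOSURE)
            for k in range(1, MAX_SEP + 1):
                if i + k < len(seq):
                    score += log_prob(pair[cls], (seq[i], seq[i + k], k),
                                      N_AA * N_AA * MAX_SEP)
            scores[cls] = score
        labels.append(max(scores, key=scores.get))
    return "".join(labels)


# Toy data invented for the example; not taken from the study.
TOY_SAMPLES = [
    ("AAAAAA", "bbbbbb", "HHHHHH"),
    ("VVVVVV", "bbbbbb", "EEEEEE"),
    ("GGGGGG", "eeeeee", "OOOOOO"),
]
```

Calling `predict(train(TOY_SAMPLES), "AAAA", "bbbb")` labels each residue of the query independently. A closer approximation of the approach described here would add per-feature weights and tune the propensities rather than using raw smoothed frequencies.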