Neural networks were used to generalize common themes found in
transmembrane-spanning protein helices. Databases of various sizes
were used, each containing nonoverlapping sequences 25 amino acids
long. Training consisted of sorting these sequences into 1 of 2
groups: transmembrane helical peptides or nontransmembrane peptides.
Learning was measured using a test set 10% the size of the training
set. As training set size increased from 214 sequences to 1,751
sequences, learning increased in a nonlinear manner from 75% to a
high of 98%, then declined to a low of 87%. The final training
database consisted of roughly equal numbers of transmembrane (928)
and nontransmembrane (1,018) sequences. All transmembrane sequences
were entered into the database with respect to their lipid membrane
orientation: from inside the membrane to outside. Generalized
transmembrane helix and nontransmembrane peptides were constructed
from the maximally weighted connecting strengths of fully trained
networks. Four generalized transmembrane helices were found to
contain 9 consensus residues: a K-R-F triplet was found at the inside
lipid interface, 2 isoleucine and 2 other phenylalanine residues were
present in the helical body, and 2 tryptophan residues were found
near the outside lipid interface. As a test of the training method,
bacteriorhodopsin was examined to determine the position of its 7
transmembrane helices.
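
The data preparation described above (nonoverlapping 25-residue
sequences, and a test set 10% the size of the training set) can be
sketched roughly as follows. This is an illustrative sketch only, not
the authors' code: the one-hot input encoding, the function names, and
the decision to drop trailing residues that do not fill a window are
all assumptions, since the abstract does not specify them.

```python
# Illustrative sketch; encoding and helper names are assumptions,
# not taken from the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 25                           # sequence length stated in the abstract


def nonoverlapping_windows(sequence, size=WINDOW):
    """Cut a sequence into consecutive, nonoverlapping windows.

    Trailing residues that do not fill a whole window are dropped
    (an assumption made for this sketch).
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


def one_hot(window):
    """Encode a window as a flat list of 0/1 values, 20 per residue.

    One-hot encoding is an assumed input representation.
    """
    vector = []
    for residue in window:
        column = [0] * len(AMINO_ACIDS)
        column[AMINO_ACIDS.index(residue)] = 1
        vector.extend(column)
    return vector


def split_train_test(examples, test_fraction=0.10):
    """Hold out a test set whose size is test_fraction of the training set.

    With n_test = test_fraction * n_train and n_train + n_test = N,
    n_test = N * test_fraction / (1 + test_fraction).
    """
    n_test = round(len(examples) * test_fraction / (1 + test_fraction))
    return examples[n_test:], examples[:n_test]
```

Under this assumed encoding, each 25-residue sequence becomes a vector
of 25 x 20 = 500 inputs, and a pool of 110 labeled sequences would be
split into 100 for training and 10 for testing.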