J. Kim et al., Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties, BIOINFORMAT, 16(9), 2000, pp. 767-775
Motivation: Identification of novel G protein-coupled receptors and other multi-transmembrane proteins from genomic databases using structural features.
Results: Here we describe a new algorithm, which we call the quasi-periodic feature classifier (QFC), for identifying multi-transmembrane proteins from genomic databases, with a specific application to identifying G protein-coupled receptors (GPCRs). The QFC algorithm uses concise statistical variables as the 'feature space' to characterize the quasi-periodic physico-chemical properties of multi-transmembrane proteins. For the case of identifying GPCRs, the variables are then used in a non-parametric linear discriminant function
to separate GPCRs from non-GPCRs. The algorithm runs in time linearly proportional to the number of sequences, and performance on a test dataset shows 96% positive identification of known GPCRs. The QFC algorithm also works well with short random segments of proteins, and it positively identified GPCRs at a level greater than 90% even with segments as short as 100 amino acids. The primary advantage of the algorithm is that it does not directly use
primary sequence patterns, which may be subject to sampling bias. The utility of the new algorithm has been demonstrated by the isolation, from the Drosophila genome project database, of a novel class of seven-transmembrane proteins that were shown to be the elusive olfactory receptor genes of Drosophila.
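
Illustration: the abstract does not list the actual QFC variables or the coefficients of the discriminant function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Kyte-Doolittle hydropathy profile as the physico-chemical property, and summarizes the quasi-periodic pattern of hydrophobic stretches with four hypothetical statistics (segment count, mean segment length, mean and spread of the spacing between segments). The window size, threshold, weights and bias are placeholder values chosen for the example; in practice they would be fitted on labelled GPCR and non-GPCR sequences. Each sequence is processed in one pass, so the total cost grows linearly with the number of sequences.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale (an assumed choice of property,
# not necessarily the one used by QFC).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=15):
    """Sliding-window mean hydropathy; unknown residues score 0."""
    vals = [KD.get(a, 0.0) for a in seq.upper()]
    if len(vals) < window:
        return [mean(vals)] if vals else []
    return [mean(vals[i:i + window]) for i in range(len(vals) - window + 1)]


def hydrophobic_segments(profile, threshold=1.6):
    """Return (start, length) for each run of the profile above threshold."""
    segs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            segs.append((start, i - start))
            start = None
    if start is not None:
        segs.append((start, len(profile) - start))
    return segs


def features(seq, window=15, threshold=1.6):
    """Fixed-length feature vector describing the spacing of hydrophobic runs."""
    segs = hydrophobic_segments(hydropathy_profile(seq, window), threshold)
    lengths = [length for _, length in segs]
    starts = [s for s, _ in segs]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return [
        float(len(segs)),
        mean(lengths) if lengths else 0.0,
        mean(gaps) if gaps else 0.0,
        pstdev(gaps) if len(gaps) > 1 else 0.0,
    ]


def linear_discriminant(x, weights, bias):
    """Score > 0 is read as 'candidate', otherwise 'non-candidate'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Placeholder parameters for the demonstration only.
WEIGHTS = [1.0, 0.0, 0.0, 0.0]
BIAS = -3.5

# Synthetic sequences: seven hydrophobic stretches vs. none.
synthetic_7tm = ("L" * 20 + "K" * 15) * 7
soluble_like = "K" * 245

score_7tm = linear_discriminant(features(synthetic_7tm), WEIGHTS, BIAS)
score_sol = linear_discriminant(features(soluble_like), WEIGHTS, BIAS)
```

Because the features are summary statistics of a profile rather than residue patterns, the same functions can be applied to a fragment of a sequence, which is the property the abstract highlights for segments as short as 100 amino acids.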