Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties

Citation
J. Kim et al., Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties, BIOINFORMAT, 16(9), 2000, pp. 767-775
Number of citations
43
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
13674803
Volume
16
Issue
9
Year of publication
2000
Pages
767 - 775
Database
ISI
SICI code
1367-4803(200009)16:9<767:IONMPF>2.0.ZU;2-A
Abstract
Motivation: Identification of novel G protein-coupled receptors and other multi-transmembrane proteins from genomic databases using structural features.

Results: Here we describe a new algorithm, which we call the quasi-periodic feature classifier (QFC), for identifying multi-transmembrane proteins from genomic databases, with a specific application to identifying G protein-coupled receptors (GPCRs). The QFC algorithm uses concise statistical variables as its 'feature space' to characterize the quasi-periodic physico-chemical properties of multi-transmembrane proteins. For the case of identifying GPCRs, the variables are then used in a non-parametric linear discriminant function to separate GPCRs from non-GPCRs. The algorithm runs in time linearly proportional to the number of sequences, and on a test dataset it positively identified 96% of known GPCRs. The QFC algorithm also works well with short random segments of proteins: it positively identified GPCRs at a level greater than 90% even with segments as short as 100 amino acids. The primary advantage of the algorithm is that it does not directly use primary sequence patterns, which may be subject to sampling bias. The utility of the new algorithm has been demonstrated by the isolation, from the Drosophila genome project database, of a novel class of seven-transmembrane proteins that were shown to be the elusive olfactory receptor genes of Drosophila.
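The abstract does not specify QFC's actual statistical variables or its discriminant, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the Kyte-Doolittle hydropathy scale as the physico-chemical property, a 19-residue window, a 1.6 threshold for calling a hydrophobic segment, and three hypothetical spacing features (segment density, mean spacing, spacing spread). A classic perceptron stands in for the non-parametric linear discriminant. All function names and parameter values here are illustrative assumptions.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale. Using it here is an assumption; the
# abstract does not say which physico-chemical scale QFC uses.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=19):
    """Sliding-window mean hydropathy, one value per window position.
    A running sum keeps this O(len(seq)), so scanning a database stays
    linear in the number of (bounded-length) sequences."""
    vals = [KD.get(aa, 0.0) for aa in seq.upper()]  # unknown residues -> 0
    if len(vals) < window:
        return []
    s = sum(vals[:window])
    out = [s / window]
    for i in range(window, len(vals)):
        s += vals[i] - vals[i - window]
        out.append(s / window)
    return out

def hydrophobic_segments(profile, threshold=1.6):
    """Start indices of runs where the profile stays at or above threshold."""
    starts, inside = [], False
    for i, v in enumerate(profile):
        if v >= threshold and not inside:
            starts.append(i)
            inside = True
        elif v < threshold:
            inside = False
    return starts

def quasi_periodic_features(seq):
    """Hypothetical feature vector describing how regularly hydrophobic
    segments recur: [segments per 100 residues, mean gap, gap std dev]."""
    starts = hydrophobic_segments(hydropathy_profile(seq))
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    density = len(starts) / max(len(seq), 1) * 100
    mean_gap = float(mean(gaps)) if gaps else 0.0
    gap_sd = float(pstdev(gaps)) if len(gaps) > 1 else 0.0
    return [density, mean_gap, gap_sd]

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic perceptron: a distribution-free (non-parametric) linear
    discriminant. Labels are +1 / -1; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, label in zip(X, y):
            xa = list(x) + [1.0]
            if label * sum(wi * xi for wi, xi in zip(w, xa)) <= 0:
                w = [wi + lr * label * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:  # training set separated; stop early
            break
    return w

def score(w, x):
    """Positive score -> predicted multi-transmembrane (GPCR-like)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

if __name__ == "__main__":
    loop, tm = "DEKRN" * 6, "L" * 22          # synthetic toy sequences
    seven_tm = loop + (tm + loop) * 7          # seven evenly spaced TM-like stretches
    soluble = "DEKRNSTGPA" * 30                # no hydrophobic stretches
    X = [quasi_periodic_features(seven_tm), quasi_periodic_features(soluble)]
    w = train_perceptron(X, [1, -1])
    print([round(score(w, x), 3) for x in X])
```

Because none of these features depend on matching specific residue motifs, the same function can be applied unchanged to a short fragment (for example a 100-residue window). A real classifier would need trained weights from curated GPCR and non-GPCR sets, not the two toy sequences used here.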