More than 1,000 putative new human signalling proteins revealed by EST data mining

Citation
J. Schultz et al., More than 1,000 putative new human signalling proteins revealed by EST data mining, NAT GENET, 25(2), 2000, pp. 201-204
Number of citations
12
Subject categories
Molecular Biology & Genetics
Journal title
NATURE GENETICS
ISSN journal
1061-4036
Volume
25
Issue
2
Year of publication
2000
Pages
201 - 204
Database
ISI
SICI code
1061-4036(200006)25:2<201:MT1PNH>2.0.ZU;2-T
Abstract
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes(1), but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based(2) searches with a domain identification protocol(3,4) that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
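
The abstract describes a two-stage idea: a permissive similarity search produces candidate ESTs, and a domain identification step then filters them. The following is only a minimal sketch of that general pattern, not the authors' implementation. The `Hit` record, the `filter_candidates` function, the `has_domain` callback, the E-value cutoff of 10.0, and the EST identifiers are all hypothetical and chosen for illustration; the abstract gives no thresholds or software details.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Hit:
    """One candidate from a similarity search (hypothetical record layout)."""
    est_id: str    # EST identifier (made-up values in the demo below)
    family: str    # domain family that was used as the query
    evalue: float  # E-value reported by the search


def filter_candidates(
    hits: Iterable[Hit],
    has_domain: Callable[[str, str], bool],
    max_evalue: float = 10.0,  # assumed permissive cutoff, not from the paper
) -> List[Hit]:
    """Keep hits that pass the E-value cutoff AND are confirmed by the domain check."""
    kept = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # stage 1: too weak even for a permissive search
        if has_domain(hit.est_id, hit.family):
            kept.append(hit)  # stage 2: domain check confirms the candidate
    return kept


if __name__ == "__main__":
    # Toy data: stands in for real search output and a real domain predictor.
    hits = [
        Hit("EST_A", "small_GTPase", 1e-3),
        Hit("EST_B", "small_GTPase", 5.0),
        Hit("EST_C", "small_GTPase", 50.0),
    ]
    confirmed = {("EST_A", "small_GTPase"), ("EST_C", "small_GTPase")}
    result = filter_candidates(hits, lambda est, fam: (est, fam) in confirmed)
    print([h.est_id for h in result])
```

Passing the domain check in as a callback keeps the sketch independent of any particular domain database or search tool; in practice that step would wrap whatever domain identification protocol is actually used.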