Cloning procedures aided by homology searches of EST databases have
accelerated the pace of discovery of new genes(1), but EST database searching remains
an involved and onerous task. More than 1.6 million human EST sequences have
been deposited in public databases, making it difficult to identify ESTs
that represent new genes. Compounding the problems of scale are difficulties
in detection associated with a high sequencing error rate and low sequence
similarity between distant homologues. We have developed a new method,
coupling BLAST-based(2) searches with a domain identification protocol(3,4)
that filters candidate homologues. Application of this method in a large-scale
analysis of 100 signalling domain families has led to the identification
of ESTs representing more than 1,000 novel human signalling genes. The 4,206
publicly available ESTs representing these genes are a valuable resource
for rapid cloning of novel human signalling proteins. For example, we were
able to identify ESTs of at least 106 new small GTPases, of which 6 are
likely to belong to new subfamilies. In some cases, further analyses of genomic
DNA led to the discovery of previously unidentified full-length protein
sequences. This is exemplified by the in silico cloning (prediction of a gene
product sequence using only genomic and EST sequence data) of a new type
of GTPase with two catalytic domains.