M. Remm and E. Sonnhammer, Classification of transmembrane protein families in the Caenorhabditis elegans genome and identification of human orthologs, GENOME RES, 10(11), 2000, pp. 1679-1689
The complete genome sequence of the nematode Caenorhabditis elegans provides an excellent basis for studying the distribution and evolution of protein families in higher eukaryotes. Three fundamental questions arise: How many paralog clusters exist in one species, how many of these are shared with other species, and how many proteins can be assigned a functional counterpart in other species? We have addressed these questions in a detailed study of predicted membrane proteins in C. elegans and their mammalian homologs. All worm proteins predicted to contain at least two transmembrane segments were clustered on the basis of sequence similarity. This resulted in 189 groups with two or more sequences, containing a total of 2647 worm proteins. Hidden Markov models (HMMs) were built for each family and used to retrieve mammalian homologs from the SWISSPROT, TREMBL, and VTS databases. About half of these clusters had mammalian homologs. Putative worm-mammalian orthologs were extracted using nine different phylogenetic methods and BLAST. Eight clusters initially thought to be worm-specific were assigned mammalian homologs after searching EST and genomic sequences. A compilation of 174 orthology assignments made with high confidence is presented. [Tables describing transmembrane protein families and orthology assignments are available from ftp.cgr.ki.se/pub/data/worm.]
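The abstract names the pipeline steps but not the exact algorithms or thresholds. The following is a minimal Python sketch of two of those steps under stated assumptions: proteins are filtered by a precomputed count of predicted transmembrane segments (at least two) and then grouped by single-linkage clustering over precomputed pairwise similarity hits, keeping only groups with two or more members. Orthologs are then proposed as reciprocal best hits between worm and mammalian searches. Single linkage, the `min_score` threshold, the reciprocal-best-hit rule, and all function and protein names here are hypothetical illustrations, not the authors' published procedure. The phylogenetic methods mentioned above are not shown.

```python
from collections import defaultdict

# Hypothetical inputs (assumption): a dict of predicted TM-segment counts per
# protein, and lists of (query, subject, score) hits from a similarity search.


def filter_membrane(tm_counts, min_segments=2):
    """Keep proteins predicted to contain at least `min_segments` TM segments."""
    return {pid for pid, n in tm_counts.items() if n >= min_segments}


def single_linkage_clusters(proteins, hits, min_score):
    """Single-linkage clustering with union-find; drops singleton clusters."""
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in hits:
        # Only link proteins that passed the filter and score above threshold.
        if a in parent and b in parent and score >= min_score:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Sort for deterministic output; keep groups with two or more sequences.
    return sorted((sorted(g) for g in groups.values() if len(g) >= 2),
                  key=lambda g: g[0])


def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for q, s, score in hits:
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(worm_vs_mammal, mammal_vs_worm):
    """Pairs (w, m) where m is w's best hit and w is m's best hit (assumed rule)."""
    fwd = best_hits(worm_vs_mammal)
    rev = best_hits(mammal_vs_worm)
    return sorted((w, m) for w, m in fwd.items() if rev.get(m) == w)
```

For example, with TM counts `{"w1": 7, "w2": 7, "w3": 1, "w4": 4, "w5": 4, "w6": 2}` and hits `[("w1", "w2", 120), ("w4", "w5", 80), ("w5", "w6", 10), ("w3", "w1", 200)]` at `min_score=50`, protein `w3` is filtered out and the clusters are `[["w1", "w2"], ["w4", "w5"]]`. Single linkage is used here only because it is simple. It can chain distantly related sequences into one group, which is one reason a real study would add family-level models such as the HMMs described above.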