Classification of transmembrane protein families in the Caenorhabditis elegans genome and identification of human orthologs

Citation
M. Remm and E. Sonnhammer, Classification of transmembrane protein families in the Caenorhabditis elegans genome and identification of human orthologs, GENOME RES, 10(11), 2000, pp. 1679-1689
Number of citations
38
Subject Categories
Molecular Biology & Genetics
Journal title
GENOME RESEARCH
ISSN journal
1088-9051
Volume
10
Issue
11
Year of publication
2000
Pages
1679 - 1689
Database
ISI
SICI code
1088-9051(200011)10:11<1679:COTPFI>2.0.ZU;2-B
Abstract
The complete genome sequence of the nematode Caenorhabditis elegans provides an excellent basis for studying the distribution and evolution of protein families in higher eukaryotes. Three fundamental questions are as follows: How many paralog clusters exist in one species, how many of these are shared with other species, and how many proteins can be assigned a functional counterpart in other species? We have addressed these questions in a detailed study of predicted membrane proteins in C. elegans and their mammalian homologs. All worm proteins predicted to contain at least two transmembrane segments were clustered on the basis of sequence similarity. This resulted in 189 groups with two or more sequences, containing, in total, 2647 worm proteins. Hidden Markov models (HMMs) were created for each family, and were used to retrieve mammalian homologs from the SWISSPROT, TREMBL, and VTS databases. About one-half of these clusters had mammalian homologs. Putative worm-mammalian orthologs were extracted by use of nine different phylogenetic methods and BLAST. Eight clusters initially thought to be worm-specific were assigned mammalian homologs after searching EST and genomic sequences. A compilation of 174 orthology assignments made with high confidence is presented. [Tables describing transmembrane protein families and orthology assignments are available from ftp.cgr.ki.se/pub/data/worm.]