IDENTIFICATION OF MEMBERS OF GENE FAMILIES IN ARABIDOPSIS-THALIANA BY CONTIG CONSTRUCTION FROM PARTIAL CDNA SEQUENCES - 106 GENES ENCODING 50 CYTOPLASMIC RIBOSOMAL-PROTEINS
R. Cooke et al., IDENTIFICATION OF MEMBERS OF GENE FAMILIES IN ARABIDOPSIS-THALIANA BY CONTIG CONSTRUCTION FROM PARTIAL CDNA SEQUENCES - 106 GENES ENCODING 50 CYTOPLASMIC RIBOSOMAL-PROTEINS, Plant journal, 11(5), 1997, pp. 1127-1140
Partial cDNA sequencing to obtain expressed sequence tags (ESTs) has led to the identification of tags to about 8000 of the estimated 20 000 genes in Arabidopsis thaliana. This figure represents four to five times the number of complete coding sequences from this organism available in international databases. In contrast to mammals, many proteins are encoded by multigene families in A. thaliana. Using ribosomal protein gene families as an example, it is possible to construct relatively long sequences from overlapping ESTs which are of sufficiently high quality to be able to unambiguously identify tags to individual members of multigene families, even when the sequences are highly conserved. A total of 106 genes encoding 50 different cytoplasmic ribosomal protein types have been identified, most proteins being encoded by at least two and up to four genes. Coding sequences of members of individual gene families are almost always very highly conserved and derived amino acid sequences are almost, if not completely, identical in the vast majority of cases. Sequence divergence is observed in untranslated regions, which allows the definition of gene-specific probes. The method can be used to construct high-quality tags to any protein.
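
The idea of building longer sequences from overlapping ESTs can be illustrated with a minimal sketch. This is not the procedure used in the paper, which the abstract does not specify; it is a simplified greedy merge that assumes error-free reads on the same strand and exact suffix-to-prefix overlaps. The function names `overlap` and `build_contigs`, the `min_len` threshold, and the toy sequences are all hypothetical. Reads that share no sufficient overlap (for example, tags from a divergent untranslated region of another family member) remain as separate contigs.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of a that equals a prefix of b.

    Only overlaps of at least min_len bases count; otherwise return 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_contigs(reads, min_len=4):
    """Greedily merge reads into contigs (simplified: exact matches, one strand)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard any sequence fully contained in another one.
        contigs = [c for c in contigs
                   if not any(c != o and c in o for o in contigs)]
        # Find the pair with the longest suffix-to-prefix overlap.
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Toy data (hypothetical): three overlapping tags plus one unrelated tag.
ests = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATGA", "CCCCGGGGAAAA"]
contigs = build_contigs(ests)
```

A real EST assembly would also need to handle sequencing errors, reverse-complement reads, and near-identical coding regions, which is why high sequence quality matters for separating members of a conserved gene family.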