R. Jansen and M. Gerstein, Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins, Nucleic Acids Research, 28(6), 2000, pp. 1481-1488
We analyzed 10 genome expression data sets by large-scale cross-referencing
against broad structural and functional categories. The data sets,
generated by different techniques (e.g. SAGE and gene chips), provide
various representations of the yeast transcriptome (the set of all yeast
genes, weighted by transcript abundance). Our analysis enabled us to
determine features more prevalent in the transcriptome than the genome:
i.e. those that are common to highly expressed proteins. Starting with the
simplest categories, we find that, relative to the genome, the
transcriptome is enriched in Ala and Gly and depleted in Asn and very long
proteins. We find, furthermore, that protein length and maximum expression
level have a roughly inverse relationship. To relate expression level and
protein structure, we assigned transmembrane helices and known folds
(using PSI-blast) to each protein in the genome; this allowed us to
determine that the transcriptome is enriched in mixed
alpha-beta structures and depleted in membrane proteins relative to the
genome. In particular, some enzymatic folds, such as the TIM barrel and
the G3P dehydrogenase fold, are much more prevalent in the transcriptome
than the genome, whereas others, such as the protein-kinase and
leucine-zipper folds, are depleted. The TIM barrel, in fact, is
overwhelmingly the 'top fold'
in the transcriptome, while it only ranks fifth in the genome. The most
highly enriched functional categories in the transcriptome (based on the
MIPS system) are energy production and protein synthesis, while
categories such
as transcription, transport and signaling are depleted. Furthermore, for a
given functional category, transcriptome enrichment varies quite
substantially between the different expression data sets, with a
variation an order
of magnitude larger than for the other categories cross-referenced (e.g.
amino acids). One can readily see how the enrichment and depletion of the
various functional categories relate directly to that of particular folds.
Further information can be found at
http://bioinfo.mbb.yale.edu/genome/expression.
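
The comparison between genome and transcriptome described above can be illustrated with a minimal sketch. It assumes each gene carries one category label and one expression level, and it assumes enrichment is measured as the relative difference between the transcriptome fraction and the genome fraction; the abstract does not state the exact formula, so this definition, the function names and the toy data are all hypothetical.

```python
from collections import defaultdict


def category_fractions(genes, weighted):
    """Return the fraction of the population that falls in each category.

    genes: iterable of (category, expression_level) pairs (hypothetical format).
    weighted=False counts every gene once (the "genome" view).
    weighted=True weights every gene by its expression level
    (the "transcriptome" view).
    """
    totals = defaultdict(float)
    for category, expression in genes:
        totals[category] += expression if weighted else 1.0
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no genes, or all expression levels are zero")
    return {c: v / grand_total for c, v in totals.items()}


def enrichment(genes):
    """Relative difference (T - G) / G per category.

    This definition is an assumption made for illustration.
    Positive values mean enriched in the transcriptome, negative mean depleted.
    """
    genes = list(genes)
    genome = category_fractions(genes, weighted=False)
    transcriptome = category_fractions(genes, weighted=True)
    return {c: (transcriptome[c] - genome[c]) / genome[c] for c in genome}


# Toy data, invented for illustration only (not taken from the paper).
toy_genes = [
    ("energy", 30.0),
    ("energy", 50.0),
    ("transcription", 10.0),
    ("transport", 10.0),
]

toy_result = enrichment(toy_genes)
for name, value in sorted(toy_result.items()):
    print(f"{name}: {value:+.2f}")
```

In this toy set the "energy" category holds half of the genes but most of the expression, so its value is positive, while the two weakly expressed categories come out negative. The same function applies unchanged to any other labelling, such as amino acids or fold classes, provided each gene is given a single label.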