Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins

Citation
R. Jansen and M. Gerstein, Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins, NUCL ACID R, 28(6), 2000, pp. 1481-1488
Number of citations
45
Subject Categories
Biochemistry & Biophysics
Journal title
NUCLEIC ACIDS RESEARCH
ISSN journal
03051048
Volume
28
Issue
6
Year of publication
2000
Pages
1481 - 1488
Database
ISI
SICI code
0305-1048(20000315)28:6<1481:AOTYTW>2.0.ZU;2-2
Abstract
We analyzed 10 genome expression data sets by large-scale cross-referencing against broad structural and functional categories. The data sets, generated by different techniques (e.g. SAGE and gene chips), provide various representations of the yeast transcriptome (the set of all yeast genes, weighted by transcript abundance). Our analysis enabled us to determine features more prevalent in the transcriptome than the genome: i.e. those that are common to highly expressed proteins. Starting with the simplest categories, we find that, relative to the genome, the transcriptome is enriched in Ala and Gly and depleted in Asn and very long proteins. We find, furthermore, that protein length and maximum expression level have a roughly inverse relationship. To relate expression level and protein structure, we assigned transmembrane helices and known folds (using PSI-blast) to each protein in the genome; this allowed us to determine that the transcriptome is enriched in mixed alpha-beta structures and depleted in membrane proteins relative to the genome. In particular, some enzymatic folds, such as the TIM barrel and the G3P dehydrogenase fold, are much more prevalent in the transcriptome than the genome, whereas others, such as the protein-kinase and leucine-zipper folds, are depleted. The TIM barrel, in fact, is overwhelmingly the 'top fold' in the transcriptome, while it only ranks fifth in the genome. The most highly enriched functional categories in the transcriptome (based on the MIPS system) are energy production and protein synthesis, while categories such as transcription, transport and signaling are depleted. Furthermore, for a given functional category, transcriptome enrichment varies quite substantially between the different expression data sets, with a variation an order of magnitude larger than for the other categories cross-referenced (e.g. amino acids). One can readily see how the enrichment and depletion of the various functional categories relate directly to that of particular folds. Further information can be found at http://bioinfo.mbb.yale.edu/genome/expression.
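
The abstract's central comparison (how common a category is in the genome, where every gene counts once, versus in the transcriptome, where every gene is weighted by transcript abundance) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names, the fold assignments, the expression values, and the choice of relative difference as the enrichment measure are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def category_fractions(genes, weights):
    """Return the share of the total weight that falls in each category."""
    totals = defaultdict(float)
    for gene, category in genes.items():
        totals[category] += weights[gene]
    grand_total = sum(totals.values())
    return {cat: w / grand_total for cat, w in totals.items()}


def enrichment(genes, expression):
    """Relative difference between transcriptome and genome composition.

    Assumed measure (not confirmed by the source): (T - G) / G, where G is
    the category's fraction of genes and T its fraction of transcripts.
    Positive values mean enriched, negative values mean depleted.
    """
    # Genome view: every gene carries the same weight.
    genome = category_fractions(genes, {g: 1.0 for g in genes})
    # Transcriptome view: every gene is weighted by its transcript abundance.
    transcriptome = category_fractions(genes, expression)
    return {cat: (transcriptome[cat] - genome[cat]) / genome[cat] for cat in genome}


# Hypothetical toy data: gene -> fold category, gene -> transcript abundance.
genes = {
    "geneA": "TIM barrel",
    "geneB": "TIM barrel",
    "geneC": "protein kinase",
    "geneD": "protein kinase",
}
expression = {"geneA": 30.0, "geneB": 50.0, "geneC": 15.0, "geneD": 5.0}

result = enrichment(genes, expression)
for category, value in result.items():
    print(f"{category}: {value:+.2f}")
```

In this toy input both folds make up half of the genes, but the TIM barrel genes account for most of the transcripts, so that fold comes out enriched and the protein-kinase fold depleted. Repeating the calculation for each expression data set separately would show how much the enrichment of a category varies between data sets.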