PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information

Citation
J. Qian et al., PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information, NUCL ACID R, 29(8), 2001, pp. 1750-1764
Number of citations
73
Subject categories
Biochemistry & Biophysics
Journal title
NUCLEIC ACIDS RESEARCH
ISSN journal
0305-1048
Volume
29
Issue
8
Year of publication
2001
Pages
1750 - 1764
Database
ISI
SICI code
0305-1048(20010415)29:8<1750:PAWSFD>2.0.ZU;2-0
Abstract
As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing 'global views' of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein-protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein-protein interaction data with structural information is a particularly novel feature of our system.
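The idea of comparison through ranking can be illustrated with a minimal sketch. This is not PartsList's actual implementation: the function name `rank_folds`, the fold labels and the attribute values below are all hypothetical, and the tie-handling rule (tied folds share the average rank) is an assumption made for the example.

```python
from typing import Dict

def rank_folds(values: Dict[str, float]) -> Dict[str, float]:
    """Rank folds by one attribute: highest value gets rank 1.

    Tied folds share the average of the ranks they span
    (an assumed convention, not taken from the paper).
    """
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j to the end of the run of folds tied with position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = average_rank
        i = j + 1
    return ranks

# Hypothetical toy data: two very different attributes for the same folds.
genome_occurrence = {"fold_A": 120, "fold_B": 45, "fold_C": 300, "fold_D": 45}
expression_level = {"fold_A": 2.1, "fold_B": 9.7, "fold_C": 5.0, "fold_D": 0.4}

occurrence_ranks = rank_folds(genome_occurrence)
expression_ranks = rank_folds(expression_level)

# Side-by-side view of the ranks, the uniform format used for comparison.
for fold in sorted(genome_occurrence):
    print(fold, occurrence_ranks[fold], expression_ranks[fold])
```

Converting each attribute to ranks puts quantities with different units (counts, expression levels) on a common scale, which is what makes the side-by-side comparison meaningful.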
We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many preselected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to V^(-b), for attribute value V and constant exponent b), with a few folds having large values and most having small values.
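The power-law fall-off can be made concrete with a small sketch that estimates the exponent b from (value, frequency) pairs. It assumes a simple least-squares line fit in log-log space, which is one common way to estimate such an exponent and is not necessarily the procedure used by the authors. The function name `fit_power_law` and the synthetic data are hypothetical.

```python
import math
from typing import Sequence, Tuple

def fit_power_law(values: Sequence[float],
                  freqs: Sequence[float]) -> Tuple[float, float]:
    """Fit freq = a * V**(-b) by least squares on log(freq) versus log(V).

    Returns (b, a). All values and frequencies must be positive.
    """
    xs = [math.log(v) for v in values]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In log-log space the slope is -b and the intercept is log(a).
    return -slope, math.exp(intercept)

# Synthetic data generated from an exact power law with b = 1.5 and a = 100
# (illustrative numbers only, not taken from PartsList).
attribute_values = [1, 2, 4, 8, 16]
frequencies = [100 * v ** -1.5 for v in attribute_values]

b, a = fit_power_law(attribute_values, frequencies)
print(f"estimated exponent b = {b:.3f}, prefactor a = {a:.1f}")
```

On real fold data the relationship is only approximate, so the fitted exponent would depend on binning and on the range of values included in the fit.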