PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information
J. Qian et al., PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information, Nucleic Acids Research, 29(8), 2001, pp. 1750-1764
As the number of protein folds is quite limited, a mode of analysis that
will be increasingly common in the future, especially with the advent of
structural genomics, is to survey and re-survey the finite parts list of
folds from an expanding number of perspectives. We have developed a new
resource, called PartsList, that lets one dynamically perform these
comparative fold surveys. It is available on the web at
http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The
system is based on the existing fold
classifications and functions as a form of companion annotation for them,
providing 'global views' of many already completed fold surveys. The central
idea in the system is that of comparison through ranking; PartsList will
rank the approximately 420 folds based on more than 180 attributes. These
include: (i) occurrence in a number of completely sequenced genomes (e.g. it
will show the most common folds in the worm versus yeast); (ii) occurrence
in the structure databank (e.g. most common folds in the PDB); (iii) both
absolute and relative gene expression information (e.g. most changing folds
in expression over the cell cycle); (iv) protein-protein interactions, based
on experimental data in yeast and comprehensive PDB surveys (e.g. most
interacting fold); (v) sensitivity to inserted transposons; (vi) the number
of functions associated with the fold (e.g. most multi-functional folds);
(vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions
(e.g. most mobile folds); and (ix) the level of similarity based on a
comprehensive set of structural alignments (e.g. most structurally variable
folds). The integration of whole-genome expression and protein-protein
interaction data with structural information is a particularly novel feature
of our system. We provide three ways of visualizing the rankings: a profiler
emphasizing the progression of high and low ranks across many preselected
attributes, a dynamic comparer for custom comparisons and a numerical rankings
correlator. These allow one to directly compare very different attributes
of a fold (e.g. expression level, genome occurrence and maximum motion) in
the uniform numerical format of ranks. This uniform framework, in turn,
highlights the way that the frequency of many of the attributes falls off with
approximate power-law behavior (i.e. according to V^(-b), for attribute value
V and constant exponent b), with a few folds having large values and most
having small values.
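
The two quantitative ideas in the abstract, comparison through ranking and
the approximate power-law fall-off, can be illustrated with a minimal Python
sketch. It is not the PartsList implementation: the fold names, attribute
names and values below are invented toy data, and the helper functions
(rank_folds, rank_correlation, power_law_exponent) are hypothetical. The
sketch assumes that a rank correlation in the Spearman sense is a reasonable
stand-in for the "rankings correlator", and that the exponent b can be
estimated by an ordinary least-squares line through log(frequency) versus
log(V).

```python
import math
from collections import Counter

# Hypothetical toy values, invented for illustration only (not PartsList data).
FOLDS = {
    "fold_A": {"genome_occurrence": 120, "expression_level": 8.5},
    "fold_B": {"genome_occurrence": 95, "expression_level": 9.1},
    "fold_C": {"genome_occurrence": 60, "expression_level": 4.2},
    "fold_D": {"genome_occurrence": 15, "expression_level": 1.3},
}


def rank_folds(folds, attribute):
    """Map each fold to its rank (1 = largest value); ties share the mean rank."""
    ordered = sorted(folds, key=lambda name: folds[name][attribute], reverse=True)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        value = folds[ordered[start]][attribute]
        # Extend the run while the next fold has the same attribute value.
        while end + 1 < len(ordered) and folds[ordered[end + 1]][attribute] == value:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for name in ordered[start:end + 1]:
            ranks[name] = mean_rank
        start = end + 1
    return ranks


def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy), the centered sums of products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def rank_correlation(folds, attr_a, attr_b):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Assumes each attribute takes at least two distinct values.
    """
    ranks_a = rank_folds(folds, attr_a)
    ranks_b = rank_folds(folds, attr_b)
    names = list(folds)
    sxy, sxx, syy = _centered_sums(
        [ranks_a[n] for n in names], [ranks_b[n] for n in names]
    )
    return sxy / math.sqrt(sxx * syy)


def power_law_exponent(values):
    """Estimate b in frequency ~ V**(-b) from a log-log least-squares slope.

    Assumes positive attribute values with at least two distinct values.
    """
    counts = Counter(values)
    xs = [math.log(v) for v in counts]
    ys = [math.log(counts[v]) for v in counts]
    sxy, sxx, _ = _centered_sums(xs, ys)
    return -sxy / sxx


if __name__ == "__main__":
    print(rank_folds(FOLDS, "genome_occurrence"))
    print(rank_correlation(FOLDS, "genome_occurrence", "expression_level"))
    # Synthetic sample in which the number of folds with value V is 64 / V**2.
    print(power_law_exponent([1] * 64 + [2] * 16 + [4] * 4 + [8]))
```

Working in ranks rather than raw values is what lets attributes with very
different units be placed side by side. The log-log fit is only a rough
estimator of b, chosen here for brevity.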