The open reading frames of human cytomegalovirus (human herpesvirus 5, HHV5) encode some 213 unique proteins, most of them of unknown function. Using the
threading program ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, their function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors); and 87 proteins were assigned to broad structural classes (e.g. all-beta, 3-layer alpha/beta/alpha, multidomain). Genes encoding proteins with similar folds, or sharing the same structural traits (extreme sequence length, runs of unstructured Pro- and/or Gly-rich residues, transmembrane segments, etc.), often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize
the adjustable parameters of the threading calculations: the gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and the number of threading runs needed to obtain meaningful results. The introduction of summed, per-residue-normalized
scores was essential for discovering subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long, which have the 3β1α fold (three β-strands and one α-helix) with one or two disulfide bridges and occur in otherwise unrelated proteins.
© 2001 Academic Press.
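The observation that genes encoding similar folds often sit next to each other in the genome can be checked mechanically once each open reading frame carries a structural label. The sketch below scans genes in genome order and reports maximal runs of adjacent genes that share a label. The gene names, the labels and the minimum run length of two are hypothetical placeholders, not data or criteria taken from this work.

```python
# Hypothetical sketch: find tandem clusters of adjacent genes whose products
# share a structural label. Gene names and labels are placeholders only.
from itertools import groupby
from typing import List, Optional, Tuple


def tandem_clusters(genes: List[Tuple[str, Optional[str]]],
                    min_size: int = 2) -> List[Tuple[str, List[str]]]:
    """Return (label, [gene names]) for each maximal run of consecutive genes
    with the same non-None label and at least min_size members.

    `genes` must be in genome order; a label of None marks an ORF with no
    structural assignment, which breaks any run.
    """
    clusters = []
    # groupby only merges *consecutive* items with equal keys, which is
    # exactly the "tandem" (adjacent-in-genome) criterion.
    for label, run in groupby(genes, key=lambda g: g[1]):
        names = [name for name, _ in run]
        if label is not None and len(names) >= min_size:
            clusters.append((label, names))
    return clusters


if __name__ == "__main__":
    genome = [
        ("orf1", "fold_A"),
        ("orf2", "fold_A"),
        ("orf3", None),
        ("orf4", "fold_B"),
        ("orf5", "fold_B"),
        ("orf6", "fold_B"),
        ("orf7", "fold_C"),
    ]
    for label, names in tandem_clusters(genome):
        print(label, names)
    # fold_A ['orf1', 'orf2']
    # fold_B ['orf4', 'orf5', 'orf6']
```

Raising `min_size` restricts the report to longer tandem arrays; treating unassigned ORFs as run breakers is a design choice of this sketch, and one could instead skip them.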
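Gap penalty weights are named among the threading parameters tuned on the benchmark folds. To show roughly where such weights enter a sequence/structure alignment, the sketch below aligns a sequence to a template by local dynamic programming with affine gaps (Smith-Waterman with Gotoh's recursion). It assumes each template position carries a precomputed score for every amino acid, in the spirit of a 3D-1D profile; `toy_profile`, its match/mismatch values and `local_align` are hypothetical, and ProCeryon's own potentials and alignment procedure are not reproduced here.

```python
# Hypothetical sketch: local sequence/structure alignment with affine gap
# penalties. NOT ProCeryon's algorithm; the profile format is an assumption.
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One {residue: score} dict per template position (assumed 3D-1D-style profile).
Profile = List[Dict[str, float]]


def toy_profile(template_seq: str, match: float = 2.0, mismatch: float = -1.0) -> Profile:
    """Hypothetical stand-in for a structural profile: each position rewards
    the residue observed there and penalizes every other amino acid."""
    return [{aa: (match if aa == res else mismatch) for aa in AMINO_ACIDS}
            for res in template_seq]


def local_align(seq: str, profile: Profile, gap_open: float, gap_extend: float) -> float:
    """Best local alignment score of `seq` against `profile`.

    A gap of length k costs gap_open + (k - 1) * gap_extend. Residues
    missing from a profile column score 0.
    """
    n, m = len(seq), len(profile)
    neg = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best local alignment ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a skipped template position
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a skipped sequence residue
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0)
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])  # 0.0 allows a local restart
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    prof = toy_profile("ACD")
    # Cheap gap opening: ACD aligns across the inserted G (2 + 2 - 1 + 2).
    print(local_align("ACGD", prof, gap_open=1.0, gap_extend=0.5))  # 5.0
    # Costly gap opening: the best hit shrinks to the ungapped "AC".
    print(local_align("ACGD", prof, gap_open=3.0, gap_extend=1.0))  # 4.0
```

The two calls illustrate why gap weights have to be calibrated: the same sequence/template pair yields a different best alignment depending on how expensive insertions are.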
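Summed, per-residue-normalized scores are credited with exposing short subdomains inside long sequences. One plausible reading, assumed here rather than taken from the paper, is that each scoring-function component of a threading hit is divided by the number of aligned residues before the components are summed, so that a 60-residue domain is not outranked merely because a long template accumulates a larger raw total. The `Hit` record, the template labels and the component values below are hypothetical.

```python
# Hypothetical sketch of summed, per-residue-normalized threading scores.
# Assumption: higher component scores are better; the actual ProCeryon
# scoring functions and their sign conventions are not reproduced here.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    template: str            # hypothetical template label
    aligned_length: int      # number of aligned residues
    components: List[float]  # raw scores from several scoring functions


def raw_total(hit: Hit) -> float:
    """Plain sum of the raw components (tends to grow with alignment length)."""
    return sum(hit.components)


def normalized_total(hit: Hit) -> float:
    """Sum of the components after dividing each by the aligned length."""
    if hit.aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return sum(c / hit.aligned_length for c in hit.components)


def rank(hits: List[Hit], key: Callable[[Hit], float]) -> List[str]:
    """Template labels ordered from best to worst under the given score."""
    return [h.template for h in sorted(hits, key=key, reverse=True)]


if __name__ == "__main__":
    hits = [
        Hit("long_multidomain_template", 300, [150.0, 90.0]),
        Hit("short_sh3_like_template", 60, [72.0, 48.0]),
    ]
    print(rank(hits, raw_total))         # long template first (240 vs 120)
    print(rank(hits, normalized_total))  # short domain first (about 2.0 vs 0.8)
```

Under the raw sum the long template wins; after per-residue normalization the short, domain-sized hit ranks first, which is the effect the abstract describes for subdomain discovery.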