We have developed an automatic protein fingerprinting method for the
evaluation of protein structural similarities based on secondary structure
element compositions, spatial arrangements, lengths, and topologies. This method
can rapidly identify proteins sharing structural homologies as we
demonstrate with five test cases: the globins, the mammalian trypsinlike
serine proteases, the immunoglobulins, the cupredoxins, and the actinlike
ATPase domain-containing proteins. Principal components analysis of the
similarity distance matrix calculated from an all-by-all comparison of
1,031 unique chains
in the Protein Data Bank has produced a distribution of structures within
a high-dimensional structural space. Fifty percent of the variance observed
for this distribution is bounded by six axes, two of which encode
structural variability within two large families, the immunoglobulins and
the trypsinlike serine proteases. Many aspects of the spatial distribution
remain stable upon reduction of the database to 140 proteins with minimal
family overlap. The axes correlated with specific structural families are
no longer observed. A clear hierarchy of organization is seen in the
arrangement of protein structures in the universe. At the highest level,
protein structures
populate regions corresponding to the all-alpha, all-beta, and alpha/beta
superfamilies. Large protein families are arranged along family-specific
axes, forming local densely populated regions within the space. The lowest
level of organization is intrafamilial; homologous structures are ordered
by variations in peripheral secondary structure elements or by
conformational shifts in the tertiary structure. Proteins 1999; 34:317-332.
(C) 1999 Wiley-Liss, Inc.
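
The principal components step described above can be illustrated with a
minimal sketch. It is not the authors' implementation. It assumes that each
structure's row of the similarity distance matrix is treated as its feature
vector, and it substitutes power iteration with deflation for a full
eigensolver. The function names (pairwise_distances, explained_fractions,
axes_for_variance) and the one-dimensional toy coordinates are hypothetical
and stand in for real fingerprint distances.

```python
import math
import random


def pairwise_distances(coords):
    """Toy all-by-all distance matrix from hypothetical 1-D coordinates."""
    return [[abs(a - b) for b in coords] for a in coords]


def center_columns(rows):
    """Subtract each column mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvalues(matrix, k, iters=500, seed=0):
    """Leading k eigenvalues of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy; it is deflated in place
    values = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract from the deflated matrix
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        values.append(lam)
        # Deflate: remove the component just found before the next pass.
        for i in range(d):
            for j in range(d):
                m[i][j] -= lam * v[i] * v[j]
    return values


def explained_fractions(distances):
    """Fraction of total variance carried by each principal axis."""
    cov = covariance(center_columns(distances))
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    return [max(lam, 0.0) / total for lam in top_eigenvalues(cov, len(cov))]


def axes_for_variance(fractions, target=0.5):
    """Smallest number of leading axes whose cumulative fraction reaches target."""
    running = 0.0
    for count, frac in enumerate(fractions, start=1):
        running += frac
        if running >= target:
            return count
    return len(fractions)


if __name__ == "__main__":
    # Two tight hypothetical "families" far apart along one coordinate.
    toy = pairwise_distances([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
    fractions = explained_fractions(toy)
    print("axes needed for 50% of variance:", axes_for_variance(fractions))
```

In this toy case the two groups dominate the variance, which mirrors in
miniature how a large family can claim an axis of its own. A real analysis
of 1,031 chains would call for a numerical linear algebra library rather
than pure-Python loops.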