Evolutionary classification leads to an economical description of the protein sequence universe because attributes of function and structure are inherited in protein families. Efficient strategies of functional and structural genomics therefore target one representative from each family. Enumerating all families and consistently establishing family membership from sequence similarities are nontrivial computational problems. Emerging concepts and caveats of global sequence clustering are reviewed. Explicit multiple alignments coupled with neighbourhood analysis lead to domain segmentation, and hierarchical unification helps to resolve conflicts and validate clusters. Eventually, every part of every sequence will be assigned to a domain family which is uniquely associated with a fold and a molecular function.
(C) 2000 Elsevier Science Ltd. All rights reserved.
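One of the caveats of global sequence clustering mentioned above can be illustrated with a short sketch. It assumes the simplest possible scheme and is not the method reviewed here: sequences are joined into a family by single linkage, meaning any chain of pairwise hits above a threshold connects them. The function name `single_linkage_families`, the sequence IDs and the similarity scores are all hypothetical. "AB" stands for an invented two-domain protein. It shares domain A with "A1" and domain B with "B1", while A1 and B1 have no similarity to each other.

```python
from collections import defaultdict


def single_linkage_families(sequences, hits, threshold):
    """Group sequence IDs into families by single linkage (transitive closure).

    hits: iterable of (id_a, id_b, score) tuples holding hypothetical pairwise
    similarity scores. Two sequences land in the same family if a chain of hits
    with score >= threshold connects them.
    """
    parent = {s: s for s in sequences}  # union-find forest, one tree per family

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two families

    families = defaultdict(list)
    for s in sequences:
        families[find(s)].append(s)
    # Sort members and families so the output is deterministic.
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy data: AB is a two-domain protein bridging A1 and B1.
seqs = ["A1", "AB", "B1", "C1"]
hits = [("A1", "AB", 80.0), ("AB", "B1", 75.0), ("A1", "B1", 5.0)]
print(single_linkage_families(seqs, hits, threshold=50.0))
# prints [['A1', 'AB', 'B1'], ['C1']]
```

A1 and B1 end up in one "family" only because the multidomain sequence AB links them. This transitive chaining is why segmenting sequences into domains before clustering, and checking the clusters afterwards, matters for consistent family membership.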