With the huge increase in protein data, an important problem is to estimate,
within a large protein family, the number of sensible subsets for subsequent
in-depth structural, functional, and evolutionary analyses. To tackle this
problem, we developed a new program, Secator, which applies the principle of
an ascending hierarchical method to a distance matrix derived from a multiple
alignment of protein sequences. The dissimilarity values assigned to the
nodes of the deduced phylogenetic tree are partitioned by a new stopping
rule, introduced to automatically determine which dissimilarity values are
significant. The quality of the clusters obtained by Secator is verified by
a separate jackknife study. The method is demonstrated on 24 large protein
families covering a wide spectrum of structural and sequence conservation,
and its usefulness and accuracy with real biological data are illustrated on
two well-studied protein families (the Sm proteins and the nuclear
receptors).
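
The general idea of ascending hierarchical clustering followed by an automatic cut can be sketched as below. This is an illustrative sketch only, not Secator's implementation: the abstract does not specify the linkage criterion or the stopping rule, so average linkage and a simple "widest gap between node dissimilarities" rule are assumed here as stand-ins. The function names (`agglomerate`, `largest_gap_threshold`, `cluster_family`) and the toy distance matrix are hypothetical.

```python
from itertools import combinations


def agglomerate(dist):
    """Ascending hierarchical clustering with average linkage (an assumption).

    dist: symmetric n x n list of lists of pairwise dissimilarities.
    Returns the merges as (members_a, members_b, node_dissimilarity) tuples.
    """
    clusters = {i: (i,) for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))

        # Join the two clusters with the smallest average dissimilarity.
        a, b = min(combinations(sorted(clusters), 2), key=linkage)
        merges.append((clusters[a], clusters[b], linkage((a, b))))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


def largest_gap_threshold(heights):
    """Stand-in stopping rule: cut at the widest gap between sorted values.

    Needs at least two node dissimilarities, i.e. three or more sequences.
    """
    ordered = sorted(heights)
    gaps = [(ordered[k + 1] - ordered[k], k) for k in range(len(ordered) - 1)]
    _, k = max(gaps)
    return (ordered[k] + ordered[k + 1]) / 2


def cluster_family(dist):
    """Replay only the merges whose node dissimilarity is below the cut."""
    merges = agglomerate(dist)
    threshold = largest_gap_threshold([h for _, _, h in merges])
    groups = {i: {i} for i in range(len(dist))}
    for left, right, height in merges:
        if height < threshold:
            merged = groups[left[0]] | groups[right[0]]
            for member in merged:
                groups[member] = merged
    return sorted({tuple(sorted(g)) for g in groups.values()})


# Hypothetical toy matrix: sequences 0-2 are close, as are sequences 3-4.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.2, 0.9, 0.8],
    [0.2, 0.2, 0.0, 0.9, 0.8],
    [0.9, 0.9, 0.9, 0.0, 0.1],
    [0.8, 0.8, 0.8, 0.1, 0.0],
]
print(cluster_family(toy))  # [(0, 1, 2), (3, 4)]
```

On the toy matrix the node dissimilarities are 0.1, 0.1, 0.2, and 0.85, so the widest gap separates the last merge from the others and two subfamilies are returned. In practice the distance matrix would come from a multiple sequence alignment rather than hand-written values.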