Genome annotation requires explicit identification of gene function. This task frequently relies on aligning protein sequences with those of proteins of known function. Genetic drift, co-evolution of subunits in protein complexes and a variety of other constraints can undermine the relevance of such alignments. With a specific class of proteins as a test case, it is shown that a simple data analysis approach can help solve some of these problems. The origin of ureohydrolases has been explored by comparing sequence similarity trees built from alignments that maximize amino acid conservation. These trees separate agmatinases from arginases but suggest the presence of unknown biases responsible for the unexpected positions of some enzymes. A distance tree between sequences was then established by factorial correspondence analysis of the alignment regions containing gaps. This gap tree gives a consistent picture of functional kinship, perhaps reflecting some aspects of phylogeny, with a clear domain grouping the two types of ureohydrolases (agmatinases and arginases) together with enzymes whose activities are related to, but different from, those of ureohydrolases. If the trees were significant, several annotated genes appeared to have been wrongly assigned. These genes were cloned, and their products were expressed and identified biochemically. This substantiated the validity of the gap tree, whose organization suggests a very ancient origin of ureohydrolases. Some enzymes of eukaryotic origin are spread throughout the arginase part of the trees: they might have been derived from genes found in the early symbiotic bacteria that became organelles, and would then have been transferred to the nucleus when symbiotic genes had to escape Muller's ratchet. This work also shows that arginases and agmatinases share the same two manganese-ion-binding sites and exhibit only subtle differences, which can be accounted for by the known three-dimensional structure of arginases. In the absence of explicit biochemical data, extreme caution is needed when annotating genes with similarities to ureohydrolases.
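The idea of a tree built only from gap patterns can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the procedure used in the study. The toy alignment and sequence names are hypothetical. Alignment columns containing gaps in some but not all sequences are coded disjunctively (one "gap" and one "residue" category per column). Sequences are compared by the chi-square distance between their row profiles; this is the distance that factorial correspondence analysis preserves when all factorial axes are retained (the study may have kept only the leading axes). The tree is built by UPGMA (average linkage), a clustering method chosen here for simplicity.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy alignment: names and sequences are invented for illustration.
ALIGNMENT = {
    "argA": "MKR--LDEGHT-PA",
    "argB": "MKR--LDEGHS-PA",
    "agmA": "MKRQQLD--HTAPA",
    "agmB": "MRRQQLD--HTAPG",
}


def gap_table(alignment):
    """Disjunctive (gap / residue) coding of informative gap columns."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    # Keep columns where some, but not all, sequences carry a gap.
    gap_cols = [j for j in range(length)
                if 0 < sum(s[j] == "-" for s in seqs) < len(seqs)]
    table = []
    for s in seqs:
        row = []
        for j in gap_cols:
            is_gap = 1 if s[j] == "-" else 0
            row.extend([is_gap, 1 - is_gap])  # two complementary categories
        table.append(row)
    return names, table


def chi_square_distances(table):
    """Chi-square distances between row profiles (the CA row metric)."""
    total = sum(map(sum, table))
    col_mass = [sum(col) / total for col in zip(*table)]
    profiles = [[x / sum(row) for x in row] for row in table]
    n = len(table)
    dist = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        d = sqrt(sum((a - b) ** 2 / m
                     for a, b, m in zip(profiles[i], profiles[k], col_mass)))
        dist[i][k] = dist[k][i] = d
    return dist


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {(i, k): dist[i][k] for i, k in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (i, k), _ = min(d.items(), key=lambda item: item[1])  # closest pair
        (ti, ni), (tk, nk) = clusters.pop(i), clusters.pop(k)
        for other in clusters:
            dio = d.pop(tuple(sorted((i, other))))
            dko = d.pop(tuple(sorted((k, other))))
            # Size-weighted average distance to the merged cluster.
            d[(other, next_id)] = (ni * dio + nk * dko) / (ni + nk)
        del d[(i, k)]
        clusters[next_id] = ((ti, tk), ni + nk)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    names, table = gap_table(ALIGNMENT)
    print(upgma(names, chi_square_distances(table)))
    # prints (('argA', 'argB'), ('agmA', 'agmB'))
```

Because only gap patterns enter the table, the substitutions that distinguish argA from argB (T/S) or agmA from agmB do not affect the tree. Retaining only the first few axes from a singular value decomposition of the standardized residual matrix would give the reduced-dimension version of the same analysis.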