One hundred forty-five full-length aldehyde dehydrogenase-related sequences
were aligned to determine relationships within the aldehyde dehydrogenase
(ALDH) extended family. The alignment reveals only four invariant residues:
two glycines, a phenylalanine involved in NAD binding, and a glutamic acid
that coordinates the nicotinamide ribose in certain E-NAD binary complex
crystal structures, but which may also serve as a general base for the
catalytic reaction. The cysteine that provides the catalytic thiol and its
closest neighbor in space, an asparagine residue, are conserved in all ALDHs
with demonstrated dehydrogenase activity. Sixteen residues are conserved in at
least 95% of the sequences; 12 of these cluster into seven sequence motifs
conserved in almost all ALDHs. These motifs cluster around the active site
of the enzyme. Phylogenetic analysis of these ALDHs indicates at least 13
ALDH families, most of which have previously been identified but not
grouped separately by alignment. ALDHs cluster into two main trunks of the
phylogenetic tree. The larger, the "Class 3" trunk, contains mostly
substrate-specific ALDH families, as well as the class 3 ALDH family itself.
The other trunk, the "Class 1/2" trunk, contains mostly variable-substrate
ALDH families, including the class 1 and 2 ALDH families. Divergence of the
substrate-specific ALDHs occurred earlier than the division between ALDHs with broad
substrate specificities. A site on the World Wide Web has also been devoted
to this alignment project.
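
The counts of invariant residues and of residues conserved in at least 95% of
the sequences can be illustrated with a per-column tally over a multiple
alignment. The following is a minimal sketch only, not the procedure used in
this study: the function names, the gap character, and the toy sequences are
all hypothetical, and gaps are assumed to count against conservation.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """For each column, return (most common residue, fraction of all sequences).

    Assumption: gaps are excluded from the residue tally but still count in
    the denominator, so a gapped column can never reach 100% conservation.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if not counts:
            # Column made up entirely of gaps.
            result.append((gap, 0.0))
            continue
        residue, n = counts.most_common(1)[0]
        result.append((residue, n / len(alignment)))
    return result


def conserved_columns(alignment, threshold=0.95):
    """Return (column index, residue) pairs conserved at or above threshold."""
    return [
        (i, residue)
        for i, (residue, fraction) in enumerate(column_conservation(alignment))
        if fraction >= threshold
    ]


# Hypothetical toy alignment, not real ALDH data.
toy = ["GFCEN", "GYCEN", "GFCEN", "G-CEQ"]
invariant = conserved_columns(toy, threshold=1.0)
mostly_conserved = conserved_columns(toy, threshold=0.75)
```

With a threshold of 1.0 the function reports invariant columns; lowering the
threshold to 0.95 would correspond to the "at least 95% of the sequences"
criterion described above.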