Analysis of residue correlation in the V-H domains of over 2700 mouse heavy chains was carried out at three hierarchical levels. At the 'position' level, statistical analysis revealed 45 positions that conserve similar residues in almost all chains. At the 'fragment' level, the focus of investigation shifted to combinations of amino acids in strands and loops.
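The position-level step can be illustrated with a minimal sketch. It assumes the chains are already aligned to a common numbering of equal length. To decide what counts as a "similar" residue, it uses a hypothetical grouping of amino acids into physicochemical classes. The grouping, the 0.95 threshold standing in for "almost all chains", and the function name are illustrative assumptions, not the authors' criteria.

```python
from collections import Counter

# Hypothetical similarity groups (an assumption, not the paper's definition):
# residues in the same group are treated as "similar".
SIMILARITY_GROUPS = {
    "hydrophobic": set("AVLIMC"),
    "aromatic": set("FWY"),
    "polar": set("STNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "special": set("GP"),
}
RESIDUE_TO_GROUP = {aa: g for g, members in SIMILARITY_GROUPS.items() for aa in members}


def conserved_positions(aligned_chains, threshold=0.95):
    """Return 0-based alignment columns where at least `threshold` of the
    chains carry residues from the same similarity group.

    Gaps ('-') and unrecognised symbols count against conservation.
    """
    length = len(aligned_chains[0])
    if any(len(chain) != length for chain in aligned_chains):
        raise ValueError("all chains must be aligned to the same length")
    conserved = []
    for pos in range(length):
        # Map each residue in this column to its group, then find the commonest group.
        groups = Counter(RESIDUE_TO_GROUP.get(chain[pos], "gap") for chain in aligned_chains)
        group, count = groups.most_common(1)[0]
        if group != "gap" and count / len(aligned_chains) >= threshold:
            conserved.append(pos)
    return conserved
```

For the toy alignment `["QVQLV", "EVKLV", "QVQLE"]`, only columns 1 and 3 (hydrophobic in every chain) pass the default threshold. Lowering it to 0.6 also admits the columns where two of the three chains agree.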
It was found that no more than 10 patterns were sufficient to describe the strands and loops in the chains. At the 'sequence' level, we determined all possible combinations of these patterns and classified the mouse heavy chains. Comparison of the sequences in the eight classes revealed residues at the class-determining positions that were unique to each class. Because a strong correlation of residues was found, only several residues are needed to classify a sequence. It follows that a full residue-by-residue alignment is not necessary to divide sequences into classes. An important corollary of our approach is the possibility of predicting residues in an incomplete sequence from a small sequence fragment. On the basis of our analysis of mouse heavy chains, we put forward a hypothesis about the presently unknown mouse V-H germline repertoire.
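The sequence-level idea can be sketched as follows: a few class-determining residues are enough to assign a chain to a class, and a small fragment can then be used to predict other residues. The class names, signatures, and consensus sequences below are made-up placeholders, because the abstract does not list the eight classes or their residues. Positions are assumed to be 0-based indices into a common numbering. Here "prediction" simply reads off the consensus of the matched class. This is one plausible reading of the corollary, not the authors' procedure.

```python
# Hypothetical class data (placeholders, not the paper's eight classes):
# each class has a signature (class-determining position -> residue)
# and a consensus sequence in the same 0-based numbering.
CLASSES = {
    "class_A": {"signature": {2: "Q", 4: "V"}, "consensus": "QVQLVESG"},
    "class_B": {"signature": {2: "K", 4: "E"}, "consensus": "EVKLEESG"},
}


def classify(observed, classes=CLASSES):
    """Assign a partial sequence to a class without a full alignment.

    `observed` maps positions to the residues actually known. A class is
    rejected if any observed residue contradicts its signature; among the
    remaining classes, the one with the most matched signature positions
    wins. Returns None if nothing matches or the best score is tied.
    """
    scores = {}
    for name, data in classes.items():
        matches = 0
        for pos, residue in data["signature"].items():
            if pos in observed:
                if observed[pos] != residue:
                    break  # contradiction: reject this class
                matches += 1
        else:  # runs only if the loop was not broken
            if matches:
                scores[name] = matches
    if not scores:
        return None
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


def predict_residues(observed, classes=CLASSES):
    """Fill unknown positions from the consensus of the assigned class."""
    name = classify(observed, classes)
    if name is None:
        return None
    consensus = classes[name]["consensus"]
    return "".join(observed.get(i, aa) for i, aa in enumerate(consensus))
```

For example, the fragment `{2: "K", 3: "L", 4: "E"}` matches both signature residues of `class_B` and contradicts `class_A`. The predicted full sequence is therefore `"EVKLEESG"`, and observed residues always take precedence over the consensus.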