Evolution of proteins encoded in nucleotide sequences began with the advent
of the triplet code. The chronological order in which amino acids appeared
on the evolutionary scene and the steps in the evolution of the triplet code
have recently been reconstructed (Trifonov, 2000b) on the basis of 40
different ranking criteria and hypotheses. According to the consensus
chronology, the pair of complementary codons GGC and GCC, for the amino
acids glycine and alanine, respectively, appeared first. Other codons
appeared as complementary pairs
as well, which divided their respective amino acids into two alphabets,
encoded by triplets with either central purines or central pyrimidines: G, D,
S, E, N, R, K, Q, C, H, Y, and W (Glycine alphabet G) and A, V, P, S, L, T,
I, F, and M (Alanine alphabet A). It is speculated that the earliest
polypeptide chains were very short, presumably of uniform length, belonging to
two alphabet types encoded in the two complementary strands of the earliest
mRNA duplexes. After the fusion of the minigenes, a mosaic of the alphabets
would form. Traces of the predicted mosaic structure have indeed been
detected in the protein sequences of complete prokaryotic genomes as weak
oscillations with a period of 12 residues, corresponding to the alternation
of two types of 6-residue-long units. The next stage of protein evolution
corresponded to the closure of the chains into loops of 25-30 residues
(Berezovsky et al., 2000). Autocorrelation analysis of the proteins of
23 complete archaebacterial and eubacterial genomes revealed that the
preferred distances between valine, alanine, glycine, leucine, and
isoleucine residues along the sequences are in the same range of 25-30
residues, indicating that
the loops are primarily closed by hydrophobic interactions between the ends
of the loops. The loop closure stage was followed by the formation of
typical folds of 100-200 amino acids via end-to-end fusion of the genes
encoding the loop-size chains. This fold size was apparently dictated by the
optimal ring closure size for DNA. In both cases the closure into a ring
(loop) imparted
evolutionarily advantageous stability to the respective structures. Further
gene fusions led to the formation of modern multidomain proteins.
Recombinational gene splicing is likely to have appeared after the DNA
circularization stage.
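
The two computational ideas above can be illustrated with a minimal Python
sketch. It is not the authors' code. The function names (reverse_complement,
alphabet_of, distance_histogram) are hypothetical, and the raw pair-distance
count is only a simplified stand-in for the autocorrelation analysis, whose
exact normalization is not given in this text. The first part shows that a
codon and its complement always fall into opposite alphabets, because the
complement of a central purine is a pyrimidine. The second part counts how
often chosen residues occur at each separation along a sequence.

```python
from collections import Counter

PURINES = {"A", "G"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(codon):
    """Return the codon read 5'->3' on the complementary RNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def alphabet_of(codon):
    """Return 'G' for a central purine, 'A' for a central pyrimidine."""
    return "G" if codon[1] in PURINES else "A"


def distance_histogram(sequence, residues, max_distance):
    """Count pairs of the chosen residues separated by 1..max_distance.

    Assumption: a plain pair count, without the background correction
    that a real autocorrelation analysis would need.
    """
    positions = [i for i, aa in enumerate(sequence) if aa in residues]
    counts = Counter()
    for k, i in enumerate(positions):
        for j in positions[k + 1:]:
            d = j - i
            if d > max_distance:
                break  # positions are sorted, so later ones are farther
            counts[d] += 1
    return counts


if __name__ == "__main__":
    # GGC (glycine) and GCC (alanine) are complementary codons.
    print(reverse_complement("GGC"), alphabet_of("GGC"), alphabet_of("GCC"))
    # Toy sequence (hypothetical, not real data): two valines 5 apart.
    print(distance_histogram("VQQQQV", {"V"}, 30))
```

On real proteome data one would pass the set {"V", "A", "G", "L", "I"} and
look for an excess of pairs at separations of 25-30 residues relative to a
suitable background.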