Distinct stages of protein evolution as suggested by protein sequence analysis

Citation
E.N. Trifonov et al., Distinct stages of protein evolution as suggested by protein sequence analysis, J MOL EVOL, 53(4-5), 2001, pp. 394-401
Number of citations
40
Subject categories
Biology, Experimental Biology
Journal title
JOURNAL OF MOLECULAR EVOLUTION
ISSN journal
0022-2844
Volume
53
Issue
4-5
Year of publication
2001
Pages
394 - 401
Database
ISI
SICI code
0022-2844(200110/11)53:4-5<394:DSOPEA>2.0.ZU;2-2
Abstract
Evolution of proteins encoded in nucleotide sequences began with the advent of the triplet code. The chronological order of the appearance of amino acids on the evolution scene and the steps in the evolution of the triplet code have recently been reconstructed (Trifonov, 2000b) on the basis of 40 different ranking criteria and hypotheses. According to the consensus chronology, the pair of complementary GGC and GCC codons for the amino acids glycine and alanine, respectively, appeared first. Other codons appeared as complementary pairs as well, which divided their respective amino acids into two alphabets, encoded by triplets with either central purines or central pyrimidines: G, D, S, E, N, R, K, Q, C, H, Y, and W (Glycine alphabet G) and A, V, P, S, L, T, I, F, and M (Alanine alphabet A). It is speculated that the earliest polypeptide chains were very short, presumably of uniform length, belonging to two alphabet types encoded in the two complementary strands of the earliest mRNA duplexes. After the fusion of the minigenes, a mosaic of the alphabets would form. Traces of the predicted mosaic structure have indeed been detected in the protein sequences of complete prokaryotic genomes, as weak oscillations with a period of 12 residues that reflect the alternation of two types of 6-residue-long units. The next stage of protein evolution corresponded to the closure of the chains into loops of 25-30 residues (Berezovsky et al., 2000). Autocorrelation analysis of proteins of 23 complete archaebacterial and eubacterial genomes revealed that the preferred distances between valine, alanine, glycine, leucine, and isoleucine along the sequences are in the same range of 25-30 residues, indicating that the loops are primarily closed by hydrophobic interactions between the ends of the loops. The loop closure stage is followed by the formation of typical folds of 100-200 amino acids, via end-to-end fusion of the genes encoding the loop-size chains. This size was apparently dictated by the optimal ring closure for DNA. In both cases the closure into the ring (loop) conferred an evolutionarily advantageous stability on the respective structures. Further gene fusions lead to the formation of modern multidomain proteins. Recombinational gene splicing is likely to have appeared after the DNA circularization stage.
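The two analyses mentioned in the abstract (recoding sequences into the Glycine and Alanine alphabets, and looking for preferred distances between selected residues) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names `recode` and `distance_counts` are hypothetical, the treatment of serine as ambiguous is an assumption made here because S is listed in both alphabets, and the raw pair counts below are not normalized against any random expectation, as a real autocorrelation analysis would require.

```python
from collections import Counter

# Alphabet membership copied from the abstract; note that serine (S) is in both.
GLYCINE_ALPHABET = set("GDSENRKQCHYW")  # codons with a central purine
ALANINE_ALPHABET = set("AVPSLTIFM")     # codons with a central pyrimidine


def recode(seq):
    """Map each residue to 'G', 'A', or '?'.

    '?' marks serine (present in both alphabets) and any unknown symbol;
    this handling is an assumption of the sketch, not taken from the paper.
    """
    out = []
    for aa in seq.upper():
        in_g = aa in GLYCINE_ALPHABET
        in_a = aa in ALANINE_ALPHABET
        if in_g and not in_a:
            out.append("G")
        elif in_a and not in_g:
            out.append("A")
        else:
            out.append("?")
    return "".join(out)


def distance_counts(seq, residues, max_distance):
    """Count residue pairs by separation along the sequence.

    Returns a Counter mapping each distance d (1 <= d <= max_distance) to the
    number of position pairs (i, j), i < j, with j - i == d and both
    seq[i] and seq[j] in `residues`.
    """
    positions = [i for i, aa in enumerate(seq.upper()) if aa in residues]
    counts = Counter()
    for idx, i in enumerate(positions):
        for j in positions[idx + 1:]:
            d = j - i
            if d > max_distance:
                break  # positions are sorted, so later j are farther still
            counts[d] += 1
    return counts


if __name__ == "__main__":
    toy = "MGASVLKDEIAG"  # made-up toy sequence, not real data
    print(recode(toy))
    print(sorted(distance_counts(toy, set("VAGLI"), 5).items()))
```

To probe the 25-30 residue range described above, one would call `distance_counts` with `max_distance` of at least 30 on many real protein sequences, sum the counters, and compare each distance against a suitable background; that comparison step is deliberately left out of this sketch.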