IMPROVING THE EFFICIENCY OF THE GENETIC-CODE BY VARYING THE CODON LENGTH - THE PERFECT GENETIC-CODE

Authors
A. J. Doig
Citation
A. J. Doig, IMPROVING THE EFFICIENCY OF THE GENETIC-CODE BY VARYING THE CODON LENGTH - THE PERFECT GENETIC-CODE, Journal of Theoretical Biology, 188(3), 1997, pp. 355-360
Number of citations
17
Subject Categories
Biology Miscellaneous
ISSN journal
00225193
Volume
188
Issue
3
Year of publication
1997
Pages
355 - 360
Database
ISI
SICI code
0022-5193(1997)188:3<355:ITEOTG>2.0.ZU;2-U
Abstract
The function of DNA is to specify protein sequences. The four-base "alphabet" of nucleic acids is translated into the 20-letter alphabet of proteins (plus a stop signal) via the genetic code. The code is neither overlapping nor punctuated: mRNA sequences are read in successive triplet codons until a stop codon is reached. The actual genetic code therefore uses three bases for every amino acid. The efficiency of the genetic code could be increased significantly if the requirement for a fixed codon length were dropped, so that the more common amino acids had shorter codons and the rarer amino acids had longer ones. More efficient codes can be derived using the Shannon-Fano and Huffman coding algorithms, and the compression achieved by a Huffman code cannot be improved upon. I have used these algorithms to derive efficient codes for representing protein sequences using both two and four bases. The length of DNA required to specify the complete set of protein sequences could be significantly shorter if translation used a variable codon length. The restriction to a fixed codon length of three bases means that the genetic code takes 42% more DNA than the minimum necessary, and is therefore 70% efficient. There are many plausible reasons why this maximally efficient code has not evolved. It has very little redundancy, so almost any mutation causes an amino acid change. Many mutations would be potentially lethal frame-shift mutations, if they changed the length of a codon. It would also be more difficult for the translation machinery to cope with a variable codon length. Nevertheless, in the strict and narrow sense of coding for protein sequences using the minimum possible length of DNA, the Huffman code derived here is perfect. (C) 1997 Academic Press Limited.
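The 42% and 70% figures in the abstract describe the same ratio. Let $\bar{L}_{\min}$ be the minimum average number of bases per amino acid. This symbol is introduced here and does not appear in the abstract. Assuming efficiency is defined as $\bar{L}_{\min}/3$, the arithmetic is as follows. The values are back-calculated from the rounded percentages, so they are approximate.

```latex
\begin{aligned}
\frac{3}{\bar{L}_{\min}} &\approx 1.42
  &&\Longrightarrow\quad \bar{L}_{\min} \approx \frac{3}{1.42} \approx 2.11 \ \text{bases per amino acid},\\
\text{efficiency} &= \frac{\bar{L}_{\min}}{3} \approx \frac{1}{1.42} \approx 0.70,\\
\bar{L}_{\min} &\ge H_4 = -\sum_i p_i \log_4 p_i .
\end{aligned}
```

The last line is the standard Shannon lower bound for a four-letter code alphabet. Here $p_i$ is the relative frequency of symbol $i$ (each amino acid, plus stop if it is counted).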
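The abstract gives neither the derived codes nor the amino-acid frequencies behind them. What follows is therefore only a minimal Python sketch of the general technique: a D-ary Huffman code, with D = 4 for the bases A, C, G, T, or D = 2 for a binary alphabet. All function names (`huffman_code`, `average_length`, `entropy`, `is_prefix_free`, `fixed_length`) are hypothetical. The weights in the usage section are made up and are not the frequencies used in the paper. When D > 2, the sketch pads the input with zero-weight dummy symbols so that every merge combines exactly D nodes, which is the usual adjustment for D-ary Huffman coding.

```python
import heapq
import itertools


def huffman_code(freqs, alphabet="ACGT"):
    """Return {symbol: codeword} for a D-ary Huffman code, D = len(alphabet).

    `freqs` maps each symbol (e.g. an amino acid) to a non-negative weight.
    """
    d = len(alphabet)
    if d < 2:
        raise ValueError("alphabet needs at least two letters")
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a non-empty codeword.
        return {next(iter(freqs)): alphabet[0]}

    tie = itertools.count()  # unique tie-breaker, so the heap never compares nodes
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    # Pad with zero-weight dummy leaves (None) until (n - 1) is divisible by
    # (d - 1), so that every merge below takes exactly d nodes.
    while (len(heap) - 1) % (d - 1) != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)

    # Repeatedly merge the d lightest nodes into one internal node (a list).
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(d)]
        weight = sum(w for w, _, _ in children)
        heapq.heappush(heap, (weight, next(tie), [node for _, _, node in children]))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):  # internal node: one alphabet letter per child
            for letter, child in zip(alphabet, node):
                walk(child, prefix + letter)
        elif node is not None:  # real leaf; dummy leaves are skipped
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def average_length(freqs, codes):
    """Weighted mean codeword length (bases per symbol when alphabet is ACGT)."""
    total = sum(freqs.values())
    return sum(w * len(codes[s]) for s, w in freqs.items()) / total


def entropy(freqs, base):
    """Shannon entropy of the weights in base-`base` units: the lower bound."""
    total = sum(freqs.values())
    return -sum(w / total * math.log(w / total, base) for w in freqs.values() if w > 0)


def is_prefix_free(codes):
    """True if no codeword is a prefix of another, so a sequence parses uniquely."""
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def fixed_length(n, d):
    """Smallest L with d**L >= n: the codeword length a fixed-length code needs."""
    length = 1
    while d ** length < n:
        length += 1
    return length


import math  # used by entropy(); kept at module level so the file is self-contained

if __name__ == "__main__":
    # Made-up weights for illustration only, NOT real amino-acid frequencies.
    toy = {"s1": 10, "s2": 8, "s3": 7, "s4": 3, "s5": 1, "s6": 1}
    codes = huffman_code(toy)
    for sym, word in sorted(codes.items()):
        print(sym, word)
    print("prefix-free:", is_prefix_free(codes))
    print(f"Huffman average {average_length(toy, codes):.3f} bases/symbol, "
          f"entropy bound {entropy(toy, 4):.3f}, "
          f"fixed-length code {fixed_length(len(toy), 4)} bases")
```

With 21 symbols (20 amino acids plus stop) and four bases, `fixed_length(21, 4)` returns 3, which matches the triplet codon. A Huffman code's average length always lies between the base-D entropy H and H + 1. No code that assigns one fixed codeword to each symbol can have a shorter average, and this is the sense in which the abstract says the Huffman code cannot be improved upon.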