IMPROVING THE EFFICIENCY OF THE GENETIC-CODE BY VARYING THE CODON LENGTH - THE PERFECT GENETIC-CODE

Authors

DOIG AJ

Citation

A. J. Doig, IMPROVING THE EFFICIENCY OF THE GENETIC-CODE BY VARYING THE CODON LENGTH - THE PERFECT GENETIC-CODE, Journal of Theoretical Biology, 188(3), 1997, pp. 355-360

Subject categories

Biology Miscellaneous

Journal title

Journal of theoretical biology

ISSN journal

0022-5193

Volume

188

Issue

3

Year of publication

1997

Pages

355 - 360

Database

ISI

SICI code

0022-5193(1997)188:3<355:ITEOTG>2.0.ZU;2-U

Abstract

The function of DNA is to specify protein sequences. The four-base "alphabet" used in nucleic acids is translated to the 20-letter alphabet of proteins (plus a stop signal) via the genetic code. The code is neither overlapping nor punctuated, but has mRNA sequences read in successive triplet codons until reaching a stop codon. The true genetic code uses three bases for every amino acid. The efficiency of the genetic code can be significantly increased if the requirement for a fixed codon length is dropped so that the more common amino acids have shorter codon lengths and rare amino acids have longer codon lengths. More efficient codes can be derived using the Shannon-Fano and Huffman coding algorithms. The compression achieved using a Huffman code cannot be improved upon. I have used these algorithms to derive efficient codes for representing protein sequences using both two and four bases. The length of DNA required to specify the complete set of protein sequences could be significantly shorter if transcription used a variable codon length. The restriction to a fixed codon length of three bases means that it takes 42% more DNA than the minimum necessary, and the genetic code is 70% efficient. One can think of many reasons why this maximally efficient code has not evolved: there is very little redundancy, so almost any mutation causes an amino acid change. Many mutations will be potentially lethal frame-shift mutations, if the mutation leads to a change in codon length. It would be more difficult for the machinery of transcription to cope with a variable codon length. Nevertheless, in the strict and narrow sense of coding for protein sequences using the minimum length of DNA possible, the Huffman code derived here is perfect. (C) 1997 Academic Press Limited.
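
The variable-length idea in the abstract can be illustrated with a minimal sketch of Huffman coding over an arbitrary base alphabet. This is not the code from the paper: the function names `huffman_code` and `mean_length`, the five-symbol `toy` alphabet, and its frequencies are all hypothetical and chosen only so the arithmetic is easy to follow. Real amino acid frequencies would have to be supplied by the reader.

```python
import heapq
from itertools import count


def huffman_code(freqs, alphabet="ACGU"):
    """Build a prefix-free code over `alphabet` for the symbols in `freqs`."""
    k = len(alphabet)
    tick = count()  # tie-breaker so the heap never compares node payloads
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    # Pad with zero-weight dummies so every merge takes exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0.0, next(tick), None))
    heapq.heapify(heap)
    # Repeatedly merge the k lightest nodes into one internal node.
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(k)]
        weight = sum(c[0] for c in children)
        heapq.heappush(heap, (weight, next(tick), [c[2] for c in children]))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):  # internal node: one branch per base
            for base, child in zip(alphabet, node):
                walk(child, prefix + base)
        elif node is not None:  # real symbol (dummies are skipped)
            codes[node] = prefix or alphabet[0]

    walk(heap[0][2], "")
    return codes


def mean_length(freqs, codes):
    """Frequency-weighted average codon length."""
    total = sum(freqs.values())
    return sum(f * len(codes[s]) for s, f in freqs.items()) / total


# Hypothetical frequencies for a toy five-symbol alphabet (illustration only).
toy = {"Leu": 0.4, "Ser": 0.3, "Ala": 0.2, "Trp": 0.05, "Stop": 0.05}
four_base = huffman_code(toy, "ACGU")
two_base = huffman_code(toy, "01")
print(four_base, mean_length(toy, four_base))
print(two_base, mean_length(toy, two_base))
```

In this toy case a fixed-length four-base code needs two bases per symbol, while the variable-length code gives the three common symbols one base each and the two rare ones two bases. The two figures quoted in the abstract are consistent with each other under the usual definition of efficiency as minimum length divided by actual length: 1/1.42 ≈ 0.70.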