LINGUISTIC COMPLEXITY OF PROTEIN SEQUENCES AS COMPARED TO TEXTS OF HUMAN LANGUAGES

Citation
O. Popov et al., LINGUISTIC COMPLEXITY OF PROTEIN SEQUENCES AS COMPARED TO TEXTS OF HUMAN LANGUAGES, Biosystems, 38(1), 1996, pp. 65-74
Number of citations
9
Subject categories
Biology
Journal title
Biosystems
Journal ISSN
0303-2647
Volume
38
Issue
1
Year of publication
1996
Pages
65-74
Database
ISI
SICI code
0303-2647(1996)38:1<65:LCOPSA>2.0.ZU;2-X
Abstract
A notion and a measure of linguistic complexity introduced earlier (Trifonov, 1990) were originally used for analysis of nucleotide sequences. This measure was shown to reflect multiplicity of codes (messages) of different natures superimposed in the sequences. Unlike human language texts, genetic texts are 'read' by cellular mechanisms in several different ways, each time using a different selection of the characters of the same text while skipping others (Trifonov, 1989). Human texts are read in one way only, sequentially and involving all characters (one code). The conceptual significance and essence of the idea on the multiplicity of overlapping codes in genetic sequences, as opposed to human languages, is discussed. The linguistic complexity technique allows a calculation to be made of the structural complexity of any linear sequence of characters irrespective of whether the text is cognized or presently undeciphered. The texts (sequences) are compared exclusively from the point of view of their structural complexity with no reference to the meaning of the texts which is beyond the scope of this article. Results of such a comparison of protein sequences with various texts, written in English, Italian and Welsh are presented. The human texts are found to be structurally simpler than genetic (protein) texts, reflecting, apparently, a difference in the reading modes: single code versus many codes.