THE USE OF LOGISTIC-MODELS FOR THE ANALYSIS OF CODON FREQUENCIES OF DNA-SEQUENCES IN TERMS OF EXPLANATORY VARIABLES

Citation
Kk. Amfoh et al., THE USE OF LOGISTIC-MODELS FOR THE ANALYSIS OF CODON FREQUENCIES OF DNA-SEQUENCES IN TERMS OF EXPLANATORY VARIABLES, Biometrics, 50(4), 1994, pp. 1054-1063
Number of citations
20
Subject categories
Statistics & Probability
Journal title
Biometrics
Journal ISSN
0006-341X
Volume
50
Issue
4
Year of publication
1994
Pages
1054 - 1063
Database
ISI
SICI code
0006-341X(1994)50:4<1054:TUOLFT>2.0.ZU;2-X
Abstract
The development of the regressive logistic model applicable to the analysis of codon frequencies of DNA sequences in terms of explanatory variables is presented. A codon is a triplet of nucleotides that code for an amino acid, and may be considered as a trivariate response (B_1, B_2, B_3), where B_i (i = 1, 2, 3) is a categorical random variable with values A, C, G, T. The linear order of bases in the DNA and possible statistical dependence of the bases in a given codon make the regressive logistic model a suitable tool for the analysis of codon frequencies. A problem of structural zeros arises from the fact that the stopping codons (terminators) do not code for amino acids; this is solved by normalizing the likelihood function. Codon frequencies may also depend on the function of the gene and they are known to differ between genes of the same genome. Differences also occur between synonymous codons for the same amino acid. Thus, the use of covariates that differ between synonymous codons as well as covariates that are constant within codons of the same amino acid may be useful in explaining the frequencies. As an illustration, the method is applied to the human mitochondrial genome using the following as explanatory variables: (1) TSCORE, a measure of the number of single base mutations required for a given codon to become a terminator; (2) AARISK, an indicator of a codon's ability of changing by a single base substitution to triplets coding for amino acids with very different characteristics; (3) AVDIST, a measure of the typicality of the amino acid coded for by the triplets. The results indicate that models that incorporate dependency structure and covariates are to be preferred to either the models comprising covariates alone or dependency structure alone.
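Illustrative sketch
The abstract does not give formulas, so the following Python sketch is only one plausible reading of two ideas it mentions, not the authors' implementation. It assumes (a) that a codon probability is built sequentially as P(B_1) P(B_2 | B_1) P(B_3 | B_1, B_2), with the conditional probabilities supplied by the caller rather than fitted by logistic regression, (b) that the structural zeros are handled by renormalizing over the non-terminator codons, and (c) that the terminators are TAA, TAG, AGA and AGG, as in the vertebrate mitochondrial genetic code. The function names are hypothetical, and `min_mutations_to_stop` is a simple Hamming-distance stand-in for the idea behind TSCORE, whose exact definition is not stated in the abstract.

```python
from itertools import product

BASES = "ACGT"
# Assumption: terminators of the vertebrate mitochondrial genetic code.
STOP_CODONS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def min_mutations_to_stop(codon, stops=STOP_CODONS):
    """Fewest single-base substitutions that turn `codon` into a terminator.

    Hypothetical stand-in for a TSCORE-like covariate (Hamming distance
    to the nearest stop codon); not the paper's definition.
    """
    return min(sum(a != b for a, b in zip(codon, s)) for s in stops)


def normalized_codon_probs(p1, p2, p3, stops=STOP_CODONS):
    """Codon probabilities from sequential conditionals, with terminators excluded.

    p1[b1]          : P(B_1 = b1)
    p2[b1][b2]      : P(B_2 = b2 | B_1 = b1)
    p3[b1 + b2][b3] : P(B_3 = b3 | B_1 = b1, B_2 = b2)

    The raw product is divided by the total mass on sense codons, so the
    returned probabilities sum to one over the non-terminator codons.
    """
    raw = {
        b1 + b2 + b3: p1[b1] * p2[b1][b2] * p3[b1 + b2][b3]
        for b1, b2, b3 in product(BASES, repeat=3)
    }
    sense_mass = sum(p for codon, p in raw.items() if codon not in stops)
    return {codon: p / sense_mass for codon, p in raw.items() if codon not in stops}
```

In a regressive logistic model the three conditional distributions would themselves be multinomial logistic functions of the preceding bases and of covariates; here they are passed in as plain dictionaries to keep the sketch short.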