E. V. Kriventseva and M. S. Gelfand, Statistical analysis of the exon-intron structure of higher and lower eukaryote genes, J BIO STRUC, 17(2), 1999, pp. 281-288
The statistics of the exon-intron structure and splicing sites of several
diverse eukaryotes were studied. Yeast exon-intron structures have a number
of unique features. A yeast gene usually has at most one intron. The branch
site is strongly conserved, whereas the polypyrimidine tract is short. Long
yeast introns tend to have stronger acceptor sites. In other species the
branch site is less conserved and often cannot be determined. In non-yeast
samples there is an almost universal correlation between the lengths of
neighboring exons (all samples except protists) and a correlation between
the lengths of neighboring introns (human, Drosophila, protists). On average,
first introns are longer, and anomalously long introns are usually the first
introns in a gene. There is a universal preference for exons and exon pairs
whose (total) length is divisible by 3. Introns positioned between codons are
preferred, whereas those positioned between the first and second positions
of a codon are avoided. The choice of A or G at the third intron position
(donor splice sites generally prefer purines at this position) is correlated
with the overall GC composition of the gene. In all samples the dinucleotide
AG is avoided in the region preceding the acceptor site.
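Two of the properties above, intron position relative to codons and AG avoidance upstream of the acceptor, can be made concrete with a small sketch. This is not the authors' procedure. It assumes the coding lengths of a gene's exons are known in transcription order, and that each intron sequence is given 5'->3' ending with the canonical acceptor AG. The function names `intron_phases` and `count_ag_upstream` and the 20-nt window are hypothetical choices for illustration. An intron's phase is the cumulative coding length before it modulo 3: phase 0 lies between codons, and phase 1 lies between the first and second codon positions.

```python
from typing import List


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Return the phase of each intron in a gene's coding region.

    coding_exon_lengths: coding lengths (nt) of the exons, in
    transcription order (hypothetical input format). The intron after
    exon i has phase = cumulative coding length through exon i, mod 3:
      0 -> intron lies between two codons
      1 -> intron lies between codon positions 1 and 2
      2 -> intron lies between codon positions 2 and 3
    """
    phases = []
    cumulative = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def count_ag_upstream(intron: str, window: int = 20) -> int:
    """Count AG dinucleotides in the `window` nt preceding the acceptor AG.

    Assumes `intron` is a DNA sequence (5'->3') ending with the canonical
    acceptor dinucleotide AG; that final AG itself is not counted.
    The 20-nt default window is a hypothetical illustrative value.
    """
    intron = intron.upper()
    if not intron.endswith("AG"):
        raise ValueError("expected the intron to end with the acceptor AG")
    body = intron[:-2]  # drop the acceptor AG
    region = body[-window:] if window > 0 else ""  # guard: body[-0:] is all of body
    return sum(1 for i in range(len(region) - 1) if region[i:i + 2] == "AG")


if __name__ == "__main__":
    print(intron_phases([120, 85, 60, 200]))  # [0, 1, 1]
    print(count_ag_upstream("GTCCAGCCAGCCAG"))  # 2
```

In the example the 60-nt third exon has a length divisible by 3, so the introns on both sides of it share the same phase (1).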