Sj. Desouza et al., INTRON POSITIONS CORRELATE WITH MODULE BOUNDARIES IN ANCIENT PROTEINS, Proceedings of the National Academy of Sciences of the United States of America, 93(25), 1996, pp. 14632-14636
We analyze the three-dimensional structure of proteins with a computer program that finds regions of sequence that contain module boundaries, defining a module as a segment of polypeptide chain bounded in space by a specific given distance. The program defines a set of "linker regions" that have the property that if an intron were to be placed into each linker region, the protein would be dissected into a set of modules all less than the specified diameter. We test a set of 32 proteins, all of ancient origin, and a corresponding set of 570 intron positions, to ask if there is a statistically significant excess of intron positions within the linker regions. For 28-Angstrom modules, a standard size used historically, we find such an excess, with P < 0.003. This correlation is due neither to a compositional or sequence bias in the linker regions nor to a surface bias in intron positions. Furthermore, a subset of 20 introns, which can be putatively identified as old, lies even more explicitly within the linker regions, with P < 0.0003. Thus, there is a strong correlation between intron positions and three-dimensional structural elements of ancient proteins, as expected by the introns-early approach. We then study a range of module diameters and show that, as the diameter varies, significant peaks of correlation appear for module diameters centered at 21.7, 27.6, and 32.9 Angstrom. These preferred module diameters roughly correspond to predicted exon sizes of 15, 22, and 30 residues. Thus, there are significant correlations between introns, modules, and a quantized pattern of the lengths of polypeptide chains, which is the prediction of the "Exon Theory of Genes."
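
The module criterion and the excess test described above can be illustrated with a minimal sketch. It is not the authors' program: the function names are hypothetical, residue positions are assumed to be given as one 3-D point per residue, a module's diameter is taken to be the largest pairwise distance within a segment, and a one-sided binomial tail stands in for the significance test, which the abstract does not specify.

```python
from itertools import combinations
from math import comb, dist


def module_diameter(coords):
    """Largest pairwise distance among the points of one chain segment.

    Assumption: one (x, y, z) point per residue; returns 0.0 for segments
    with fewer than two residues.
    """
    return max((dist(a, b) for a, b in combinations(coords, 2)), default=0.0)


def dissects_into_modules(coords, cuts, max_diameter):
    """True if cutting the chain before each index in `cuts` leaves only
    segments whose diameter is less than `max_diameter`."""
    bounds = [0, *sorted(cuts), len(coords)]
    return all(
        module_diameter(coords[start:end]) < max_diameter
        for start, end in zip(bounds, bounds[1:])
    )


def binomial_excess_p(hits, total, linker_fraction):
    """One-sided P(X >= hits) for X ~ Binomial(total, linker_fraction).

    Assumption: under the null hypothesis each intron position falls in a
    linker region independently with probability `linker_fraction`.
    """
    return sum(
        comb(total, k) * linker_fraction**k * (1 - linker_fraction) ** (total - k)
        for k in range(hits, total + 1)
    )


# Toy chain: 10 residues on a straight line, 3.8 Angstrom apart (made-up data).
toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]

# The uncut chain spans more than 28 Angstrom, so it is not a single module.
uncut_ok = dissects_into_modules(toy_chain, [], 28.0)

# One cut in the middle leaves two short segments that both fit.
one_cut_ok = dissects_into_modules(toy_chain, [5], 28.0)
```

In this toy case a single cut position is enough; in a real protein, the set of positions where a cut keeps every resulting segment under the chosen diameter would play the role of a linker region, and `binomial_excess_p` would be fed the observed count of intron positions falling inside those regions.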