We have created an algorithm which instantiates the centripetal definition
of modules, compact regions of protein structure, as introduced by Go and
Nosaka (M. Go and M. Nosaka, 1987. Protein architecture and the origin of
introns. Cold Spring Harbor Symp. Quant. Biol. 52, 915-924). That definition
seeks the minima of a function that sums the squares of C-alpha-carbon
distances over a window around each amino acid residue in a
three-dimensional protein structure and identifies such minima with module
boundaries. We analyze a set of 44 ancient conserved proteins, with known
three-dimensional structures, which have intronless homologues in bacteria
and intron-containing
homologues in the eukaryotes, with a corresponding set of 988 intron
positions. We show that the phase zero intron positions are significantly
correlated with the module boundaries (p=0.0002), while the intron
positions that
lie within codons, in phase one and phase two, are not correlated with
these 'centripetal' module boundaries.
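
As a rough illustration of the centripetal definition described above, the following minimal Python sketch sums squared C-alpha distances over a window around each residue and reports local minima of that profile as candidate module boundaries. It is not the authors' implementation: the function names, the symmetric window of `half_window` residues on each side, the choice to skip residues without a full window, and the strict-minimum rule are all assumptions made for the example.

```python
def centripetal_profile(ca_coords, half_window):
    """Sum of squared C-alpha distances from each residue to its window.

    ca_coords   : list of (x, y, z) tuples, one per residue (assumed input).
    half_window : residues taken on each side (hypothetical parameter;
                  the abstract does not state the window size).
    Residues too close to a chain end for a full window get None, so that
    truncated windows do not create artificial minima.
    """
    n = len(ca_coords)
    profile = [None] * n
    for i in range(half_window, n - half_window):
        total = 0.0
        for j in range(i - half_window, i + half_window + 1):
            # squared Euclidean distance between residues i and j
            total += sum((a - b) ** 2
                         for a, b in zip(ca_coords[i], ca_coords[j]))
        profile[i] = total
    return profile


def local_minima(profile):
    """Indices whose value is strictly lower than both defined neighbours."""
    minima = []
    for i in range(1, len(profile) - 1):
        left, here, right = profile[i - 1], profile[i], profile[i + 1]
        if None in (left, here, right):
            continue  # skip residues without a full window on either side
        if here < left and here < right:
            minima.append(i)
    return minima
```

Under these assumptions, `local_minima(centripetal_profile(coords, k))` would give the residue indices to compare against intron positions; in practice the coordinates would come from the C-alpha records of a solved structure.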
Furthermore, we analyze the phylogenetic distribution of intron positions
and identify a subset of putatively 'ancient' intron positions: phase zero
positions in one phylogenetic kingdom which have an associated intron either
in an identical position or within three codons in another phylogenetic
kingdom (a notion of intron sliding). This subset of 120 'ancient' introns
lies closer to the module boundaries than does the full set of phase zero
introns with high significance, a p-value of 0.008. We conclude that the
behavior of this set of introns supports the prediction of a mixed theory:
that
some introns are very old and were used for exon shuffling in the
progenote, while many introns have been lost and added since.
(C) 1999 Elsevier Science B.V. All rights reserved.
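
The selection rule for putatively 'ancient' introns stated above can be sketched in Python as follows. This is only an illustrative reading of the abstract: the tuple layout, the function name `ancient_phase_zero`, the kingdom labels, and the assumption that positions are expressed as codon indices in a shared alignment are all hypothetical, and the sketch does not require the matching intron in the other kingdom to be in phase zero, since the abstract does not say.

```python
def ancient_phase_zero(introns, max_slide=3):
    """Select phase zero introns with a nearby intron in another kingdom.

    introns   : list of (codon_position, phase, kingdom) tuples for one
                aligned protein family (assumed representation).
    max_slide : maximum distance in codons that still counts as the same
                site, i.e. the allowance for intron sliding.
    """
    ancient = []
    for pos, phase, kingdom in introns:
        if phase != 0:
            continue  # only introns lying between codons are candidates
        for other_pos, _other_phase, other_kingdom in introns:
            if other_kingdom != kingdom and abs(other_pos - pos) <= max_slide:
                ancient.append((pos, phase, kingdom))
                break  # one supporting intron in another kingdom suffices
    return ancient
```

A call such as `ancient_phase_zero(family_introns)` would return the subset whose distance to the nearest module boundary could then be compared with that of the full phase zero set.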