C. Scapoli et al., IDENTIFICATION OF A SET OF FREQUENT DECANUCLEOTIDES IN PLANTS AND IN ANIMALS, Computer applications in the biosciences, 10(5), 1994, pp. 465-470
We studied the frequency distribution of 1 048 576 oligonucleotides 10
bp long in a sample of 1.961 Mbase of genes from plants, made of 635
sequences extracted from GenBank 71.0, with the aim of detecting
transcription control signals. Among all decamers, 3255, or 0.3%, had a
frequency 10 times higher than the mean and were subjected to further
statistical analysis. For each of the 3255 decamers (parents), we counted
the individual frequencies of the 30 decamers (progeny) differing from
the parent by a single base substitution, and calculated two
variance/mean chi-squares for the progeny, with and without the parent
decamer. By studying the distribution of the ratio between the two
chi-squares we observed that, out of the 3255 decamers more than 10
times as frequent as average, 432 had a chi-square ratio >1.9. In this
residual set, which corresponds to
<0.04 per cent of all possible decamers, only 15 known eukaryotic
transcription control elements were found; on the other hand, it included
29 decanucleotides that matched with decanucleotides of a set from
Drosophila, 24 with a set from mammals, 13 with a set from yeast and four
with a set from viruses, all sets identified with the statistical
procedures described here. These decanucleotides are highly repetitive
and seem to be present throughout all higher organisms, whereas they are
uncommon in mammalian viruses.
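
The procedure summarized in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code. All function names are hypothetical. The chi-square is assumed to be the usual index-of-dispersion statistic, sum((x - mean)^2) / mean. The ratio is assumed to be the value with the parent divided by the value without it; the abstract does not state the direction.

```python
from collections import Counter

BASES = "ACGT"
K = 10  # decamer length used in the abstract


def count_kmers(sequences, k=K):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set(BASES):
                counts[kmer] += 1
    return counts


def frequent_kmers(counts, fold=10, k=K):
    """K-mers whose count exceeds `fold` times the mean over all 4**k k-mers."""
    mean = sum(counts.values()) / 4 ** k
    return [kmer for kmer, c in counts.items() if c > fold * mean]


def progeny(parent):
    """The 3 * len(parent) k-mers differing from parent by one substitution."""
    out = []
    for pos, base in enumerate(parent):
        for alt in BASES:
            if alt != base:
                out.append(parent[:pos] + alt + parent[pos + 1:])
    return out


def dispersion_chi_square(values):
    """Variance/mean chi-square (assumed form): sum((x - mean)^2) / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((v - mean) ** 2 for v in values) / mean


def chi_square_ratio(parent, counts):
    """Assumed ratio: chi-square with the parent over chi-square without it."""
    kids = [counts[p] for p in progeny(parent)]
    without_parent = dispersion_chi_square(kids)
    with_parent = dispersion_chi_square(kids + [counts[parent]])
    if without_parent == 0:
        return float("inf") if with_parent > 0 else 0.0
    return with_parent / without_parent
```

Under these assumptions, a parent decamer that stands out sharply from its 30 single-substitution neighbours inflates the chi-square computed with the parent, so its ratio is large. Selecting parents with `chi_square_ratio(parent, counts) > 1.9` from the output of `frequent_kmers` would then mirror the filtering step described above.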