Biological macromolecules such as DNA, RNA, and proteins can be regarded as
finite sequences of symbols (or words) over a finite alphabet. In this
paper, we consider DNA (RNA) sequences, which are words over a four-letter
alphabet. We compare some "genes", or fragments of them, with random
sequences or randomly reshuffled sequences over the same alphabet and of
the same length. Some combinatorial techniques for the analysis of finite
words are developed. A crucial role in the comparison is played by the
so-called special factors of a given word. In all the analysed DNA (RNA)
fragments, the distribution of the number of right (left) special factors
as a function of their length differs, in a very typical way, from the
corresponding distribution in
a string over the same alphabet and of the same length, generated by a
random source or obtained by a random alteration (= shuffling) of the
original string. This difference is independent of the fragment length,
within the range that we have considered (< 2650 bp), and of the
phylogenetic origin of the fragment. (C) 2000 Academic Press.
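
The comparison described above can be illustrated with a minimal sketch.
It assumes the standard combinatorial definition, which the abstract does
not spell out: a factor u of a word w is right special if there are two
distinct letters a and b such that ua and ub are both factors of w, and
left special factors are defined symmetrically. The function names below
are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def right_special_counts(word):
    """Map each factor length to the number of right special factors.

    Assumed definition: a factor u is right special in `word` if it is
    followed by at least two distinct letters somewhere in `word`.
    Length 0 stands for the empty word. Lengths with no right special
    factor are omitted.
    """
    counts = {}
    n = len(word)
    for k in range(n):
        # Collect, for every factor of length k, the letters that follow it.
        followers = defaultdict(set)
        for i in range(n - k):
            followers[word[i:i + k]].add(word[i + k])
        c = sum(1 for letters in followers.values() if len(letters) >= 2)
        if c == 0:
            # A suffix of a right special factor is right special, so no
            # longer right special factor can exist either.
            break
        counts[k] = c
    return counts


def left_special_counts(word):
    """Left special factors of `word` are right special in its reversal."""
    return right_special_counts(word[::-1])


def shuffled(word, seed=None):
    """Return a random permutation of the letters of `word`."""
    letters = list(word)
    random.Random(seed).shuffle(letters)
    return "".join(letters)
```

A possible use is to compute `right_special_counts(seq)` for a DNA string
`seq` and for `shuffled(seq)`, then compare the two distributions length by
length. The shuffle keeps the letter frequencies and the length of the
original string, so any difference reflects the arrangement of the letters
only.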