S. Tiwari et al., PREDICTION OF PROBABLE GENES BY FOURIER-ANALYSIS OF GENOMIC SEQUENCES, Computer applications in the biosciences, 13(3), 1997, pp. 263-270
Motivation: The major signal in coding regions of genomic sequences is
a three-base periodicity. Our aim is to use Fourier techniques to
analyse this periodicity, and thereby to develop a tool to recognize
coding regions in genomic DNA. Result: The three-base periodicity in
the nucleotide arrangement appears as a sharp peak at frequency
f = 1/3 in the Fourier (or power) spectrum. From extensive spectral analysis
of DNA sequences of total length over 5.5 million base pairs from a
wide variety of organisms (including the human genome), and by
separately examining coding and non-coding sequences, we find that the
relative height of the peak at f = 1/3 in the Fourier spectrum is a
good discriminator of coding potential. We use this feature to detect
probable coding regions in DNA sequences by examining the local
signal-to-noise ratio of the peak within a sliding window. While the
overall accuracy is comparable to that of other techniques currently
in use, the measure proposed here is independent of training sets or
existing database information, and can thus find general application.
Availability: A computer program, Gene Scan, which locates coding
open reading frames and exonic regions in genomic sequences, has been
developed and is available on request.
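
The sliding-window measure described in the abstract can be illustrated with a minimal sketch. This is not the Gene Scan program: the function names (peak_to_noise, scan), the use of one 0/1 indicator sequence per base, the choice of "noise" as the mean power over all nonzero DFT frequencies, and the default window and step are all assumptions made here for illustration.

```python
import cmath
from collections import Counter

BASES = "ACGT"

def peak_to_noise(window: str) -> float:
    """Total spectral power at f = 1/3 divided by the mean power over
    the nonzero DFT frequencies (an assumed normalization)."""
    n = len(window)
    if n < 3:
        raise ValueError("window must contain at least 3 bases")
    # exp(-2*pi*i*j/3) depends only on j mod 3, so precompute three values.
    rot = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    # Sum |U_b(1/3)|^2 over the four 0/1 indicator sequences u_b.
    peak = 0.0
    for b in BASES:
        u = sum(rot[j % 3] for j, ch in enumerate(window) if ch == b)
        peak += abs(u) ** 2
    # Parseval: summing the total spectrum over k = 0..n-1 gives
    # n * (number of recognized bases); removing the k = 0 term
    # (sum of squared base counts) leaves the nonzero frequencies.
    counts = Counter(ch for ch in window if ch in BASES)
    total = sum(counts.values())
    dc = sum(c * c for c in counts.values())
    noise = (n * total - dc) / (n - 1)
    return peak / noise if noise > 0 else 0.0

def scan(seq: str, window: int = 300, step: int = 3) -> list:
    """Return (start, score) pairs for each window position.
    window and step are arbitrary defaults; a window length that is a
    multiple of 3 makes f = 1/3 an exact DFT frequency."""
    seq = seq.upper()
    return [
        (start, peak_to_noise(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

if __name__ == "__main__":
    # Synthetic input: a period-2 stretch followed by a period-3 stretch.
    demo = "AT" * 150 + "ATG" * 100
    profile = scan(demo)
    print(profile[0], profile[-1])
```

On the synthetic input, the score is near zero over the period-2 stretch and large over the period-3 stretch. Real coding regions give a much weaker contrast, so any decision threshold would have to be chosen empirically.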