The prediction of coding sequences has received a lot of attention during
the last decade. We can distinguish two kinds of methods, those that rely
on training with sets of example and counter-example sequences, and those
that exploit the intrinsic properties of the DNA sequences to be analyzed.
The former are generally more powerful but their domains of application
are limited by the availability of a training set. The latter avoid this
drawback but can only be applied to sequences that are long enough to
allow computation of the statistics. Here, we present a method that fills
the gap between the two approaches. A learning step is applied using a set
of sequences that are assumed to contain coding and non-coding regions,
but with the boundaries of these regions unknown. A test step then uses
the discriminant function obtained during the learning to predict coding
regions in sequences from the same organism. The learning relies upon a
correspondence analysis and prediction is presented on a graphical
display. The method has been evaluated on a sample of yeast sequences, and
the analysis of a set of expressed sequence tags from the Eucalyptus
globulus-Pisolithus tinctorius ectomycorrhiza illustrates the relevance of
the approach in its biological context.
(C) 1999 Elsevier Science Ltd. All rights reserved.
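
As an illustration of the learning and test steps described above, the following is a minimal sketch of a correspondence analysis followed by the projection of a new sequence onto the learned axes. It is not the authors' implementation. The abstract does not say which variables enter the analysis, so the use of in-frame codon counts is an assumption, and the names codon_counts, correspondence_analysis and project are hypothetical helpers introduced here.

```python
from itertools import product

import numpy as np

# Assumption: sequences are described by counts of the 64 codons.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_counts(seq, frame=0):
    """Count the 64 codons of seq read in the given frame (0, 1 or 2)."""
    counts = np.zeros(64)
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip triplets with ambiguous bases such as N
            counts[idx] += 1
    return counts


def correspondence_analysis(table):
    """Correspondence analysis of a table with strictly positive margins.

    Returns row principal coordinates, column standard coordinates and the
    inertia (squared singular value) carried by each non-trivial axis.
    """
    table = np.asarray(table, dtype=float)
    P = table / table.sum()          # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > 1e-12                # drop the trivial, zero-inertia axes
    U, sv, Vt = U[:, keep], sv[keep], Vt[keep]
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_std = Vt.T / np.sqrt(c)[:, None]
    return row_coords, col_std, sv ** 2


def project(counts, col_std):
    """Place a new row of counts on the learned axes (supplementary row)."""
    counts = np.asarray(counts, dtype=float)
    return (counts / counts.sum()) @ col_std
```

In such a sketch, the learning step would call correspondence_analysis on a table with one row per training sequence (or window) and one column per codon, after removing codons that never occur. The test step would call project on the counts of a new sequence in each reading frame; how the resulting coordinates are turned into a discriminant function and a graphical display is not specified in the abstract and is left open here.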