Protein-coding region discovery in organisms underrepresented in databases

Citation
Y. Quentin et al., Protein-coding region discovery in organisms underrepresented in databases, COMPUT CHEM, 23(3-4), 1999, pp. 209-217
Number of citations
29
Subject categories
Chemistry
Journal title
COMPUTERS & CHEMISTRY
ISSN journal
00978485
Volume
23
Issue
3-4
Year of publication
1999
Pages
209 - 217
Database
ISI
SICI code
0097-8485(1999)23:3-4<209:PRDIOU>2.0.ZU;2-A
Abstract
The prediction of coding sequences has received a lot of attention during the last decade. We can distinguish two kinds of methods: those that rely on training with sets of example and counter-example sequences, and those that exploit the intrinsic properties of the DNA sequences to be analyzed. The former are generally more powerful, but their domains of application are limited by the availability of a training set. The latter avoid this drawback but can only be applied to sequences that are long enough to allow computation of the statistics. Here, we present a method that fills the gap between the two approaches. A learning step is applied using a set of sequences that are assumed to contain coding and non-coding regions, but with the boundaries of these regions unknown. A test step then uses the discriminant function obtained during the learning to predict coding regions in sequences from the same organism. The learning relies upon a correspondence analysis, and prediction is presented on a graphical display. The method has been evaluated on a sample of yeast sequences, and the analysis of a set of expressed sequence tags from the Eucalyptus globulus-Pisolithus tinctorius ectomycorrhiza illustrates the relevance of the approach in its biological context. (C) 1999 Elsevier Science Ltd. All rights reserved.
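The abstract does not say which statistics the correspondence analysis is run on, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each training sequence is cut into fixed-length windows (120 nt here, an arbitrary choice), that each window is read in its three frames, and that the resulting codon counts form the contingency table. The first correspondence-analysis axis is found by power iteration, and the codon coordinates on that axis are used as a candidate discriminant function for scoring new windows as supplementary rows. The function names `codon_counts`, `build_table`, `first_ca_axis` and `project` are hypothetical.

```python
import math
import random
from itertools import product

# All 64 codons in a fixed order; they are the columns of the contingency table.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_counts(window, frame):
    """Count the 64 codons read in one frame (0, 1 or 2) of a DNA window."""
    counts = [0] * len(CODONS)
    for i in range(frame, len(window) - 2, 3):
        idx = CODON_INDEX.get(window[i:i + 3])
        if idx is not None:  # skip codons containing N or other ambiguity codes
            counts[idx] += 1
    return counts


def build_table(sequences, window=120):
    """Rows are (sequence id, window start, frame); columns are codon counts.

    Coding boundaries are not needed: every window and frame becomes one row
    (assumed layout), and rows without any valid codon are dropped."""
    labels, rows = [], []
    for seq_id, seq in enumerate(sequences):
        seq = seq.upper()
        for start in range(0, len(seq) - window + 1, window):
            for frame in range(3):
                counts = codon_counts(seq[start:start + window], frame)
                if sum(counts) > 0:
                    labels.append((seq_id, start, frame))
                    rows.append(counts)
    return labels, rows


def first_ca_axis(rows, iterations=200, seed=0):
    """First correspondence-analysis axis of a non-empty count table.

    Returns (row principal coordinates, codon standard coordinates,
    principal value), using power iteration on S^T S, where
    S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j) are the standardized residuals."""
    n = sum(map(sum, rows))
    keep = [j for j in range(len(CODONS)) if any(row[j] for row in rows)]
    r = [sum(row) / n for row in rows]
    c = [sum(row[j] for row in rows) / n for j in keep]
    S = [[(row[j] / n - ri * cj) / math.sqrt(ri * cj) for j, cj in zip(keep, c)]
         for row, ri in zip(rows, r)]
    rng = random.Random(seed)
    v = [rng.random() for _ in keep]
    for _ in range(iterations):
        u = [sum(s * x for s, x in zip(s_row, v)) for s_row in S]  # S v
        w = [sum(S[i][k] * u[i] for i in range(len(S)))
             for k in range(len(keep))]  # S^T (S v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate table: no structure beyond independence
            v = [0.0] * len(keep)
            break
        v = [x / norm for x in w]
    u = [sum(s * x for s, x in zip(s_row, v)) for s_row in S]
    sigma = math.sqrt(sum(x * x for x in u))
    scores = [ui / math.sqrt(ri) for ui, ri in zip(u, r)]
    # Codons never seen during learning get coordinate 0 (an assumption).
    g = [0.0] * len(CODONS)
    for j, cj, vj in zip(keep, c, v):
        g[j] = vj / math.sqrt(cj)
    return scores, g, sigma


def project(counts, g):
    """Score a new window/frame as a supplementary row: sum_j (n_j / n) * g_j."""
    total = sum(counts)
    return sum(x * gj for x, gj in zip(counts, g)) / total if total else 0.0
```

To score a sequence from the same organism, one would compute `codon_counts` for each window and frame and pass the result to `project`; plotting the three frame scores along the sequence gives a display in the spirit of the graphical output the abstract mentions. The sign of a correspondence-analysis axis is arbitrary, so which end of the axis corresponds to coding frames has to be decided from the data.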