ASSESSING PROTEIN-CODING REGION INTEGRITY IN CDNA SEQUENCING PROJECTS

Citation
A.A. Salamov et al., ASSESSING PROTEIN-CODING REGION INTEGRITY IN CDNA SEQUENCING PROJECTS, BIOINFORMATICS, 14(5), 1998, pp. 384-390
Citations number
21
Subject Categories
Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
14
Issue
5
Year of publication
1998
Pages
384 - 390
Database
ISI
SICI code
1367-4803(1998)14:5<384:APRIIC>2.0.ZU;2-E
Abstract
Motivation: In cDNA sequencing projects, it is vital to know whether the protein-coding region of a sequence is complete, or whether errors have occurred during library construction. Here we present a linear discriminant approach that predicts this completeness by estimating the probability of each ATG being the initiation codon. Results: Because of the current shortage of full-length cDNA data on which to base this work, tests were performed on a non-redundant set of 660 initiation codon-containing DNA sequences that had been conceptually spliced into mRNA/cDNA. We also used an edited set of the same sequences that only contained the region following the initiation codon as a negative control. Using the criterion that only a single prediction is allowed for each sequence, a cut-off was selected at which discrimination of both positive and negative sets was equal. At this cut-off, 67% of each set could be correctly distinguished, with the correct ATG codon also being identified in the positive set. Reliability could be increased further by raising the cut-off or including homologues, the relative merits of which are discussed.
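
Illustrative sketch (not from the paper): the cut-off selection described in the abstract can be pictured roughly as follows. The code assumes each sequence has already been reduced to one number, the score of its best-scoring ATG, in line with the one-prediction-per-sequence criterion. The scores themselves are placeholders; the paper's actual linear discriminant function and its features are not given in this record. The function names atg_positions and balanced_cutoff are hypothetical.

```python
def atg_positions(seq):
    """Return the 0-based start index of every ATG in a nucleotide sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def balanced_cutoff(pos_scores, neg_scores):
    """Pick the cut-off at which accuracy on both sets is closest to equal.

    pos_scores: best ATG score per sequence that contains the initiation codon.
    neg_scores: best ATG score per sequence that lacks it (negative control).
    A positive sequence counts as correct when its score is >= cut-off,
    a negative sequence when its score is < cut-off.
    NOTE: the scores are assumed inputs, not the paper's discriminant values.
    """
    best = None
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        pos_acc = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
        neg_acc = sum(s < cutoff for s in neg_scores) / len(neg_scores)
        gap = abs(pos_acc - neg_acc)
        if best is None or gap < best[0]:
            best = (gap, cutoff, pos_acc, neg_acc)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    positives = [0.9, 0.8, 0.4, 0.7]
    negatives = [0.2, 0.5, 0.3, 0.75]
    print(atg_positions("ccATGaaATGg"))
    print(balanced_cutoff(positives, negatives))
```

Raising the cut-off above the balanced value trades accuracy on the positive set for accuracy on the negative set, which is one reading of the abstract's remark that reliability could be increased by raising the cut-off.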