
Protein-coding regions prediction combining similarity searches and conservative evolutionary properties of protein-coding sequences

Authors

Rogozin, IB; D'Angelo, D; Milanesi, L

Citation

I.B. Rogozin et al., Protein-coding regions prediction combining similarity searches and conservative evolutionary properties of protein-coding sequences, GENE, 226(1), 1999, pp. 129-137

Citations number

Subject Categories

Molecular Biology & Genetics

Journal title

GENE

ISSN journal

0378-1119

Volume

226

Issue

1

Year of publication

1999

Pages

129 - 137

Database

ISI

SICI code

0378-1119(19990108)226:1<129:PRPCSS>2.0.ZU;2-6

Abstract

Identifying the protein-coding region of a completely new gene that has no significant homology to known protein sequences can be a very complex task. To identify protein-coding regions in such cases, a new method, 'SYNCOD', based on the analysis of conservative evolutionary properties of coding regions, has been developed. The program identifies and uses coding-region homologies with non-annotated (unknown) protein-coding sequences already present in the nucleotide sequence databases, using the alignments produced by BLASTN. For each open reading frame, the ratio of the number of mismatches resulting in synonymous codons to the number of mismatches resulting in non-synonymous codons is estimated. Monte Carlo simulations are then used to estimate the significance of the deviation of this ratio from random behavior. SYNCOD has been tested on generated random sequences and on different control sets. High accuracy in predicting protein-coding regions (correlation coefficient, CC, from 0.67 to 0.79) and high specificity (proportion of wrong exons, WE, from 0.06 to 0.07) proved to be important features of the suggested approach. The SYNCOD program resides on the ITBA-CNR Web Server and can be used via the Internet (URL: www.itba.mi.cnr.it/webgene). (C) 1999 Elsevier Science B.V. All rights reserved.
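The statistic described in the abstract (synonymous versus non-synonymous mismatches per reading frame, tested against random behavior by Monte Carlo simulation) can be illustrated with a minimal Python sketch. This is not the SYNCOD implementation: the helper names (`classify_mismatches`, `synonymous_fraction`, `monte_carlo_pvalue`), the rule that judges each mismatch on its own, and the null model (the same number of mismatches placed uniformly at random in the frame) are hypothetical choices made for illustration. It assumes a gap-free BLASTN HSP given as two equal-length uppercase ACGT strings and the standard genetic code.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest and third base fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_mismatches(query, subject, frame):
    """Count synonymous (S) and non-synonymous (N) mismatches between two
    gap-free aligned sequences read in the given frame (0, 1 or 2).

    Assumption: each mismatch is judged on its own by mutating the query codon
    at that single position to the subject base and comparing amino acids.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    syn = nonsyn = 0
    for start in range(frame, len(query) - 2, 3):
        q_codon = query[start:start + 3]
        s_codon = subject[start:start + 3]
        for i in range(3):
            if q_codon[i] != s_codon[i]:
                mutated = q_codon[:i] + s_codon[i] + q_codon[i + 1:]
                if CODON_TABLE[q_codon] == CODON_TABLE[mutated]:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


def synonymous_fraction(syn, nonsyn):
    """S / (S + N): ranks alignments like S / N but is defined when N == 0."""
    total = syn + nonsyn
    return syn / total if total else 0.0


def monte_carlo_pvalue(query, frame, n_mismatches, observed, trials=1000, rng=None):
    """Estimate P(random synonymous fraction >= observed).

    Hypothetical null model: n_mismatches positions are drawn uniformly without
    replacement from the in-frame codons of the query, and each is replaced by
    a different random base.
    """
    rng = rng or random.Random()
    n_codons = (len(query) - frame) // 3
    positions = range(frame, frame + 3 * n_codons)
    hits = 0
    for _ in range(trials):
        simulated = list(query)
        for pos in rng.sample(positions, n_mismatches):
            simulated[pos] = rng.choice([b for b in BASES if b != query[pos]])
        syn, nonsyn = classify_mismatches(query, "".join(simulated), frame)
        if synonymous_fraction(syn, nonsyn) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Toy alignment: every mismatch sits at a fourfold-degenerate third position.
    query = "GCTGGTCTGGTTCCTTCTACTCGT" * 5
    subject = "GCCGGCCTCGTCCCCTCCACCCGC" * 5
    syn, nonsyn = classify_mismatches(query, subject, frame=0)
    observed = synonymous_fraction(syn, nonsyn)
    p = monte_carlo_pvalue(query, 0, syn + nonsyn, observed,
                           trials=1000, rng=random.Random(1))
    print(syn, nonsyn, observed, p)
```

In the toy alignment all 40 mismatches fall at fourfold-degenerate third codon positions, so all are synonymous in frame 0; placing 40 mismatches at random almost never reproduces that, so the estimated p-value is small. A real coding-region predictor would repeat this for every open reading frame covered by the BLASTN hits.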