Conventional methods of gene prediction rely on the recognition of
DNA-sequence signals, the coding potential, or the comparison of a genomic
sequence with a cDNA, EST, or protein database. In many circumstances, their
accuracy is limited by species-specific training and the incompleteness of
reference databases. Lately, comparative genome analysis has attracted
increasing attention. Several analysis tools that are based on human/mouse
comparisons are already available. Here, we present a program for the
prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction),
which is based on the similarity of homologous genomic sequences. In
contrast to most existing tools, the accuracy of SGP-1 depends little on
species-specific properties such as codon usage or the nucleotide
distribution. SGP-1 may therefore be applied to nonstandard model organisms
in vertebrates as well as in plants, without the need for extensive
parameter training. In addition to predicting genes in large-scale genomic
sequences, the program may be useful to validate gene structure annotations
from databases. To this end, SGP-1 output also contains comparisons between
predicted and annotated gene structures in HTML format. The program can be
accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code,
written in ANSI C, is available on request from the authors.