Motivation: An important step in extracting protein sequences from nucleotide sequences is recognizing the points at which protein-coding regions start. These points are called translation initiation sites (TIS).
Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines (SVMs) to this task and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques, recognition performance improves by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could benefit from advanced TIS recognition.
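The abstract does not specify the engineered kernel. The sketch below is a hypothetical illustration of how prior knowledge could enter a kernel. It assumes that matching nucleotides are more informative when they occur close together. The function `local_match_kernel` and its parameters `half_window`, `inner_degree`, `outer_degree`, and `weights` are illustrative assumptions and are not necessarily the kernel used in this work. Per-position matches are summed over a small sliding window, each window sum is raised to `inner_degree`, and the total is raised to `outer_degree`. With non-negative weights and positive integer degrees, the result is a sum and power of positive semidefinite kernels, so it remains a valid kernel.

```python
from typing import List, Optional, Sequence


def local_match_kernel(
    x: str,
    y: str,
    half_window: int = 3,
    inner_degree: int = 2,
    outer_degree: int = 1,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical locality-rewarding kernel on two equal-length windows.

    Assumption: clustered matches around a candidate site matter more than
    scattered ones. This is an illustrative sketch, not the paper's kernel.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    width = 2 * half_window + 1
    if weights is None:
        weights = [1.0] * width  # assumption: uniform weights within a window
    if len(weights) != width or any(w < 0 for w in weights):
        raise ValueError("need 2*half_window+1 non-negative weights")

    # 1 where the two sequences agree at a position, else 0.
    match: List[int] = [1 if a == b else 0 for a, b in zip(x, y)]

    total = 0.0
    for p in range(n):
        # Weighted count of matches in the window centred on position p,
        # truncated at the sequence ends.
        window_sum = sum(
            weights[j + half_window] * match[p + j]
            for j in range(-half_window, half_window + 1)
            if 0 <= p + j < n
        )
        # Raising to inner_degree > 1 favours windows with many matches.
        total += window_sum ** inner_degree
    return total ** outer_degree
```

Take `half_window=1` and `inner_degree=2`. Comparing `"AAAAAA"` with `"AAACCC"` gives 18.0, while comparing it with `"ACACAC"` gives 12.0, even though both pairs share exactly three positions. With `inner_degree=1`, both comparisons give 8.0, so the locality effect comes entirely from the inner exponent. A Gram matrix built from such a function could then be passed to an SVM implementation that accepts precomputed kernels.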