D.N. Perkins et al., Probability-based protein identification by searching sequence databases using mass spectrometry data, Electrophoresis, 20(18), 1999, pp. 3551-3567
Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
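
The "simple rule" for judging significance mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described convention that a probability-based score is reported as -10*log10(P), where P is the probability that the observed match is a random event, and that a match is called significant when it would be expected by chance less often than a chosen frequency (0.05 here) among the candidates searched. The function names, the 0.05 default, and the example numbers are illustrative assumptions; they are not taken from the abstract and do not reproduce the Mascot program itself.

```python
import math


def probability_score(p_random: float) -> float:
    """Convert the probability that a match is a random event into a score.

    Assumed convention: score = -10 * log10(P), so smaller probabilities
    give larger scores.
    """
    if not 0.0 < p_random <= 1.0:
        raise ValueError("p_random must be in the interval (0, 1]")
    return -10.0 * math.log10(p_random)


def significance_threshold(n_candidates: int, alpha: float = 0.05) -> float:
    """Score above which a random match is expected with frequency < alpha.

    Assumption: with n_candidates entries tested, a single random match must
    have probability below alpha / n_candidates to be called significant.
    """
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in the interval (0, 1)")
    return -10.0 * math.log10(alpha / n_candidates)


def is_significant(score: float, n_candidates: int, alpha: float = 0.05) -> bool:
    """Apply the rule: a match is significant if its score exceeds the threshold."""
    return score > significance_threshold(n_candidates, alpha)
```

Under these assumptions, a search against 500,000 candidate entries gives a threshold of about 70, so a match scoring 85 would count as significant and one scoring 60 would not. Because the threshold depends only on the number of candidates and the chosen frequency, the same rule can be applied unchanged across searches of different sizes.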