Probability-based protein identification by searching sequence databases using mass spectrometry data

Citation
D.N. Perkins et al., Probability-based protein identification by searching sequence databases using mass spectrometry data, Electrophoresis, 20(18), 1999, pp. 3551-3567
Number of citations
33
Subject categories
Chemistry & Analysis
Journal title
ELECTROPHORESIS
Journal ISSN
0173-0835
Volume
20
Issue
18
Year of publication
1999
Pages
3551 - 3567
Database
ISI
SICI code
0173-0835(199912)20:18<3551:PPIBSS>2.0.ZU;2-Y
Abstract
Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
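
Illustrative note: the abstract does not state the scoring formula or the significance rule. The sketch below assumes the common convention of reporting a probability-based score as -10 log10(P), where P is the probability that a match arose by chance, and assumes a threshold that divides the chosen significance level by the number of candidates tested. The function names, the 0.05 default, and the candidate counts are hypothetical choices for illustration, not taken from the record.

```python
import math


def score_from_probability(p: float) -> float:
    """Convert a chance-match probability into a -10*log10(p) score (assumed convention)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def significance_threshold(n_candidates: int, alpha: float = 0.05) -> float:
    """Score a match must exceed so that the expected number of chance
    matches among n_candidates stays below alpha (assumed rule)."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in the interval (0, 1)")
    return -10.0 * math.log10(alpha / n_candidates)


def is_significant(p: float, n_candidates: int, alpha: float = 0.05) -> bool:
    """Apply the single-threshold rule: significant if the score exceeds the threshold."""
    return score_from_probability(p) > significance_threshold(n_candidates, alpha)
```

Under these assumptions, a larger database raises the threshold, so the same match probability can be significant against a small database and not against a large one. Because the score is a log-probability, it is on a scale comparable to other probability-derived measures, which is the kind of comparison advantage (ii) refers to.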