Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most
powerful tools in proteomics for identifying proteins. Because complete
genome sequences are accumulating rapidly, the recent trend in interpretation
of MS/MS spectra has been database search. However, de novo MS/MS spectral
interpretation remains an open problem, typically involving manual
interpretation by expert mass spectrometrists. We have developed a new
algorithm, SHERENGA, for de novo interpretation that automatically learns
fragment ion types and intensity thresholds from a collection of test
spectra generated from any type of mass spectrometer. The test data are
used to construct optimal path scoring in the graph representations of
MS/MS spectra. A ranked list of high-scoring paths corresponds to potential
peptide sequences. SHERENGA is most useful for interpreting sequences of
peptides resulting from unknown proteins and for validating the results of
database search algorithms
in fully automated, high-throughput peptide sequencing.
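
The graph representation mentioned above can be illustrated with a minimal sketch. This is not SHERENGA's learned scoring. It makes several simplifying assumptions: every peak is already expressed as a prefix (N-terminal) residue mass with a precomputed score, only a subset of residue masses is used, the tolerance is fixed, and only the single best path is returned rather than a ranked list. The names `RESIDUE_MASS` and `best_peptide` are hypothetical.

```python
# Monoisotopic residue masses in Da (a subset of the standard amino acids,
# chosen for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "E": 129.04259, "F": 147.06841,
}


def best_peptide(peaks, total_mass, tol=0.01):
    """Find the highest-scoring path through a toy spectrum graph.

    peaks      -- list of (prefix_mass, score) pairs (assumed input format)
    total_mass -- sum of residue masses of the whole peptide
    tol        -- mass tolerance in Da for matching a residue
    Returns (sequence, score), or None if no path spans the peptide.
    """
    inner = sorted(p for p in peaks if 0.0 < p[0] < total_mass)
    # Vertices: start (mass 0), one per peak, end (total residue mass).
    nodes = [(0.0, 0.0)] + inner + [(total_mass, 0.0)]
    n = len(nodes)
    neg_inf = float("-inf")
    best = [neg_inf] * n   # best path score ending at each vertex
    back = [None] * n      # (previous vertex, residue letter) for backtracking
    best[0] = 0.0

    # Vertices are sorted by mass, so the graph is a DAG and a single
    # left-to-right dynamic-programming pass finds the optimal path.
    for i in range(1, n):
        for j in range(i):
            if best[j] == neg_inf:
                continue  # vertex j is not reachable from the start
            diff = nodes[i][0] - nodes[j][0]
            for aa, mass in RESIDUE_MASS.items():
                # An edge exists when the mass gap matches one residue.
                if abs(diff - mass) <= tol:
                    cand = best[j] + nodes[i][1]
                    if cand > best[i]:
                        best[i] = cand
                        back[i] = (j, aa)

    if back[-1] is None:
        return None

    # Walk back from the end vertex to recover the residue sequence.
    seq = []
    i = n - 1
    while i != 0:
        i, aa = back[i]
        seq.append(aa)
    return "".join(reversed(seq)), best[-1]
```

For example, calling `best_peptide` with peaks at the prefix masses of G, GA, and GAS plus an unrelated high-scoring noise peak, and a total mass equal to that of GASP, recovers the sequence from the connected peaks only: the noise peak contributes nothing because no residue-mass edge reaches it. A fuller treatment would also handle complementary ion types and learned scores, as the text describes.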