S. E. Stein and D. R. Scott, OPTIMIZATION AND TESTING OF MASS-SPECTRAL LIBRARY SEARCH ALGORITHMS FOR COMPOUND IDENTIFICATION, Journal of the American Society for Mass Spectrometry, 5(9), 1994, pp. 859-866
Five algorithms proposed in the literature for library search identification of unknown compounds from their low resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), dot-product, Hertz et al. similarity index, Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. Rank in the list of candidate compounds was used as the criterion for accuracy. The best performing algorithm (75% accuracy for rank 1) was the dot-product function, which measures the cosine of the angle between spectra represented as vectors. The other methods, in order of performance, were the Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms: the square root of the intensity was a nearly optimal scaling, and the square or cube of the mass was the best mass weighting power. Several more complex schemes were also tested, but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was made by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
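
The dot-product search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary representation of a spectrum (nominal mass mapped to intensity), and the toy spectra are assumptions made for illustration. The default exponents (mass squared, square root of intensity) are one choice within the range the abstract reports as near optimal; the additional term for spectra with many peaks in common is not included.

```python
import math


def weighted_vector(spectrum, mass_power=2.0, intensity_power=0.5):
    """Scale each peak as mass**mass_power * intensity**intensity_power.

    `spectrum` is a dict {mass: intensity}; this representation is an
    assumption of the sketch, not something specified in the abstract.
    """
    return {
        mass: (mass ** mass_power) * (intensity ** intensity_power)
        for mass, intensity in spectrum.items()
    }


def cosine_similarity(unknown, reference, mass_power=2.0, intensity_power=0.5):
    """Cosine of the angle between two weighted spectra viewed as vectors."""
    u = weighted_vector(unknown, mass_power, intensity_power)
    r = weighted_vector(reference, mass_power, intensity_power)
    # Only masses present in both spectra contribute to the dot product.
    dot = sum(weight * r[mass] for mass, weight in u.items() if mass in r)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in r.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(unknown, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, cosine_similarity(unknown, ref)) for name, ref in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy spectra, invented for this example only.
toy_library = {
    "candidate_A": {41: 25, 43: 100, 58: 30},
    "candidate_B": {43: 5, 51: 40, 77: 100},
}
toy_unknown = {41: 20, 43: 100, 58: 35}

ranked = rank_candidates(toy_unknown, toy_library)
print(ranked[0][0])
```

Under this scheme, the rank 1 accuracy quoted in the abstract corresponds to how often the correct compound appears first in the list returned by a ranking step such as `rank_candidates`.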