OPTIMIZATION AND TESTING OF MASS-SPECTRAL LIBRARY SEARCH ALGORITHMS FOR COMPOUND IDENTIFICATION

Authors
Citation
S. E. Stein and D. R. Scott, OPTIMIZATION AND TESTING OF MASS-SPECTRAL LIBRARY SEARCH ALGORITHMS FOR COMPOUND IDENTIFICATION, Journal of the American Society for Mass Spectrometry, 5(9), 1994, pp. 859-866
Number of citations
21
Subject Categories
Chemistry, Physical; Chemistry, Analytical; Spectroscopy
ISSN journal
1044-0305
Volume
5
Issue
9
Year of publication
1994
Pages
859 - 866
Database
ISI
SICI code
1044-0305(1994)5:9<859:OATOML>2.0.ZU;2-S
Abstract
Five algorithms proposed in the literature for library search identification of unknown compounds from their low resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), dot-product, Hertz et al. similarity index, Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. Rank in the list of candidate compounds was used as the criterion for accuracy. The best performing algorithm (75% accuracy for rank 1) was the dot-product function that measures the cosine of the angle between spectra represented as vectors. Other methods in order of performance were the Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms with the square root of the intensity scale nearly optimal and the square or cube the best mass weighting power. Several more complex schemes also were tested, but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was made by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
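
As a rough illustration of the dot-product approach described in the abstract, the following Python sketch weights each peak as (m/z)^2 times the square root of its intensity and then takes the cosine of the angle between the two weighted vectors. The function names, the dictionary representation of a spectrum, and the default exponents (mass power 2, intensity power 0.5) are assumptions chosen to match the abstract's description. The abstract does not give the exact formula, and the additional peak-ratio term it mentions is not included here.

```python
import math

def weighted_vector(spectrum, mass_power=2.0, intensity_power=0.5):
    """Weight each peak as (m/z)**mass_power * intensity**intensity_power.

    `spectrum` is a hypothetical {m/z: intensity} dict; zero peaks are dropped.
    """
    return {mz: (mz ** mass_power) * (inten ** intensity_power)
            for mz, inten in spectrum.items() if inten > 0}

def cosine_match(unknown, reference, mass_power=2.0, intensity_power=0.5):
    """Cosine of the angle between two weighted spectra (0.0 if either is empty)."""
    u = weighted_vector(unknown, mass_power, intensity_power)
    r = weighted_vector(reference, mass_power, intensity_power)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(w * r[mz] for mz, w in u.items() if mz in r)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_r = math.sqrt(sum(w * w for w in r.values()))
    if norm_u == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_u * norm_r)

def rank_candidates(unknown, library):
    """Return (name, score) pairs for a {name: spectrum} library, best match first."""
    scores = [(name, cosine_match(unknown, spec)) for name, spec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a test spectrum would count as correctly identified at rank 1 when the first entry returned by `rank_candidates` is the true compound, which mirrors the rank-based accuracy criterion mentioned in the abstract.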