OPTIMIZATION AND TESTING OF MASS-SPECTRAL LIBRARY SEARCH ALGORITHMS FOR COMPOUND IDENTIFICATION

Authors

STEIN SE; SCOTT DR

Citation

S.E. Stein and D.R. Scott, OPTIMIZATION AND TESTING OF MASS-SPECTRAL LIBRARY SEARCH ALGORITHMS FOR COMPOUND IDENTIFICATION, Journal of the American Society for Mass Spectrometry, 5(9), 1994, pp. 859-866

Subject categories

Chemistry, Physical; Chemistry, Analytical; Spectroscopy

Journal title

Journal of the American Society for Mass Spectrometry

ISSN journal

1044-0305

Volume

5

Issue

9

Year of publication

1994

Pages

859 - 866

Database

ISI

SICI code

1044-0305(1994)5:9<859:OATOML>2.0.ZU;2-S

Abstract

Five algorithms proposed in the literature for library search identification of unknown compounds from their low resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), dot-product, Hertz et al. similarity index, Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. Rank in the list of candidate compounds was used as the criterion for accuracy. The best performing algorithm (75% accuracy for rank 1) was the dot-product function that measures the cosine of the angle between spectra represented as vectors. Other methods in order of performance were the Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms with the square root of the intensity scale nearly optimal and the square or cube the best mass weighting power. Several more complex schemes also were tested, but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was made by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
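
The dot-product match and the rank criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each peak is weighted as (m/z)^a times intensity^b, with a = 2 and b = 0.5 taken from the abstract's remark that the square root of the intensity and the square (or cube) of the mass were near optimal; the multiplicative form of the weight, the dictionary representation of a spectrum, and the function names `weighted_vector`, `cosine_match`, `rank_candidates`, and `rank_of_correct` are all assumptions made for illustration. The additional term for relative peak intensities mentioned in the last sentence is not included.

```python
import math


def weighted_vector(spectrum, mass_power=2.0, intensity_power=0.5):
    """Weight each peak of a {m/z: intensity} spectrum.

    Assumed form (not confirmed by the abstract):
    weight = (m/z) ** mass_power * intensity ** intensity_power.
    Peaks with non-positive intensity are dropped.
    """
    return {
        mz: (mz ** mass_power) * (intensity ** intensity_power)
        for mz, intensity in spectrum.items()
        if intensity > 0
    }


def cosine_match(unknown, reference, mass_power=2.0, intensity_power=0.5):
    """Cosine of the angle between two weighted spectra (0.0 to 1.0)."""
    u = weighted_vector(unknown, mass_power, intensity_power)
    r = weighted_vector(reference, mass_power, intensity_power)
    # Only m/z values present in both spectra contribute to the dot product.
    dot = sum(weight * r[mz] for mz, weight in u.items() if mz in r)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_r = math.sqrt(sum(w * w for w in r.values()))
    if norm_u == 0.0 or norm_r == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_u * norm_r)


def rank_candidates(unknown, library, **weights):
    """Return library entry names sorted from best to worst match."""
    scored = [
        (cosine_match(unknown, spectrum, **weights), name)
        for name, spectrum in library.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


def rank_of_correct(unknown, correct_name, library, **weights):
    """1-based position of the correct compound in the candidate list."""
    return rank_candidates(unknown, library, **weights).index(correct_name) + 1


if __name__ == "__main__":
    # Hypothetical toy spectra, invented purely for illustration.
    toy_library = {
        "compound_A": {43: 100.0, 58: 35.0, 71: 12.0},
        "compound_B": {57: 100.0, 85: 20.0, 99: 5.0},
    }
    toy_unknown = {43: 90.0, 58: 40.0, 71: 10.0}
    print(rank_candidates(toy_unknown, toy_library))
    print(rank_of_correct(toy_unknown, "compound_A", toy_library))
```

In this sketch, "accuracy for rank 1" would correspond to the fraction of test spectra for which `rank_of_correct` returns 1. Passing `mass_power=3.0` or a different `intensity_power` shows how the two factors could be varied during optimization.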