The great success of the "uncertified" mass spectral reference collection begun
in 1956 demonstrated qualitatively that a partial reference mass spectrum, even
one measured routinely, can be of real value: correct matches were still
obtained despite errors in the reference spectra, and these errors almost never
produced close but incorrect matches. This study shows quantitatively that the
number of different compounds, not the number of peaks in a spectrum, is by far
the most important determinant of database efficiency for identifying a
"global" unknown. A statistical evaluation of matching performance shows that
reference spectra with only 6, 12, and 18 peaks are 13%, 67%, and 96% as
valuable, respectively, as spectra with hundreds of peaks. Also, a separately
measured second spectrum of the same compound is 50% as valuable as the first.
Expanding the database so that the number of possible wrong answers tripled
reduced the proportion of correct identifications by only 5%. Correcting one
mass or abundance error in each of six reference spectra improves database
matching performance as much as adding one spectrum of a new compound. A new
"matching quality index," derived statistically from these values, indicates
that the largest database is also by far the most effective for matching
unknowns. (C) 1999 American Society for Mass Spectrometry.
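The reported weights can be combined into a rough tally of relative database value. The sketch below is a hypothetical illustration only; it is not the paper's "matching quality index", whose formula is not given here. It assumes linear interpolation between the reported peak-count points, with (0 peaks, 0.0) and (100 peaks, 1.0) as invented anchors standing in for "hundreds of peaks"; that the higher-valued spectrum of a compound counts as its "first"; that only the first two spectra per compound count; and that each corrected error is worth one-sixth of a full-value new-compound spectrum. The names `peak_value` and `database_value` are illustrative.

```python
from bisect import bisect_right

# (number of peaks, value relative to a spectrum with hundreds of peaks).
# 6/12/18 -> 0.13/0.67/0.96 are the reported figures; the (0, 0.0) and
# (100, 1.0) anchors are ASSUMPTIONS standing in for "no peaks" and "hundreds".
PEAK_VALUE_POINTS = [(0, 0.0), (6, 0.13), (12, 0.67), (18, 0.96), (100, 1.0)]
SECOND_SPECTRUM_WEIGHT = 0.5  # reported: second spectrum is 50% as valuable
CORRECTIONS_PER_COMPOUND = 6  # reported: six corrections ~ one new compound


def peak_value(n_peaks: int) -> float:
    """Hypothetical value of one spectrum: linear interpolation, clamped to [0, 1]."""
    xs = [x for x, _ in PEAK_VALUE_POINTS]
    if n_peaks <= xs[0]:
        return 0.0
    if n_peaks >= xs[-1]:
        return 1.0
    i = bisect_right(xs, n_peaks)  # guarantees xs[i-1] <= n_peaks < xs[i]
    (x0, y0), (x1, y1) = PEAK_VALUE_POINTS[i - 1], PEAK_VALUE_POINTS[i]
    return y0 + (y1 - y0) * (n_peaks - x0) / (x1 - x0)


def database_value(spectra_peaks: dict[str, list[int]], corrections: int = 0) -> float:
    """Hypothetical tally in compound-equivalents.

    spectra_peaks maps a compound name to the peak counts of its reference
    spectra. The best spectrum counts fully, the second-best at half weight,
    further replicates are ignored (ASSUMPTION), and each corrected error adds
    1/6 of a full-value compound (ASSUMPTION).
    """
    total = 0.0
    for peaks in spectra_peaks.values():
        ranked = sorted((peak_value(n) for n in peaks), reverse=True)
        if ranked:
            total += ranked[0]
        if len(ranked) > 1:
            total += SECOND_SPECTRUM_WEIGHT * ranked[1]
    return total + corrections / CORRECTIONS_PER_COMPOUND


if __name__ == "__main__":
    db = {"A": [150], "B": [12, 150], "C": [6]}
    print(round(database_value(db), 3))
```

Under these assumptions the example database comes to about 2.465 compound-equivalents. Because a second spectrum can add at most 0.5 while a new compound can add up to 1.0, the tally reflects the abstract's finding that the number of distinct compounds matters most.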