A general methodology for evaluating the accuracy of the results produced by scientific software has been developed at the National Physical Laboratory. The basis of the approach is the design and use of reference data sets and corresponding reference results to undertake black-box testing.
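
As an illustration of a reference data set paired with reference results, the following Python sketch builds data for the mean and sample standard deviation whose results are known exactly by construction. It then treats two routines as black boxes: `statistics.mean`/`statistics.stdev` from the Python standard library, and a hypothetical one-pass routine `naive_mean_std`. The generator `reference_mean_std` and the choice of a large offset to provoke cancellation are assumptions made for illustration. They are not taken from the NPL methodology.

```python
import math
import statistics


def reference_mean_std(n, offset):
    """Hypothetical generator of a reference data set for the mean and
    sample standard deviation.

    The data are offset + k for k = -(n - 1)/2, ..., (n - 1)/2. They are
    integers, so for |offset| well below 2**53 every value is stored exactly
    in binary floating point and the reference results below are exact.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    half = (n - 1) // 2
    data = [float(offset + k) for k in range(-half, half + 1)]
    ref_mean = float(offset)
    # The sample variance (divisor n - 1) of n consecutive integers is n(n + 1)/12.
    ref_std = math.sqrt(n * (n + 1) / 12.0)
    return data, ref_mean, ref_std


def naive_mean_std(data):
    """Hypothetical software under test: the one-pass 'textbook' formula
    sum(x**2) - n*mean**2, which suffers cancellation for offset data."""
    n = len(data)
    mean = sum(data) / n
    var = (sum(x * x for x in data) - n * mean * mean) / (n - 1)
    return mean, math.sqrt(max(var, 0.0))  # clamp: rounding can make var < 0


if __name__ == "__main__":
    for offset in (0.0, 1e4, 1e8, 1e12):
        data, m_ref, s_ref = reference_mean_std(11, offset)
        # Black-box use: only inputs and outputs are inspected.
        lib = (statistics.mean(data), statistics.stdev(data))
        naive = naive_mean_std(data)
        print(f"offset={offset:g}  reference={(m_ref, s_ref)}  "
              f"statistics={lib}  naive={naive}")
```

Because the reference results are exact, any discrepancy is attributable to the software under test. With a large offset, the one-pass formula can be expected to fail badly while the standard-library routines should not.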
The approach enables reference data sets and results to be generated in a manner consistent with the functional specification of the problem addressed by the software. The results returned by the software for the reference data are compared objectively with the reference results, using quality metrics that account for the key aspects of the problem.
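
To make the comparison step concrete, the following Python sketch uses straight-line fitting by least squares, one of the three examples named in this paper. The reference results are computed in exact rational arithmetic (Python's `fractions` module) from the data exactly as stored. The results of `numpy.polyfit` and of a hypothetical uncentred normal-equations routine are then scored. The data generator, the routine `naive_line_fit`, and the metric P = log10(1 + d/eps) are all assumptions for illustration. Here d is the relative deviation from the reference results and eps is the machine precision. This is one plausible metric of this kind, not necessarily the one defined in the paper.

```python
import math
from fractions import Fraction

import numpy as np


def make_line_data(n, x0, dx, a, b, noise, seed=0):
    """Hypothetical test-data generator: n equally spaced x values starting
    at x0, with y = a + b*x plus Gaussian noise of standard deviation `noise`."""
    rng = np.random.default_rng(seed)
    x = x0 + dx * np.arange(n)
    y = a + b * x + noise * rng.standard_normal(n)
    return x, y


def exact_line_fit(x, y):
    """Reference results (intercept, slope) of the least-squares line for the
    data *as stored*, computed in exact rational arithmetic."""
    xs = [Fraction(v) for v in x]   # each float converts to Fraction exactly
    ys = [Fraction(v) for v in y]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return float(a), float(b)   # single rounding to the nearest double


def naive_line_fit(x, y):
    """Hypothetical software under test: uncentred normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b


def relative_deviation(test, ref):
    test, ref = np.asarray(test, dtype=float), np.asarray(ref, dtype=float)
    return float(np.linalg.norm(test - ref) / np.linalg.norm(ref))


def performance_measure(test, ref):
    """Assumed metric log10(1 + d/eps): roughly the number of decimal
    figures lost relative to a result correct to machine precision."""
    d = relative_deviation(test, ref)
    return math.log10(1.0 + d / np.finfo(float).eps)


if __name__ == "__main__":
    for x0 in (0.0, 1e3, 1e6):
        x, y = make_line_data(21, x0, 0.5, a=2.0, b=3.0, noise=0.01)
        ref = exact_line_fit(x, y)
        slope, intercept = np.polyfit(x, y, 1)  # highest power first
        print(f"x0={x0:g}  polyfit P={performance_measure((intercept, slope), ref):.2f}  "
              f"naive P={performance_measure(naive_line_fit(x, y), ref):.2f}")
```

Shifting the x values away from the origin (larger `x0`) makes the problem harder. A sound metric should therefore be read alongside the conditioning of the problem rather than in isolation.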
This paper shows how reference data sets can be designed for testing software implementations of solutions to a broad class of problems arising throughout science. It also shows how these data sets can be used in practice and how the results provided by software under test can properly be compared with reference results. The approach is illustrated with three examples: (i) mean and standard deviation, (ii) straight-line fitting, and (iii) principal components analysis. Software for such problems is used routinely in many fields, including optical spectrometry.
© 1999 The National Physical Laboratory. Published by Elsevier Science B.V. All rights reserved.
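
For the principal components analysis example, one way to design a reference data set is to prescribe the answer first and construct the data from it. The following Python sketch builds an n x p matrix X = U diag(s) V^T, where U has orthonormal, zero-mean columns and V is orthogonal. The sample covariance is then V diag(variances) V^T, so the reference principal variances and directions (the latter up to sign) are known. The function names are hypothetical. The construction is an assumption for illustration, not the paper's own procedure. Because X is rounded when formed, the reference results hold only to about machine precision. This suffices to detect gross failures but not last-digit differences. Two NumPy routes are tested: `numpy.linalg.eigh` on the covariance and `numpy.linalg.svd` on the centred data.

```python
import numpy as np


def make_pca_reference(n, variances, seed=0):
    """Hypothetical generator of an n x p data matrix whose sample covariance
    (divisor n - 1) has eigenvalues `variances` and eigenvectors V.

    `variances` should be distinct and in descending order so that each
    principal direction is well defined (up to sign)."""
    rng = np.random.default_rng(seed)
    variances = np.asarray(variances, dtype=float)
    p = variances.size
    m = rng.standard_normal((n, p))
    m -= m.mean(axis=0)                    # zero-mean columns ...
    u, _ = np.linalg.qr(m)                 # ... kept zero-mean by QR: U = M R^-1
    v, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthogonal V
    s = np.sqrt(variances * (n - 1))
    x = (u * s) @ v.T                      # X = U diag(s) V^T
    return x, variances, v


def pca_eigh(x):
    """Software under test (route 1): eigendecomposition of the covariance."""
    xc = x - x.mean(axis=0)
    w, vecs = np.linalg.eigh(xc.T @ xc / (x.shape[0] - 1))
    order = np.argsort(w)[::-1]            # eigh returns ascending order
    return w[order], vecs[:, order]


def pca_svd(x):
    """Software under test (route 2): SVD of the centred data matrix."""
    xc = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    return s**2 / (x.shape[0] - 1), vt.T


def compare_pca(vals, vecs, ref_vals, ref_vecs):
    """Largest relative error in the variances, and largest distance between
    sign-aligned unit principal directions."""
    val_dev = np.max(np.abs(vals - ref_vals) / np.abs(ref_vals))
    signs = np.sign(np.sum(vecs * ref_vecs, axis=0))
    vec_dev = np.max(np.linalg.norm(vecs - ref_vecs * signs, axis=0))
    return float(val_dev), float(vec_dev)


if __name__ == "__main__":
    x, ref_vals, ref_vecs = make_pca_reference(50, [10.0, 4.0, 1.0, 0.25])
    for name, route in (("eigh", pca_eigh), ("svd", pca_svd)):
        print(name, compare_pca(*route(x), ref_vals, ref_vecs))
```

Aligning signs before comparing directions reflects the fact that a principal component is defined only up to sign. A comparison that ignored this would wrongly flag correct software.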