CLASSIFICATION-ALGORITHM EVALUATION - 5 PERFORMANCE-MEASURES BASED ON CONFUSION MATRICES

Authors
A. D. Forbes
Citation
A. D. Forbes, CLASSIFICATION-ALGORITHM EVALUATION - 5 PERFORMANCE-MEASURES BASED ON CONFUSION MATRICES, Journal of Clinical Monitoring, 11(3), 1995, pp. 189-206
Subject Categories
Medical Laboratory Technology
Journal ISSN
0748-1977
Volume
11
Issue
3
Year of publication
1995
Pages
189 - 206
Database
ISI
SICI code
0748-1977(1995)11:3<189:CE-5PB>2.0.ZU;2-G
Abstract
Objective. The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.

Methods. Two distinct contexts of classification are defined, involving "objects-by-inspection" and "objects-by-segmentation." In the former context, the total number of objects to be classified is unambiguously and self-evidently defined; in the latter, there is troublesome ambiguity. All five of the performance measures considered here are based on confusion matrices, tables of counts revealing the extent of an algorithm's "confusion" regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements and should obey six additional constraints.

Results. Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.

Conclusions. The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms on a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.
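To make the abstract's terms concrete, the Python sketch below builds a confusion matrix for an objects-by-inspection task (rows are true classes, columns are the classes assigned by the algorithm), computes overall accuracy as one example of a traditional measure, and computes a normalized mutual information. The abstract gives no formulas, so this is a sketch under stated assumptions: the normalization used here, I(T;A)/H(T) (mutual information between true and assigned classes divided by the entropy of the true classes), is one common choice and may differ from the paper's NMI and its modified variant. The function names and the two-class example data are hypothetical.

```python
import math
from typing import Hashable, List, Sequence

Matrix = List[List[int]]


def confusion_matrix(true_labels: Sequence[Hashable],
                     assigned_labels: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> Matrix:
    """Tally counts: row = true class, column = class assigned by the algorithm."""
    if len(true_labels) != len(assigned_labels):
        raise ValueError("label sequences must have the same length")
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, a in zip(true_labels, assigned_labels):
        m[index[t]][index[a]] += 1
    return m


def accuracy(m: Matrix) -> float:
    """Fraction of all objects that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_mutual_information(m: Matrix) -> float:
    """I(T;A) / H(T), with T = true class and A = assigned class.

    ASSUMPTION: this is one common normalization, not necessarily the
    NMI (or modified NMI) defined in the paper.
    """
    total = sum(sum(row) for row in m)
    joint = [[c / total for c in row] for row in m]
    p_true = [sum(row) for row in joint]            # row marginals
    p_assigned = [sum(col) for col in zip(*joint)]  # column marginals
    h_true = entropy(p_true)
    if h_true == 0:
        raise ValueError("NMI is undefined when only one true class occurs")
    mutual_info = sum(
        p * math.log2(p / (p_true[i] * p_assigned[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0  # empty cells contribute nothing to I(T;A)
    )
    return mutual_info / h_true


if __name__ == "__main__":
    # Hypothetical two-class data set, invented for illustration only.
    classes = ["normal", "ectopic"]
    truth = ["normal"] * 8 + ["ectopic"] * 2
    assigned = ["normal"] * 7 + ["ectopic"] + ["ectopic", "normal"]
    m = confusion_matrix(truth, assigned, classes)
    print(m)            # [[7, 1], [1, 1]]
    print(accuracy(m))  # 0.8
    print(round(normalized_mutual_information(m), 3))
```

On this hypothetical data, accuracy is 0.8 while the NMI is only about 0.12: most correct calls fall in the majority class, so the assigned labels carry little information about the true classes. A perfect classifier gives an NMI of 1, and assignments independent of the truth give 0.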