The choice of clinical cases used to train and test a computer-aided diagnosis (CAD) scheme can affect the test results (i.e., the error rate). In this study, we deliberately varied the composition of our testing database to study the effect on measured performance. Using a computerized scheme for the automated detection of breast masses in mammograms, we found that the sensitivity of the scheme ranged from 26% to 100% (at a false positive rate of 1.0 per image) depending on the cases used to test it. Even a 20% change in the cases comprising the database can reduce the measured sensitivity by 15%-25%. Because measured performance depends so strongly on the testing database, it is difficult to estimate the accuracy of a CAD scheme reliably. Furthermore, it is questionable to compare different CAD schemes when different cases are used for testing. Sharing databases, creating a common database, or using a quantitative measure to characterize databases are possible solutions to this problem. However, none of these solutions exists or is practiced at present. Therefore, as a short-term solution, it is recommended that the method used for selecting cases, together with histograms or the means and standard deviations of relevant image features, be reported whenever performance data are presented.
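To make the recommended reporting concrete, the following minimal Python sketch computes and prints the mean, standard deviation, and a histogram for each image feature in a testing database. The feature names (mass size, contrast) and the synthetic data are illustrative assumptions, not values or methods from the study itself.

```python
import numpy as np

def summarize_features(features, bins=10):
    """Report mean, standard deviation, and a histogram for each
    image feature in a testing database.

    features: dict mapping feature name -> sequence of per-case values.
    """
    for name, values in features.items():
        values = np.asarray(values, dtype=float)
        counts, edges = np.histogram(values, bins=bins)
        print(f"{name}: mean={values.mean():.2f}, "
              f"std={values.std(ddof=1):.2f}, n={values.size}")
        for lo, hi, c in zip(edges[:-1], edges[1:], counts):
            print(f"  [{lo:7.2f}, {hi:7.2f}): {c} cases")

# Hypothetical example: per-case features of 50 mammographic masses.
rng = np.random.default_rng(0)
database = {
    "mass size (mm)": rng.normal(14.0, 4.0, 50),
    "contrast": rng.normal(0.30, 0.10, 50),
}
summarize_features(database)
```

Publishing such summaries alongside sensitivity and false positive rates would let readers judge whether two schemes were tested on comparably difficult case sets.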