The emperor's new tests

Authors
Citation
M. D. Perlman and L. Wu, The emperor's new tests, STAT SCI, 14(4), 1999, pp. 355-369
Number of citations
66
Subject Categories
Mathematics
Journal title
STATISTICAL SCIENCE
Journal ISSN
0883-4237
Volume
14
Issue
4
Year of publication
1999
Pages
355 - 369
Database
ISI
SICI code
0883-4237(199911)14:4<355:TENT>2.0.ZU;2-7
Abstract
In the past two decades, striking examples of allegedly inferior likelihood ratio tests (LRT) have appeared in the statistical literature. These examples, which arise in multiparameter hypothesis testing problems, have several common features. In each case the null hypothesis is composite, the size α LRT is not similar and hence biased, and competing size α tests can be constructed that are less biased, or even unbiased, and that dominate the LRT in the sense of being everywhere more powerful. It is therefore asserted that in these examples and, by implication, many other testing problems, the LR criterion produces "inferior," "deficient," "undesirable," or "flawed" statistical procedures. This message, which appears to be proliferating, is wrong. In each example it is the allegedly superior test that is flawed, not the LRT. At worst, the "superior" tests provide unwarranted and inappropriate inferences and have been deemed scientifically unacceptable by applied statisticians. This reinforces the well-documented but oft-neglected fact that the Neyman-Pearson theory desideratum of a more (or most) powerful size α test may be scientifically inappropriate; the same is true for the criteria of unbiasedness and α-admissibility. Although the LR criterion is not infallible, we believe that it remains a generally reasonable first option for non-Bayesian parametric hypothesis-testing problems.