DATA-BASED CHOICE OF HISTOGRAM BIN WIDTH

Authors
Citation
M.P. Wand, DATA-BASED CHOICE OF HISTOGRAM BIN WIDTH, The American Statistician, 51(1), 1997, pp. 59-64
Citations number
25
Subject Categories
Statistics & Probability
Journal title
ISSN journal
00031305
Volume
51
Issue
1
Year of publication
1997
Pages
59 - 64
Database
ISI
SICI code
0003-1305(1997)51:1<59:DCOHBW>2.0.ZU;2-J
Abstract
The most important parameter of a histogram is the bin width because it controls the tradeoff between presenting a picture with too much detail ("undersmoothing") or too little detail ("oversmoothing") with respect to the true distribution. Despite this importance, there has been surprisingly little research into estimation of the "optimal" bin width. Default bin widths in most common statistical packages are, at least for large samples, quite far from the optimal bin width. Rules proposed by, for example, Scott lead to better large-sample performance of the histogram, but are not consistent themselves. In this paper we extend the bin width rules of Scott to those that achieve root-n rates of convergence to the L2-optimal bin width, thereby providing firm scientific justification for their use. Moreover, the proposed rules are simple, easy and fast to compute, and perform well in simulations.
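
Illustrative sketch (not part of the record)
The gap between a package-style default and a Scott-type rule can be illustrated with a short Python sketch. It assumes the commonly cited normal-reference form of Scott's rule, h = (24 sqrt(pi))^(1/3) * s * n^(-1/3), roughly 3.49 * s * n^(-1/3), and uses Sturges' formula as a stand-in for a typical default; the abstract names neither formula, so both are assumptions. The function names are hypothetical, and the sketch does not reproduce the root-n rules proposed in the paper.

```python
import math
import random
import statistics


def scott_bin_width(data):
    """Normal-reference bin width, assumed form of Scott's rule:
    h = (24 * sqrt(pi))**(1/3) * s * n**(-1/3), about 3.49 * s * n**(-1/3)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(data)  # sample standard deviation
    return (24.0 * math.sqrt(math.pi)) ** (1.0 / 3.0) * s * n ** (-1.0 / 3.0)


def sturges_bin_width(data):
    """Bin width implied by Sturges' bin count, ceil(1 + log2 n).
    Used here only as an assumed example of a package default."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    k = math.ceil(1 + math.log2(n))
    return (max(data) - min(data)) / k


# Hypothetical data: a large standard normal sample.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]

print("Scott-type width:  ", scott_bin_width(sample))
print("Sturges-type width:", sturges_bin_width(sample))
```

For a large normal sample the Sturges-type width comes out noticeably wider than the Scott-type width, consistent with the abstract's remark that common defaults drift away from the optimal bin width as the sample grows.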