EFFICIENT RANDOMIZED ALGORITHMS FOR THE REPEATED MEDIAN LINE ESTIMATOR

Citation
J. Matousek et al., EFFICIENT RANDOMIZED ALGORITHMS FOR THE REPEATED MEDIAN LINE ESTIMATOR, Algorithmica, 20(2), 1998, pp. 136-150
Citations number
37
Categorie Soggetti
Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
20
Issue
2
Year of publication
1998
Pages
136 - 150
Database
ISI
SICI code
0178-4617(1998)20:2<136:ERAFTR>2.0.ZU;2-S
Abstract
The problem of fitting a straight line to a finite collection of point s in the plane is an important problem in statistical estimation. Rece ntly there has been a great deal of interest is robust estimators, bec ause of their lack of sensitivity to outlying data points. The basic m easure of the robustness of an estimator is its breakdown point, that is, the fraction (up to 50%) of outlying data points that can corrupt the estimator, One problem with robust estimators is that achieving hi gh breakdown points (near 50%) has proved to be computationally demand ing. In this paper we present the best known theoretical algorithm and a practical subquadratic algorithm for computing a 50% breakdown poin t line estimator, the Siegel or repeated median line estimator. We fir st present an O(n log n) randomized expected-time algorithm, where la is the number of given points. This algorithm relies, however, on soph isticated data structures. We also present a very simple O(n log(2) n) randomized algorithm for this problem, which uses no complex data str uctures. We provide empirical evidence that, for many realistic input distributions, the running time of this second algorithm is actually O (n log n) expected time.