This paper presents a general, trainable system for object detection in
unconstrained, cluttered scenes. The system derives much of its power from
a representation that describes an object class in terms of an overcomplete
dictionary of local, oriented, multiscale intensity differences between
adjacent regions, efficiently computable as a Haar wavelet transform. This
example-based learning approach implicitly derives a model of an object
class by training a support vector machine classifier using a large set of
positive and negative examples. We present results on face, people, and car
detection tasks using the same architecture. In addition, we quantify how
the representation affects detection performance by considering several
alternate representations including pixels and principal components. We
also describe a real-time application of our person detection system as
part of a driver assistance system.
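
As a rough illustration of the representation described above, the following sketch computes oriented intensity differences between adjacent square regions at several scales, with overlapping window placements so that the resulting dictionary is overcomplete. It is a minimal pure-Python sketch under stated assumptions, not the system's implementation: the function names, the `scales` values, and the `stride_fraction` parameter are all hypothetical, and the image is assumed to be a grayscale array given as a list of rows.

```python
def region_sum(img, r0, c0, h, w):
    """Sum of intensities in the h x w block whose top-left corner is (r0, c0)."""
    return sum(sum(row[c0:c0 + w]) for row in img[r0:r0 + h])


def haar_responses(img, r, c, s):
    """Oriented differences for the 2s x 2s window at (r, c).

    The window is split into four s x s quadrants; each response is a
    difference between sums over adjacent regions.
    """
    tl = region_sum(img, r, c, s, s)          # top-left quadrant
    tr = region_sum(img, r, c + s, s, s)      # top-right quadrant
    bl = region_sum(img, r + s, c, s, s)      # bottom-left quadrant
    br = region_sum(img, r + s, c + s, s, s)  # bottom-right quadrant
    return {
        "vertical": (tl + bl) - (tr + br),    # left vs. right: vertical edges
        "horizontal": (tl + tr) - (bl + br),  # top vs. bottom: horizontal edges
        "diagonal": (tl + br) - (tr + bl),    # diagonal structure
    }


def haar_features(img, scales=(1, 2), stride_fraction=0.5):
    """Concatenate responses over all scales and window positions.

    stride_fraction (an assumed parameter) sets the shift between windows
    as a fraction of the window size; values below 1 make windows overlap,
    which yields an overcomplete set of features.
    """
    rows, cols = len(img), len(img[0])
    features = []
    for s in scales:
        size = 2 * s
        step = max(1, int(size * stride_fraction))
        for r in range(0, rows - size + 1, step):
            for c in range(0, cols - size + 1, step):
                resp = haar_responses(img, r, c, s)
                features.extend(
                    [resp["vertical"], resp["horizontal"], resp["diagonal"]]
                )
    return features


# Toy 4 x 4 image: dark left half, bright right half (a vertical edge).
toy_image = [[0, 0, 1, 1] for _ in range(4)]
toy_features = haar_features(toy_image)
```

In a detection pipeline of the kind described, a feature vector like `toy_features` would be computed for each positive and negative example window and passed to a support vector machine classifier for training; that step is omitted here.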