Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller data sets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
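As an illustrative sketch only (not taken from the paper), the "inherent parallelism" referred to above can be seen in how a rule learner evaluates candidate rules: each example can be tested against a rule independently, so coverage counts can be computed over all examples at once. The names here (`evaluate_rule`, `X`, `y`) and the use of NumPy vectorization as a stand-in for SIMD evaluation on a machine like the CM-2 are assumptions for illustration.

```python
# Hypothetical sketch of data-parallel rule evaluation; names and data are
# illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1_000_000, 10), dtype=np.int8)  # boolean features
y = rng.integers(0, 2, size=1_000_000, dtype=np.int8)        # class labels

def evaluate_rule(conditions, X, y):
    """Count positive and negative examples covered by a conjunctive rule.

    `conditions` is a list of (feature_index, required_value) pairs.
    Each per-example test runs as one vectorized (data-parallel) operation,
    so the cost per candidate rule is independent of explicit example loops.
    """
    covered = np.ones(X.shape[0], dtype=bool)
    for feat, val in conditions:
        covered &= (X[:, feat] == val)
    pos = int(np.count_nonzero(covered & (y == 1)))
    neg = int(np.count_nonzero(covered & (y == 0)))
    return pos, neg

# Example: coverage counts for the rule "feature 0 = 1 AND feature 3 = 0"
print(evaluate_rule([(0, 1), (3, 0)], X, y))
```

The design point this illustrates is that the expensive inner loop of inductive rule learning (counting matches over the training set) is embarrassingly parallel across examples, which is why it maps well onto massively parallel hardware.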