SCALING-UP INDUCTIVE LEARNING WITH MASSIVE PARALLELISM

Citation
F.J. Provost and J.M. Aronis, SCALING-UP INDUCTIVE LEARNING WITH MASSIVE PARALLELISM, Machine Learning, 23(1), 1996, pp. 33-46
Citations number
42
Subject Categories
Computer Sciences, Computer Science Artificial Intelligence, Neurosciences
Journal title
Machine Learning
ISSN journal
08856125
Volume
23
Issue
1
Year of publication
1996
Pages
33 - 46
Database
ISI
SICI code
0885-6125(1996)23:1<33:SILWMP>2.0.ZU;2-3
Abstract
Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller datasets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
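
The inherent parallelism the abstract refers to is data parallelism over training examples: a candidate rule's coverage statistics can be computed by matching every example against the rule simultaneously, which is the operation a machine like the CM-2 distributes across its processors. The sketch below is a minimal illustration of that idea, not the paper's implementation; it uses Python/NumPy vectorization as a stand-in for CM-2 data-parallel operations, and the data, attribute encoding, and rule_coverage function are hypothetical.

    import numpy as np

    # Hypothetical data: rows are training examples, columns are
    # discrete attribute values encoded as small integers.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 4, size=(1_000_000, 10))  # 1M examples, 10 attributes
    y = rng.integers(0, 2, size=1_000_000)        # binary class labels

    def rule_coverage(X, y, conditions):
        """Count positive/negative examples matched by a conjunctive rule.

        `conditions` is a list of (attribute_index, value) pairs; an
        example matches when all conditions hold. The match test runs
        over all examples at once -- the data-parallel step that a
        massively parallel machine performs across its processors.
        """
        match = np.ones(len(X), dtype=bool)
        for attr, value in conditions:
            match &= (X[:, attr] == value)
        pos = int(np.count_nonzero(match & (y == 1)))
        neg = int(np.count_nonzero(match & (y == 0)))
        return pos, neg

    # Evaluate one candidate rule: IF attr3 = 2 AND attr7 = 0 THEN class = 1
    pos, neg = rule_coverage(X, y, [(3, 2), (7, 0)])
    print(f"rule covers {pos} positive and {neg} negative examples")

Because the per-example match tests are independent, the same evaluation scales to millions of examples by splitting the rows across processors, which is why example sets beyond roughly 10K favor the parallel version in the paper's experiments.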