Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller data sets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
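As an illustrative sketch only (not taken from the paper), the "inherent parallelism" referred to above can be seen in how a rule learner evaluates candidate rules: each example can be tested against a rule independently, so coverage counts can be computed over all examples at once. The names here (`evaluate_rule`, `X`, `y`) and the use of NumPy vectorization as a stand-in for SIMD evaluation on a machine like the CM-2 are assumptions for illustration.

```python
# Hypothetical sketch of data-parallel rule evaluation; names and data are
# illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1_000_000, 10), dtype=np.int8)  # boolean features
y = rng.integers(0, 2, size=1_000_000, dtype=np.int8)        # class labels

def evaluate_rule(conditions, X, y):
    """Count positive and negative examples covered by a conjunctive rule.

    `conditions` is a list of (feature_index, required_value) pairs.
    Each per-example test runs as one vectorized (data-parallel) operation,
    so the cost per candidate rule is independent of explicit example loops.
    """
    covered = np.ones(X.shape[0], dtype=bool)
    for feat, val in conditions:
        covered &= (X[:, feat] == val)
    pos = int(np.count_nonzero(covered & (y == 1)))
    neg = int(np.count_nonzero(covered & (y == 0)))
    return pos, neg

# Example: coverage counts for the rule "feature 0 = 1 AND feature 3 = 0"
print(evaluate_rule([(0, 1), (3, 0)], X, y))
```

The design point this illustrates is that the expensive inner loop of inductive rule learning (counting matches over the training set) is embarrassingly parallel across examples, which is why it maps well onto massively parallel hardware.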