The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps, for which we can use the results of Neal and Hinton (in Jordan, M. (Ed.), Learning in Graphical Models, pp. 355-371. The Netherlands: Kluwer Academic Publishers) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM algorithm, described in Neal and Hinton (1998), which cycles through data cases in blocks. Because the number of cases in each block dramatically affects the efficiency of the algorithm, we provide a method for selecting a near-optimal block size. The second approach, which we call lazy EM, at scheduled iterations evaluates the significance of each data case and then proceeds for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture-modeling problems for large databases.
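To illustrate the block-based incremental scheme described above, the following is a minimal sketch of incremental EM for a one-dimensional Gaussian mixture. It is not the paper's implementation; the function name, block handling, and initialization are illustrative assumptions. The key idea shown is the partial E-step: each block's contribution to the global sufficient statistics is replaced in turn, and an M-step runs after every block rather than after a full sweep.

```python
import numpy as np

def incremental_em_gmm(x, n_components=2, block_size=100, n_passes=5, seed=0):
    """Incremental EM (Neal & Hinton style) for a 1-D Gaussian mixture.
    Illustrative sketch only; names and details are not from the paper."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize parameters from random cases and the overall variance.
    means = rng.choice(x, n_components, replace=False)
    variances = np.full(n_components, np.var(x))
    weights = np.full(n_components, 1.0 / n_components)

    blocks = [np.arange(i, min(i + block_size, n))
              for i in range(0, n, block_size)]

    def responsibilities(xb):
        # Partial E-step on one block: posterior component probabilities.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (xb[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        p = np.exp(log_p)
        return p / p.sum(axis=1, keepdims=True)

    # Global sufficient statistics and each block's last contribution.
    s0 = np.zeros(n_components)   # sum of responsibilities
    s1 = np.zeros(n_components)   # sum of r * x
    s2 = np.zeros(n_components)   # sum of r * x^2
    block_stats = [None] * len(blocks)

    for _ in range(n_passes):
        for b, idx in enumerate(blocks):
            xb = x[idx]
            r = responsibilities(xb)
            new = (r.sum(axis=0), r.T @ xb, r.T @ (xb ** 2))
            if block_stats[b] is not None:
                # Remove this block's stale statistics before adding new ones.
                old = block_stats[b]
                s0 -= old[0]; s1 -= old[1]; s2 -= old[2]
            s0 += new[0]; s1 += new[1]; s2 += new[2]
            block_stats[b] = new
            # M-step immediately, using the current global statistics.
            weights = s0 / s0.sum()
            means = s1 / s0
            variances = np.maximum(s2 / s0 - means ** 2, 1e-6)
    return weights, means, variances
```

On well-separated synthetic data, a few passes typically recover the component means; the block size trades off how often parameters are refreshed against per-block overhead, which is the quantity the abstract's block-size selection method targets.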