In many real-world learning tasks it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by sample selection. In this approach, the learning program examines many unlabeled examples during training and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information.
Our work builds on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned on the training set labeled so far.
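The selection criterion described above can be sketched in a few lines. The following is an illustrative toy version, not the paper's implementation: it assumes a simple multinomial model in which `class_counts` are the labeled class counts relevant to one example, committee members are drawn from a Dirichlet posterior over those counts, and an example is selected when the members' predicted classes differ. The function names are hypothetical.

```python
import random

def draw_member(class_counts, alpha=1.0):
    """Sample class probabilities from a Dirichlet posterior over the
    labeled counts seen so far (via independent gamma draws)."""
    draws = [random.gammavariate(c + alpha, 1.0) for c in class_counts]
    total = sum(draws)
    return [d / total for d in draws]

def is_informative(class_counts, k=2):
    """Committee-based selection: draw k random model variants and select
    the example for labeling iff the variants disagree on its class."""
    votes = set()
    for _ in range(k):
        probs = draw_member(class_counts)
        votes.add(max(range(len(probs)), key=probs.__getitem__))
    return len(votes) > 1
```

With evenly split counts the sampled members frequently disagree, so the example is often selected; with one-sided counts they almost never do. Larger committees, or a graded disagreement measure such as vote entropy, generalize this binary agree/disagree test.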
The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although they differ in computational efficiency. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.