
THE RELATIVE VALUE OF LABELED AND UNLABELED SAMPLES IN PATTERN-RECOGNITION WITH AN UNKNOWN MIXING PARAMETER

Authors

CASTELLI V; COVER TM

Citation

V. Castelli and T.M. Cover, THE RELATIVE VALUE OF LABELED AND UNLABELED SAMPLES IN PATTERN-RECOGNITION WITH AN UNKNOWN MIXING PARAMETER, IEEE transactions on information theory, 42(6), 1996, pp. 2102-2117

Citations number

Subject Categories

Information Science & Library Science; Engineering, Electrical & Electronic

Journal title

IEEE transactions on information theory

ISSN journal

00189448

Volume

42

Issue

6

Year of publication

1996

Part

Pages

2102 - 2117

Database

ISI

SICI code

0018-9448(1996)42:6<2102:TRVOLA>2.0.ZU;2-R

Abstract

We observe a training set Q composed of l labeled samples {(X(1), theta(1)), ..., (X(l), theta(l))} and u unlabeled samples {X'(1), ..., X'(u)}. The labels theta(i) are independent random variables satisfying Pr{theta(i) = 1} = eta, Pr{theta(i) = 2} = 1 - eta. The labeled observations X(i) are independently distributed with conditional density f(theta(i))(.) given theta(i). Let (X(0), theta(0)) be a new sample, independently distributed as the samples in the training set. We observe X(0) and we wish to infer the classification theta(0). In this paper we first assume that the distributions f(1)(.) and f(2)(.) are given and that the mixing parameter eta is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter eta. We then assume that two densities g(1)(.) and g(2)(.) are given, but we do not know whether g(1)(.) = f(1)(.) and g(2)(.) = f(2)(.) or if the opposite holds, nor do we know eta. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
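
Illustration (not part of the original record). The first result in the abstract compares the Fisher information about eta carried by a labeled sample with that carried by an unlabeled sample. The sketch below computes both quantities numerically from their standard definitions. It is only an illustration under stated assumptions: the unit-variance Gaussian class densities, the value eta = 0.3, the integration range, and all function names are hypothetical choices, not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, std=1.0):
    """Density of N(mean, std^2) at x (assumed class density, for illustration)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_labeled(eta):
    """Fisher information about eta in one labeled sample.

    Only the label depends on eta (it is Bernoulli with Pr{theta = 1} = eta),
    so the information is 1 / (eta * (1 - eta)).
    """
    return 1.0 / (eta * (1.0 - eta))


def fisher_unlabeled(eta, f1, f2, lo=-20.0, hi=20.0, steps=40000):
    """Fisher information about eta in one unlabeled sample.

    The sample has mixture density eta*f1 + (1-eta)*f2, whose derivative in
    eta is f1 - f2, so the information is the integral of
    (f1 - f2)^2 / (eta*f1 + (1-eta)*f2), approximated by a midpoint rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        a, b = f1(x), f2(x)
        mix = eta * a + (1.0 - eta) * b
        if mix > 0.0:  # skip points where both densities underflow to zero
            total += (a - b) ** 2 / mix * h
    return total


if __name__ == "__main__":
    eta = 0.3  # hypothetical mixing parameter
    f1 = lambda x: gaussian_pdf(x, -1.0)  # assumed density of class 1
    f2 = lambda x: gaussian_pdf(x, 1.0)   # assumed density of class 2
    i_l = fisher_labeled(eta)
    i_u = fisher_unlabeled(eta, f1, f2)
    print("labeled:", i_l, "unlabeled:", i_u, "ratio:", i_l / i_u)
```

In this sketch the unlabeled information never exceeds the labeled one. It approaches the labeled value when the two densities barely overlap and falls to zero when they coincide, which is consistent with reading the ratio as the number of unlabeled samples needed to match one labeled sample.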