Consider the problem of classifying a sample $X_0$ into one of two classes, using a training set $Q$. Let $Q$ be composed of $l$ labeled samples $\{(X_1, \theta_1), \ldots, (X_l, \theta_l)\}$ and $u$ unlabeled samples $\{X'_1, \ldots, X'_u\}$, where the labels $\theta_i$ are i.i.d. Bernoulli($\eta$) random variables over the set $\{1, 2\}$, the observations $\{X_i\}_{i=1}^{l}$ are distributed according to $f_{\theta_i}(\cdot)$, and the unlabeled observations $\{X'_j\}_{j=1}^{u}$ are independently distributed according to the mixture density $f_{X'}(\cdot) = \eta f_1(\cdot) + (1-\eta) f_2(\cdot)$. We assume that $f_1(\cdot)$, $f_2(\cdot)$, and $\eta$ are all unknown. Let $f_1(\cdot)$ and $f_2(\cdot)$ belong to a known family $\mathcal{F}$, and assume that mixtures of elements of $\mathcal{F}$ are identifiable. Even when the number of unlabeled samples is infinite, so that the decision regions can be identified, one still needs labeled samples to assign the correct class to each decision region.
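As a concrete illustration of this sampling model, here is a minimal simulation sketch. The Gaussian class densities, the parameter values, and all variable names are assumptions chosen for the example; the setting above allows any identifiable family $\mathcal{F}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example parameters (not from the text): two unit-variance
# Gaussian class densities f_1 = N(0, 1), f_2 = N(2, 1), mixing weight eta.
eta, mu1, mu2 = 0.3, 0.0, 2.0
l, u = 10, 1000  # number of labeled and unlabeled samples

# Labeled samples: theta_i ~ Bernoulli(eta) over {1, 2}, X_i ~ f_{theta_i}.
theta = np.where(rng.random(l) < eta, 1, 2)
X = rng.normal(np.where(theta == 1, mu1, mu2), 1.0)

# Unlabeled samples: X'_j drawn from the mixture eta*f_1 + (1-eta)*f_2;
# the component label is drawn but never revealed to the learner.
theta_hidden = np.where(rng.random(u) < eta, 1, 2)
X_prime = rng.normal(np.where(theta_hidden == 1, mu1, mu2), 1.0)

print("labeled:  ", list(zip(np.round(X[:5], 2), theta[:5])))
print("unlabeled:", np.round(X_prime[:5], 2))
```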
Letting $R(l, u)$ denote the optimal probability of error for $l$ labeled and $u$ unlabeled samples, and assuming that the pairwise mixtures of $\mathcal{F}$ are identifiable, we obtain the obvious statements $R(0, u) = R(0, \infty) = 1/2$, $R(1, 0) \le 2\eta(1-\eta)$, and $R(\infty, u) = R^*$, and then prove $R(1, \infty) = 2R^*(1 - R^*)$, where $R^*$ is the Bayes probability of error, and $R(l, \infty) = R^* + \exp\{-\alpha l + o(l)\}$, where the exponent is given by $\alpha = -\log\left(2\sqrt{\eta(1-\eta)} \int \sqrt{f_1(x) f_2(x)}\, dx\right)$. Thus the first labeled sample reduces the risk from $1/2$ to $2R^*(1 - R^*)$, and subsequent labeled samples in the training set reduce the probability of error exponentially fast to the Bayes risk.
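To make the exponent concrete, the sketch below evaluates $\alpha$ numerically for the same assumed Gaussian example. For two unit-variance Gaussians the integral $\int \sqrt{f_1(x) f_2(x)}\, dx$ (the Bhattacharyya coefficient) has the closed form $\exp\{-(\mu_1 - \mu_2)^2 / 8\}$, which the quadrature should reproduce; the densities and parameter values are again illustrative assumptions, not part of the result.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Assumed example: f_1 = N(0, 1), f_2 = N(2, 1), eta = 0.3 (illustrative only).
eta, mu1, mu2 = 0.3, 0.0, 2.0

# Bhattacharyya integral int sqrt(f_1(x) f_2(x)) dx, by numerical quadrature.
bc, _ = quad(lambda x: np.sqrt(norm.pdf(x, mu1) * norm.pdf(x, mu2)),
             -np.inf, np.inf)

# Closed form for unit-variance Gaussians: exp(-(mu1 - mu2)^2 / 8).
bc_closed = np.exp(-((mu1 - mu2) ** 2) / 8)

# Error exponent alpha = -log(2 * sqrt(eta * (1 - eta)) * integral),
# governing R(l, inf) = R* + exp(-alpha * l + o(l)).
alpha = -np.log(2 * np.sqrt(eta * (1 - eta)) * bc)

print(f"Bhattacharyya integral: {bc:.6f} (closed form {bc_closed:.6f})")
print(f"alpha = {alpha:.6f}")
```

Note that $2\sqrt{\eta(1-\eta)} \le 1$ and the integral is strictly less than $1$ whenever $f_1 \ne f_2$, so $\alpha > 0$ and the excess risk indeed decays exponentially in $l$.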