We consider two models of on-line learning of binary-valued functions from drifting distributions due to Bartlett. We show that if each example is drawn from a joint distribution which changes in total variation distance by at most O(ε³/(d log(1/ε))) between trials, then an algorithm can achieve a probability of a mistake at most ε worse than the best function in a class of VC-dimension d. We prove a corresponding necessary condition of O(ε³/d). Finally, in the case that a fixed function is to be learned from noise-free examples, we show that if the distributions on the domain generating the examples change by at most O(ε²/(d log(1/ε))), then any consistent algorithm learns to within accuracy ε. © 1997 Academic Press.
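For reference, the drift bounds stated above can be collected in symbols; this is only a restatement of the abstract's claims, with the notation Δ (per-trial total variation drift), d (VC-dimension), and ε (target accuracy) assumed here rather than taken from the paper:

% Restatement of the abstract's bounds (assumed notation: \Delta = per-trial
% total-variation drift, d = VC-dimension, \epsilon = target accuracy).
\[
  \underbrace{\Delta = O\!\left(\tfrac{\epsilon^{3}}{d \log(1/\epsilon)}\right)}_{\text{sufficient (agnostic setting)}}
  \qquad
  \underbrace{\Delta = O\!\left(\tfrac{\epsilon^{3}}{d}\right)}_{\text{necessary}}
  \qquad
  \underbrace{\Delta = O\!\left(\tfrac{\epsilon^{2}}{d \log(1/\epsilon)}\right)}_{\text{sufficient (noise-free, consistent learner)}}
\]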