Some extensions of Gold's influential model of language learning by machine from positive data are considered. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n + 1) exactly correct grammars is allowed, but which cannot be learned if convergence in the limit is to no more than n grammars, even when each of those n grammars is permitted to make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith for learnability from both positive and negative data.
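For orientation, the following LaTeX sketch records one standard formalization of the vacillatory criteria described above; the notation TxtFex, W_g, and T[k] follows common usage in the inductive-inference literature and is an assumption here, not a quotation from the paper.

% Sketch of the vacillatory identification criterion, assuming the standard
% notation: a text T for L is an infinite sequence whose content is L,
% T[k] is its initial segment of length k, and W_g is the language
% generated by grammar g.
\[
  M \ \mathrm{TxtFex}^{a}_{n}\text{-identifies } L \iff
  (\forall \text{ texts } T \text{ for } L)(\exists D)
  \bigl[\, |D| \le n \ \wedge\ (\forall g \in D)[\, W_g =^{a} L \,]
  \ \wedge\ (\forall^{\infty} k)[\, M(T[k]) \in D \,] \,\bigr],
\]
% where W_g =^a L means W_g and L differ on at most a elements (a = *
% allows any finite number of differences) and \forall^\infty k reads
% "for all but finitely many k".  In this notation the main theorem above
% says that, for each n, some class of languages can be learned under
% TxtFex^0_{n+1} but not under TxtFex^*_n.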
A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for avoiding overgeneralization in learning from positive data. It is applied to prove another theorem to the effect that one can optimally eliminate half of the mistakes from final programs for vacillatory criteria if one is willing to converge in the limit to infinitely many different programs instead.
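For background, the classical (single-grammar, no-anomaly) form of Angluin's necessary condition, often called the tell-tale or subset condition, is sketched below; its extension to the vacillatory and anomaly-tolerant criteria is the contribution of this paper and is not reproduced here.

% Classical tell-tale form of the subset principle (Angluin, 1980), stated
% for a class \mathcal{L} of languages learnable from positive data.
\[
  (\forall L \in \mathcal{L})(\exists \text{ finite } S \subseteq L)
  (\forall L' \in \mathcal{L})
  \bigl[\, S \subseteq L' \subseteq L \implies L' = L \,\bigr].
\]
% Intuitively: each L has a finite "tell-tale" subset S such that, once S
% has appeared in the data, conjecturing L cannot overgeneralize any other
% language of the class consistent with that data.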
Child language learning may be sensitive to the order or timing of data presentation. It is shown, though, that for the vacillatory success criteria of this paper, there is no loss of learning power for machines that are insensitive to order in several ways simultaneously. For example, partly set-driven machines attend only to the set and the length of the sequence of positive data, not to the actual sequence itself. A machine M is, by definition, weakly n-ary order independent if, for each language L on which, for some ordering of the positive data about L, M converges in the limit to a finite set of grammars, there is a finite set of grammars D (of cardinality less than or equal to n) such that M converges to a subset of this same D for each ordering of the positive data for L. The theorem most difficult to prove in the paper implies that machines which are simultaneously partly set-driven and weakly n-ary order independent do not lose learning power for converging in the limit to up to n grammars. Several variants of this theorem are obtained by modifying its proof, and some of these variants have applications in this and other papers. Along the way it is also shown, for the vacillatory criteria, that learning power is not increased if one restricts the sequence of positive data presentation to be computable. Some of these results are nontrivial lifts of prior work for the n = 1 case done by the Blums; Wiehagen; Osherson, Stob, and Weinstein; Schafer; and Fulk.
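The two order-insensitivity notions used above can be written out as follows; this is only a transcription of the definitions given in this abstract, with "M on T converges to a subset of D" read as: all but finitely many of M's outputs on T lie in D (an assumption about the intended reading).

% Here sigma and tau range over finite sequences of positive data,
% content(sigma) is the set of data occurring in sigma, and |sigma| is its
% length.  Partly set-driven: the conjecture depends only on the set of
% data seen and on how much data has been seen, not on its order.
\[
  \mathrm{content}(\sigma) = \mathrm{content}(\tau) \ \wedge\ |\sigma| = |\tau|
  \ \implies\ M(\sigma) = M(\tau).
\]
% Weakly n-ary order independent: if M converges on some text for L to a
% finite set of grammars, then a single set D with |D| <= n bounds M's
% final conjectures on every text for L.
\[
  (\forall L)\bigl[\,(\exists \text{ a text } T_0 \text{ for } L)
    [\, M \text{ on } T_0 \text{ converges to a finite set of grammars}\,]
    \implies (\exists D : |D| \le n)(\forall \text{ texts } T \text{ for } L)
    [\, M \text{ on } T \text{ converges to a subset of } D \,]\,\bigr].
\]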