Costs of general purpose learning

Citation
J. Case et al., Costs of general purpose learning, THEOR COMP, 259(1-2), 2001, pp. 455-473
Citations number
19
Subject Categories
Computer Science & Engineering
Journal title
THEORETICAL COMPUTER SCIENCE
ISSN journal
0304-3975
Volume
259
Issue
1-2
Year of publication
2001
Pages
455 - 473
Database
ISI
SICI code
0304-3975(20010528)259:1-2<455:COGPL>2.0.ZU;2-7
Abstract
Leo Harrington surprisingly constructed a machine which can learn any computable function f according to the following criterion (called Bc*-identification). His machine, on the successive graph points of f, outputs a corresponding infinite sequence of programs p(0), p(1), p(2), ..., and, for some i, the programs p(i), p(i+1), p(i+2), ... each compute a variant of f which differs from f at only finitely many argument places. A machine with this property is called general purpose. The sequence p(i), p(i+1), p(i+2), ... is called a final sequence. For Harrington's general purpose machine, for distinct m and n, the finitely many argument places where p(i+m) fails to compute f can be very different from the finitely many argument places where p(i+n) fails to compute f. One would hope, though, that if Harrington's machine, or an improvement thereof, inferred the program p(i+m) based on the data points f(0), f(1), ..., f(k), then p(i+m) would make very few mistakes computing f at the "near future" arguments k+1, k+2, ..., k+l, where l is reasonably large. Ideally, p(i+m)'s finitely many mistakes or anomalies would (mostly) occur at arguments x much greater than k, i.e., ideally, its anomalies would be well placed beyond near future arguments. In the present paper, for general purpose learning machines, it is analyzed just how well or badly placed these anomalies may be with respect to near future arguments and what the various tradeoffs are. In particular, there is good news and bad. Bad news is that, for any learning machine M (including general purpose M), for all m, there exist infinitely many computable functions f such that, infinitely often, M incorrectly predicts f's next m near future values. Good news is that, for a suitably clever general purpose learning machine M, for each computable f, for M on f, the density of such associated bad prediction intervals of size m is vanishingly small. Considered too is the possibility of providing a general purpose learner which additionally learns some interesting classes with respect to much stricter criteria than Bc*-identification. Again there is good news and bad. The criterion of finite identification requires for success that a learner M on a function f output exactly one program, which correctly computes f. Bc(n)-identification is just like Bc*-identification above except that the number of anomalies in each program of a final sequence is less than or equal to n. Bad news is that there is a finitely identifiable class of computable functions G such that, for no general purpose learner M and for no n, does M additionally Bc(n)-identify G. Ex-identification by M on f requires that M on f converges, after a few output programs, to a single final program which computes f. A reliable learner (by definition) never deceives by false convergence; more precisely: whenever it converges to a final program on a function f, it must Ex-identify f. Good news is that, for any class G that can be reliably Ex-identified, there is a general purpose machine which additionally Ex-identifies G! (C) 2001 Elsevier Science B.V. All rights reserved.
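
The identification criteria named in the abstract (Ex, Bc(n), Bc*) have standard formalizations in the inductive-inference literature; the sketch below states them under assumed notation (M(f[k]) for M's conjecture after seeing f(0), ..., f(k-1); phi_p for the function computed by program p), and is not quoted from the paper itself.

\begin{align*}
% Sketch of the criteria described in the abstract; notation is assumed.
% =^{n} means equality except at at most n arguments; =^{*} except at finitely many.
M \ \mathrm{Ex}\text{-identifies } f
  &\iff \exists p\,\exists k_0\,\forall k \ge k_0:\ M(f[k]) = p \ \wedge\ \varphi_p = f,\\
M \ \mathrm{Bc}^{n}\text{-identifies } f
  &\iff \exists k_0\,\forall k \ge k_0:\ \varphi_{M(f[k])} =^{n} f,\\
M \ \mathrm{Bc}^{*}\text{-identifies } f
  &\iff \exists k_0\,\forall k \ge k_0:\ \varphi_{M(f[k])} =^{*} f.
\end{align*}

In these terms, Harrington's general purpose machine is one that Bc*-identifies every computable f, while the paper's results concern how far such a machine can also satisfy the stricter Ex and Bc(n) criteria on particular classes.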