DISCOVERING NEURAL NETS WITH LOW KOLMOGOROV COMPLEXITY AND HIGH GENERALIZATION CAPABILITY

Authors
J. Schmidhuber
Citation
J. Schmidhuber, Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability, Neural Networks, 10(5), 1997, pp. 857-873
Citations number
88
Subject categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science, Artificial Intelligence; Neurosciences; Physics, Applied
Journal title
Neural Networks
ISSN journal
0893-6080
Volume
10
Issue
5
Year of publication
1997
Pages
857 - 873
Database
ISI
SICI code
0893-6080(1997)10:5<857:DNNWLK>2.0.ZU;2-A
Abstract
Many neural net learning algorithms aim at finding ''simple'' nets to explain training data. The expectation is that the ''simpler'' the networks, the better the generalization on test data (--> Occam's razor). Previous implementations, however, use measures of ''simplicity'' that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the ''Bayesian'' kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding ''algorithmically simple'' problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and inspired by Levin's optimal universal search algorithm. For a given problem, solution candidates are computed by efficient ''self-sizing'' programs that influence their own runtime and storage size. The probabilistic search algorithm finds the ''good'' programs (the ones that quickly compute algorithmically probable solutions fitting the training data). Simulations focus on the task of discovering ''algorithmically simple'' neural networks with low Kolmogorov complexity and high generalization capability. It is demonstrated that the method, at least on certain toy problems where it is computationally feasible, can lead to generalization results unmatchable by previous neural network algorithms. Much remains to be done, however, to make large-scale applications and ''incremental learning'' feasible. (C) 1997 Elsevier Science Ltd.
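Note on the search scheme described in the abstract: Levin complexity can be written informally as Kt(x) = min over programs p computing x of { |p| + log t(p) }, and Levin's universal search explores programs so that, in phase i, every program p with |p| <= i receives roughly 2^(i - |p|) execution steps, biasing the search toward short, fast programs. The Python sketch below illustrates only this time-allocation idea on a deliberately trivial interpreter; the names run_program and levin_search and the bit-string ''program'' encoding are illustrative assumptions, not the paper's actual implementation, which searches over self-sizing programs that generate neural network weights.

from itertools import product

def run_program(bits, max_steps):
    # Toy stand-in for a universal machine: the bit string is the ''program''.
    # Pretend its runtime grows with its length; if the allotted budget is
    # exceeded, report non-termination (None). Its ''output'' is the 1-count.
    steps_needed = len(bits)
    if steps_needed > max_steps:
        return None
    return sum(bits)

def levin_search(is_solution, max_phase=20):
    # Levin-style allocation: in phase i, run each program of length L <= i
    # for at most 2**(i - L) steps, so short programs get exponentially more
    # time. Return the first program whose output satisfies is_solution.
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for bits in product((0, 1), repeat=length):
                out = run_program(bits, budget)
                if out is not None and is_solution(out):
                    return bits, out, phase
    return None

if __name__ == "__main__":
    # Find a short toy program whose output equals 3.
    print(levin_search(lambda out: out == 3))

In the paper's setting, the tested predicate is not a fixed output value but whether the network computed by the candidate program fits the training data; the allocation scheme, however, is the same exponential trade-off between program length and runtime sketched above.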