DISCOVERING NEURAL NETS WITH LOW KOLMOGOROV COMPLEXITY AND HIGH GENERALIZATION CAPABILITY

Authors
J. Schmidhuber
Citation
J. Schmidhuber, Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability, Neural Networks, 10(5), 1997, pp. 857-873
Citations number
88
Subject categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science, Artificial Intelligence; Neurosciences; Physics, Applied
Journal title
Neural Networks
ISSN journal
0893-6080
Volume
10
Issue
5
Year of publication
1997
Pages
857 - 873
Database
ISI
SICI code
0893-6080(1997)10:5<857:DNNWLK>2.0.ZU;2-A
Abstract
Many neural net learning algorithms aim at finding ''simple'' nets to explain training data. The expectation is that the ''simpler'' the networks, the better the generalization on test data (--> Occam's razor). Previous implementations, however, use measures of ''simplicity'' that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the ''Bayesian'' kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding ''algorithmically simple'' problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and inspired by Levin's optimal universal search algorithm. For a given problem, solution candidates are computed by efficient ''self-sizing'' programs that influence their own runtime and storage size. The probabilistic search algorithm finds the ''good'' programs (the ones that quickly compute algorithmically probable solutions fitting the training data). Simulations focus on the task of discovering ''algorithmically simple'' neural networks with low Kolmogorov complexity and high generalization capability. It is demonstrated that the method, at least on certain toy problems where it is computationally feasible, can lead to generalization results unmatchable by previous neural network algorithms. Much remains to be done, however, to make large-scale applications and ''incremental learning'' feasible. (C) 1997 Elsevier Science Ltd.
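Note on the search scheme described in the abstract: Levin complexity can be written informally as Kt(x) = min over programs p computing x of { |p| + log t(p) }, and Levin's universal search explores programs so that, in phase i, every program p with |p| <= i receives roughly 2^(i - |p|) execution steps, biasing the search toward short, fast programs. The Python sketch below illustrates only this time-allocation idea on a deliberately trivial interpreter; the names run_program and levin_search and the bit-string ''program'' encoding are illustrative assumptions, not the paper's actual implementation, which searches over self-sizing programs that generate neural network weights.

from itertools import product

def run_program(bits, max_steps):
    # Toy stand-in for a universal machine: the bit string is the ''program''.
    # Pretend its runtime grows with its length; if the allotted budget is
    # exceeded, report non-termination (None). Its ''output'' is the 1-count.
    steps_needed = len(bits)
    if steps_needed > max_steps:
        return None
    return sum(bits)

def levin_search(is_solution, max_phase=20):
    # Levin-style allocation: in phase i, run each program of length L <= i
    # for at most 2**(i - L) steps, so short programs get exponentially more
    # time. Return the first program whose output satisfies is_solution.
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for bits in product((0, 1), repeat=length):
                out = run_program(bits, budget)
                if out is not None and is_solution(out):
                    return bits, out, phase
    return None

if __name__ == "__main__":
    # Find a short toy program whose output equals 3.
    print(levin_search(lambda out: out == 3))

In the paper's setting, the tested predicate is not a fixed output value but whether the network computed by the candidate program fits the training data; the allocation scheme, however, is the same exponential trade-off between program length and runtime sketched above.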