Many neural net learning algorithms aim at finding ''simple'' nets to explain training data. The expectation is that the ''simpler'' the networks, the better the generalization on test data (--> Occam's razor). Previous implementations, however, use measures for ''simplicity'' that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the ''Bayesian'' kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem.
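For reference, the Solomonoff-Levin distribution over outputs x of a universal prefix machine U is commonly written as (standard textbook notation, not necessarily the paper's own):

    P_U(x) = \sum_{p \,:\, U(p) = x} 2^{-\ell(p)},

where \ell(p) denotes the length of program p in bits; outputs computable by short programs therefore receive high prior probability.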
The universal prior leads to a probabilistic method for finding ''algorithmically simple'' problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and inspired by Levin's optimal universal search algorithm.
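In the standard formulation (the paper's exact variant may differ), the Levin complexity of x is

    Kt(x) = \min_p \{ \ell(p) + \log_2 t(p) : U(p) = x \text{ within } t(p) \text{ steps} \},

and Levin's universal search allots to each candidate program p a fraction of the total runtime proportional to 2^{-\ell(p)}, so that a solution computable by p within t steps is found after on the order of 2^{\ell(p)} t steps.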
For a given problem, solution candidates are computed by efficient ''self-sizing'' programs that influence their own runtime and storage size. The probabilistic search algorithm finds the ''good'' programs (the ones quickly computing algorithmically probable solutions fitting the training data).
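A minimal Python sketch of the kind of Levin-search-style enumeration the method is inspired by is given below; it is not the paper's ''self-sizing'' program implementation, and run_program, fits_training_data, and the toy target are hypothetical placeholders.

# Levin-search-style sketch: enumerate programs by length and, in each phase,
# give every program of length L a time share proportional to 2**-L of the
# phase budget; phase budgets double until a fitting program is found.
from itertools import product

def run_program(bits, max_steps):
    """Hypothetical interpreter: run the bit string for at most max_steps
    steps; here a toy stand-in that decodes the bits as a binary number."""
    if max_steps < 1:
        return None
    return int("".join(map(str, bits)), 2) if bits else 0

def fits_training_data(output):
    """Hypothetical check that the candidate solution explains the data."""
    return output == 42  # toy target value

def levin_search(max_phases=20, max_len=16):
    for phase in range(max_phases):
        budget = 2 ** phase                  # total time budget of this phase
        for length in range(1, max_len + 1):
            steps = budget // (2 ** length)  # share proportional to 2**-length
            if steps == 0:
                break
            for bits in product((0, 1), repeat=length):
                output = run_program(list(bits), steps)
                if output is not None and fits_training_data(output):
                    return bits, output      # a ''good'' program was found
    return None

if __name__ == "__main__":
    print(levin_search())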
Simulations focus on the task of discovering ''algorithmically simple'' neural networks with low Kolmogorov complexity and high generalization capability. It is demonstrated that the method, at least with certain toy problems where it is computationally feasible, can lead to generalization results unmatchable by previous neural network algorithms. Much remains to be done, however, to make large-scale applications and ''incremental learning'' feasible. (C) 1997 Elsevier Science Ltd.