Applying MDL to learn best model granularity

Citation
Q. Gao et al., Applying MDL to learn best model granularity, ARTIF INTEL, 121(1-2), 2000, pp. 1-29
Number of citations
29
Subject categories
AI Robotics and Automatic Control
Journal title
ARTIFICIAL INTELLIGENCE
Journal ISSN
0004-3702
Volume
121
Issue
1-2
Year of publication
2000
Pages
1 - 29
Database
ISI
SICI code
0004-3702(200008)121:1-2<1:AMTLBM>2.0.ZU;2-O
Abstract
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor". In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three layer feed-forward neural network where we need to determine the number of nodes in the hidden layer giving best modeling performance. The optimal model (the one that extrapolizes best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally. (C) 2000 Elsevier Science B.V. All rights reserved.
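
Illustrative note
The two-part code idea in the abstract can be sketched on a toy problem. The Python sketch below is not the paper's method or either of its experiments. It assumes a much simpler setting: a coin-flip (Bernoulli) model whose single parameter is stored with a chosen number of bits, so that the total description length is the parameter bits plus the bits needed to encode the data given the quantized parameter. The function names, the midpoint quantization grid, and the `max_bits` limit are all hypothetical choices made for this illustration.

```python
import math


def two_part_code_length(ones, zeros, bits):
    """Total description length in bits for a Bernoulli model.

    Assumption for this sketch: the parameter is one of 2**bits grid
    values (cell midpoints, so never exactly 0 or 1) and costs `bits`
    bits to state. The data then cost -log2 of their likelihood.
    """
    levels = 2 ** bits
    best_data_bits = math.inf
    for j in range(levels):
        p = (j + 0.5) / levels  # candidate quantized parameter
        data_bits = -(ones * math.log2(p) + zeros * math.log2(1.0 - p))
        best_data_bits = min(best_data_bits, data_bits)
    return bits + best_data_bits


def best_precision(ones, zeros, max_bits=10):
    """Return the parameter precision (in bits) with the shortest two-part code."""
    return min(
        range(max_bits + 1),
        key=lambda b: two_part_code_length(ones, zeros, b),
    )


if __name__ == "__main__":
    # Print the trade-off for a sample of 90 ones and 10 zeros.
    for b in range(6):
        print(b, round(two_part_code_length(90, 10, b), 2))
    print("chosen precision:", best_precision(90, 10))
```

With very few bits the grid cannot get close to the observed frequency, so the data part is long. With many bits the parameter part grows while the data part barely improves. The minimum of the sum is the granularity this toy version of MDL would select.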