
Applying MDL to learn best model granularity

Authors

Gao, Q.; Li, M.; Vitanyi, P.

Citation

Q. Gao et al., Applying MDL to learn best model granularity, ARTIF INTEL, 121(1-2), 2000, pp. 1-29

Citations number

Subject categories

AI Robotics and Automatic Control

Journal title

ARTIFICIAL INTELLIGENCE

ISSN journal

0004-3702

Volume

121

Issue

1-2

Year of publication

2000

Pages

1 - 29

Database

ISI

SICI code

0004-3702(200008)121:1-2<1:AMTLBM>2.0.ZU;2-O

Abstract

The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high a precision generally involves modeling accidental noise, and too low a precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor". In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Using a new modification of elastic matching with multiple prototypes per character, the value of the learned parameter (the length of the sampling interval) considered most likely by MDL is predicted to give the optimal prediction rate, and this value is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three-layer feed-forward neural network, where we need to determine the number of nodes in the hidden layer that gives the best modeling performance. The number of hidden nodes considered most likely by MDL is predicted to give the optimal model (the one that extrapolates best on unseen examples), and this value again coincides with the best value found experimentally. (C) 2000 Elsevier Science B.V. All rights reserved.
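The two-part code idea in the abstract can be illustrated with a small sketch: for each candidate granularity, add the bits needed to describe the model, L(H), to the bits needed to describe the data given the model, L(D|H), and keep the granularity with the smallest total. This is a hypothetical toy and not the paper's handwriting or neural-network setup. Here the "granularity" is the number of bins m of an equal-width histogram over values in [0, 1), each value is assumed to be recorded to `precision_bits` bits, and the names `code_length` and `best_granularity` are illustrative only. The cost of stating m itself (about log2(max_bins) bits) is the same for every m, so it is omitted.

```python
import math
from collections import Counter


def log2_factorial(k: int) -> float:
    """log2(k!) computed via the log-gamma function."""
    return math.lgamma(k + 1) / math.log(2)


def code_length(data: list[float], m: int, precision_bits: int = 16) -> float:
    """Two-part MDL code length (bits) of `data` under an m-bin histogram.

    Hypothetical toy model: every value lies in [0, 1) and is stored to
    `precision_bits` bits; the histogram has m equal-width bins.
    """
    n = len(data)
    # Assign each value to a bin; the clamp guards against x * m rounding up to m.
    counts = Counter(min(int(x * m), m - 1) for x in data)
    # Part 1, L(H): the bin counts, one of C(n + m - 1, m - 1) possible vectors.
    model_bits = log2_factorial(n + m - 1) - log2_factorial(m - 1) - log2_factorial(n)
    # Part 2, L(D|H): which point falls in which bin (a multinomial index);
    # empty bins contribute log2(0!) = 0, so they can be skipped.
    arrangement_bits = log2_factorial(n) - sum(
        log2_factorial(c) for c in counts.values()
    )
    # ... plus each point's offset inside its bin of width 1/m.
    offset_bits = n * (precision_bits - math.log2(m))
    return model_bits + arrangement_bits + offset_bits


def best_granularity(data: list[float], max_bins: int = 64) -> int:
    """Return the bin count m in 1..max_bins with the shortest two-part code."""
    return min(range(1, max_bins + 1), key=lambda m: code_length(data, m))


if __name__ == "__main__":
    # Hypothetical data: 100 evenly spaced values, all in the lower half [0, 0.5).
    data = [i / 200 for i in range(100)]
    for m in (1, 2, 4, 8):
        print(m, round(code_length(data, m), 2))
    print("MDL choice:", best_granularity(data))
```

On this data a single bin wastes one bit per point on the empty upper half, while many bins pay heavily for describing the counts and the assignment of points to bins; the two-part total trades these off, which is the same tension between too high and too low a precision that the abstract describes.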