
A statistical, nonparametric methodology for document degradation model validation

Authors

Kanungo, T; Haralick, RM; Baird, HS; Stuetzle, W; Madigan, D

Citation

T. Kanungo et al., A statistical, nonparametric methodology for document degradation model validation, IEEE Trans. Pattern Anal. Mach. Intell., 22(11), 2000, pp. 1209-1223

Citations number

Subject Categories

AI Robotics and Automatic Control

Journal title

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

ISSN journal

0162-8828

Volume

22

Issue

11

Year of publication

2000

Pages

1209 - 1223

Database

ISI

SICI code

0162-8828(200011)22:11<1209:ASNMFD>2.0.ZU;2-T

Abstract

Printing, photocopying, and scanning processes degrade the image quality of a document. Statistical models of these degradation processes are crucial for document image understanding research. Models allow us to predict system performance, conduct controlled experiments to study the breakdown points of the systems, create large multilingual data sets with ground truth for training classifiers, design optimal noise removal algorithms, choose values for the free parameters of the algorithms, and so on. Although research in document understanding started many decades ago, only two document degradation models have been proposed thus far. Furthermore, no attempts have been made to statistically validate these models. In this paper, we present a statistical methodology that can be used to validate local degradation models. This method is based on a nonparametric, two-sample permutation test. Another standard statistical device, the power function, is then used to choose between algorithm variables such as distance functions. Since the validation and the power function procedures are independent of the model, they can be used to validate any other degradation model. A method for comparing any two models is also described. It uses p-values associated with the estimated models to select the model that is closer to the real world.
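The abstract names the ingredients (a two-sample permutation test, a power function, and p-value-based model comparison) without giving details. The sketch below illustrates how such pieces could fit together, under several assumptions that are not taken from the paper: each sample is a list of binary local patterns (e.g. flattened 3x3 pixel neighborhoods), the distance function is Hamming distance, and the test statistic is an energy-distance-style difference between mean cross-sample and mean within-sample distances. All function names (`statistic`, `permutation_p_value`, `estimated_power`, `closer_model`, `bernoulli_patterns`) are hypothetical, and the "larger p-value wins" rule in `closer_model` is only one plausible reading of the comparison described above.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def _mean_cross(xs, ys, dist):
    # Average distance over every (x, y) pair drawn from different samples.
    return sum(dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def _mean_within(xs, dist):
    # Average distance over every unordered pair inside one sample (needs >= 2 items).
    pairs = list(combinations(xs, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def statistic(xs, ys, dist=hamming):
    """Assumed energy-distance-style statistic (not the paper's): cross-sample
    spread minus mean within-sample spread; large when the samples differ."""
    within = (_mean_within(xs, dist) + _mean_within(ys, dist)) / 2
    return _mean_cross(xs, ys, dist) - within


def permutation_p_value(xs, ys, dist=hamming, n_perm=999, rng=None):
    """Two-sample permutation test: fraction of random relabellings of the
    pooled data whose statistic is at least as large as the observed one."""
    rng = random.Random(0) if rng is None else rng
    observed = statistic(xs, ys, dist)
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n], pooled[n:], dist) >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)


def estimated_power(draw_a, draw_b, n, dist=hamming, alpha=0.05,
                    trials=50, n_perm=199, rng=None):
    """Monte Carlo estimate of the rejection rate at level alpha when the two
    samples come from the (hypothetical) generators draw_a and draw_b."""
    rng = random.Random(1) if rng is None else rng
    rejections = 0
    for _ in range(trials):
        xs = [draw_a(rng) for _ in range(n)]
        ys = [draw_b(rng) for _ in range(n)]
        if permutation_p_value(xs, ys, dist, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / trials


def closer_model(real, sim_a, sim_b, dist=hamming, n_perm=999):
    """Assumed comparison rule: prefer the simulated sample whose test against
    the real sample yields the larger p-value (less evidence of a difference)."""
    p_a = permutation_p_value(real, sim_a, dist, n_perm)
    p_b = permutation_p_value(real, sim_b, dist, n_perm)
    return ("A", p_a, p_b) if p_a >= p_b else ("B", p_a, p_b)


def bernoulli_patterns(p, size=9):
    """Hypothetical generator: flattened 3x3 patterns, each pixel black with prob. p."""
    return lambda rng: tuple(int(rng.random() < p) for _ in range(size))
```

Each call to `estimated_power` gives one point of a power curve; sweeping the alternative (here, the pixel probability passed to `bernoulli_patterns`) traces out an approximate power function, and running the same sweep with a different `dist` argument is one way to compare candidate distance functions, in the spirit of the abstract. Because `dist` is a parameter and nothing in the test depends on how the samples were generated, the same code applies unchanged to patterns produced by any degradation model.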