READING HANDWRITTEN PHRASES ON US CENSUS FORMS

Authors

MADHVANATH S; GOVINDARAJU V; SRIHARI SN

Citation

S. Madhvanath et al., READING HANDWRITTEN PHRASES ON US CENSUS FORMS, International journal of imaging systems and technology, 7(4), 1996, pp. 312-319

Citations number

Subject Categories

Optics; Engineering, Electrical & Electronic

Journal title

International journal of imaging systems and technology

ISSN journal

08999457

Volume

7

Issue

4

Year of publication

1996

Pages

312 - 319

Database

ISI

SICI code

0899-9457(1996)7:4<312:RHPOUC>2.0.ZU;2-4

Abstract

Commercial form-reading systems for extraction of data from forms do not meet acceptable accuracy requirements on forms filled out by hand. Several important form-processing applications involve the automated reading of handwritten responses. U.S. Census forms are a case in point. A database of form images containing actual responses received by the U.S. Census Bureau was made available by National Institute of Standards and Technology (NIST) in December 1993. A number of factors combine to make the task of reading these forms a challenging one. The quality of form images is often poor, and the handwritten responses are very loosely constrained in terms of writing style, format of response, and choice of text. The sizes of the lexicons provided are large (10,000-50,000 entries) and yet the coverage is incomplete (60%-70%). In this article we discuss our approach to automate the task of reading the census forms. The subtasks of field extraction and phrase recognition are described and multiclassifier control strategies for phrase recognition are presented. The error rate of the system when no rejects are allowed is 59%, with a lower bound of 40% being imposed by the incomplete coverage of the lexicon. The article concludes with a discussion of experimental results and directions for future research. (C) 1995 John Wiley & Sons, Inc.