READING HANDWRITTEN PHRASES ON US CENSUS FORMS

Citation
S. Madhvanath et al., READING HANDWRITTEN PHRASES ON US CENSUS FORMS, International Journal of Imaging Systems and Technology, 7(4), 1996, pp. 312-319
Number of citations
7
Subject Categories
Optics; Engineering, Electrical & Electronic
ISSN journal
0899-9457
Volume
7
Issue
4
Year of publication
1996
Pages
312 - 319
Database
ISI
SICI code
0899-9457(1996)7:4<312:RHPOUC>2.0.ZU;2-4
Abstract
Commercial form-reading systems for extracting data from forms do not meet acceptable accuracy requirements on forms filled out by hand. Several important form-processing applications involve the automated reading of handwritten responses; U.S. Census forms are a case in point. A database of form images containing actual responses received by the U.S. Census Bureau was made available by the National Institute of Standards and Technology (NIST) in December 1993. A number of factors combine to make reading these forms a challenging task. The quality of the form images is often poor, and the handwritten responses are very loosely constrained in writing style, response format, and choice of text. The lexicons provided are large (10,000-50,000 entries), yet their coverage is incomplete (60%-70%). In this article we discuss our approach to automating the reading of census forms. The subtasks of field extraction and phrase recognition are described, and multiclassifier control strategies for phrase recognition are presented. With no rejects allowed, the system's error rate is 59%; the incomplete coverage of the lexicon imposes a lower bound of 40% on this rate. The article concludes with a discussion of experimental results and directions for future research. (C) 1995 John Wiley & Sons, Inc.
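The 40% lower bound follows from a simple argument: a recognizer that must always answer with a lexicon entry (no rejects) necessarily misreads every response whose true phrase is missing from the lexicon, so its error rate is at least 1 minus the lexicon coverage, and 60% coverage gives a 40% floor. The Python sketch below illustrates this under stated assumptions. The lexicon and phrases are made up, and the hypothetical `similarity` scorer (string similarity from the standard `difflib` module, applied to the true text) is only a stand-in for a real handwritten-phrase recognizer, which would score an image rather than a string. It is not the article's recognition method.

```python
import difflib
from typing import Callable, Sequence

# Hypothetical scorer type: (observed phrase, lexicon entry) -> match score.
Scorer = Callable[[str, str], float]


def similarity(observed: str, entry: str) -> float:
    """Toy match score in [0, 1]; a stand-in for a handwriting recognizer."""
    return difflib.SequenceMatcher(None, observed, entry).ratio()


def lexicon_error_bound(truths: Sequence[str], lexicon: set[str]) -> float:
    """Fraction of true phrases missing from the lexicon, i.e. 1 - coverage.

    With no rejects, every such phrase is necessarily misread, so this is a
    lower bound on the error rate of any closed-lexicon recognizer.
    """
    missing = sum(1 for t in truths if t not in lexicon)
    return missing / len(truths)


def zero_reject_error(truths: Sequence[str], lexicon: set[str],
                      score: Scorer = similarity) -> float:
    """Error rate when the top-scoring lexicon entry is always accepted."""
    errors = 0
    for truth in truths:
        best = max(lexicon, key=lambda entry: score(truth, entry))
        if best != truth:
            errors += 1
    return errors / len(truths)


# Made-up lexicon and responses (not taken from the NIST census database).
lexicon = {"RETAIL SALES", "TRUCK DRIVER", "NURSE"}
truths = ["NURSE", "TRUCK DRIVER", "REGISTERED NURSE", "RETAIL SALES", "FARM HAND"]

bound = lexicon_error_bound(truths, lexicon)  # 2 of 5 phrases missing -> 0.4
error = zero_reject_error(truths, lexicon)    # can never fall below the bound
assert error >= bound
```

Because the stand-in scorer sees the true string, it never errs on phrases that are in the lexicon, so its zero-reject error equals the bound exactly. A real recognizer also misreads some covered phrases, which is why a reported error rate such as 59% sits above the 40% floor.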