Commercial form-reading systems do not achieve acceptable accuracy when extracting data from forms filled out by hand. Several important form-processing applications involve the automated reading of handwritten responses, and U.S. Census forms are a case in point. A database of form images containing actual responses received by the U.S. Census Bureau was made available by the National Institute of Standards and Technology (NIST) in December 1993. Several factors make reading these forms challenging. The quality of the form images is often poor, and the handwritten responses are very loosely constrained in writing style, response format, and choice of text. The lexicons provided are large (10,000-50,000 entries), yet their coverage is incomplete (60%-70%). In this article we discuss our approach to automating the reading of the census forms. We describe the subtasks of field extraction and phrase recognition, and present multiclassifier control strategies for phrase recognition. When no rejects are allowed, the system's error rate is 59%; the incomplete coverage of the lexicon imposes a lower bound of 40% on this error rate. The article concludes with a discussion of experimental results and directions for future research.
(C) 1995 John Wiley & Sons, Inc.
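The lower bound on error follows from lexicon-driven recognition: if the recognizer may only output lexicon entries, every response whose true text is absent from the lexicon is necessarily misread, so at zero reject the error rate is at least 1 - coverage (60% coverage gives a 40% floor). The following is a minimal Python sketch of that arithmetic only; the function names and the toy responses and lexicon are hypothetical illustrations, not the system or data described in the article.

```python
from fractions import Fraction


def coverage(truths, lexicon):
    """Fraction of true responses that appear in the lexicon."""
    lex = set(lexicon)
    return Fraction(sum(t in lex for t in truths), len(truths))


def error_lower_bound(truths, lexicon):
    """At zero reject, out-of-lexicon responses are always wrong: 1 - coverage."""
    return 1 - coverage(truths, lexicon)


def lexicon_error(truths, predictions, lexicon):
    """Error rate of a recognizer that must answer every field with a lexicon entry."""
    lex = set(lexicon)
    if not all(p in lex for p in predictions):
        raise ValueError("every prediction must be a lexicon entry")
    wrong = sum(p != t for p, t in zip(predictions, truths))
    return Fraction(wrong, len(truths))


if __name__ == "__main__":
    # Hypothetical toy data: 3 of 5 true responses are in the lexicon (60% coverage).
    truths = ["TEACHER", "NURSE", "FARMER", "WELDER", "COOK"]
    lexicon = ["TEACHER", "NURSE", "FARMER", "CARPENTER"]
    # Even a recognizer that is perfect on in-lexicon responses must err on the rest.
    predictions = ["TEACHER", "NURSE", "FARMER", "CARPENTER", "CARPENTER"]
    print(coverage(truths, lexicon))                    # 3/5
    print(error_lower_bound(truths, lexicon))           # 2/5
    print(lexicon_error(truths, predictions, lexicon))  # 2/5
```

Exact fractions are used so the bound can be compared without floating-point rounding; any real recognizer also makes errors on in-lexicon responses, which is why the observed error rate sits above the floor.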