A public domain optical character recognition (OCR) system has been
developed by the National Institute of Standards and Technology (NIST).
This standard reference form-based handprint recognition system is
designed to provide a baseline of performance on an open application. The
system's source code, training data, performance assessment tools, and
type of forms processed are all publicly available. The system is
modular, allowing for system component testing and comparisons, and it
can be used to validate training and testing sets in an end-to-end
application. The system's source code is written in C and will run on
virtually any UNIX-based computer. The presented functional components of
the system are divided into three levels of processing: (1) form-level
processing includes the tasks of form registration and form removal;
(2) field-level processing includes the tasks of field isolation, line
trajectory reconstruction, and field segmentation; and (3)
character-level processing includes character normalization, feature
extraction, character classification, and dictionary-based
postprocessing. The system contains a number of significant
contributions to OCR technology, including an optimized probabilistic
neural network (PNN) classifier that operates 20 times faster than
traditional software
implementations of the algorithm. Provided in the system are a host of
data structures and low-level utilities for computing spatial
histograms, least-squares fitting, spatial zooming, connected
components, Karhunen-Loève feature extraction, optimized PNN
classification, and dynamic string alignment. Any portion of this
standard reference OCR system can be used in commercial products
without restrictions. © 1997 SPIE and IS&T. [S1017-9909(97)00502-3]