S. Chandran et al., STRUCTURE RECOGNITION AND INFORMATION EXTRACTION FROM TABULAR DOCUMENTS, International Journal of Imaging Systems and Technology, 7(4), 1996, pp. 289-303
We present a system for extracting the structural information of a
table from its image. Following initial binarization and deskewing,
the image is scanned to extract any horizontal and vertical lines that
are present, and the table's dimensions are estimated from these
lines. Unlike other systems, the procedure described here does not
depend solely on the existence of lines to mark the item blocks: white
streams are recognized in both the horizontal and vertical directions
as substitutes for any missing demarcation lines. A structure
interpretation procedure uses the extracted demarcation information to
identify each of the item blocks in the table. Subsequently, the
interrelations of these item blocks are used to recognize the
structure of the tabulated data. The interpretation can be done for
one-dimensional as well as two-dimensional tables. Interpretation of
the tabular document involves character recognition, which in turn
depends on the structure of the table. The above procedure can be used
to extract useful information from different types of tabular
drawings. In this article, we focus on interpreting telephone company
central office drawings. These drawings contain additional information
in the form of crossed-out entries and repeated entries, which must be
detected and recognized to interpret the document completely. Hence,
after the basic structure of the drawing is extracted, the additional
information is extracted and the cell block locations are obtained in
order to develop a database representing the tabular document. The
telephone company drawings are very large, resulting in images as
large as 15,000 x 10,000 pixels. Thus, designing efficient and fast
algorithms is an important criterion in this research.
(C) 1996 John Wiley & Sons, Inc.
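
The abstract does not say how white streams are detected. As an illustration only, the sketch below uses a simple projection profile, which is an assumed technique and not necessarily the authors' method. It finds bands of rows or columns that contain no ink and could stand in for a missing demarcation line. The function name `white_streams`, the `min_width` threshold, and the 0/1 image encoding are all hypothetical. The sketch handles only streams that span the full image, and makes one linear pass over the pixels.

```python
from typing import List, Tuple

# Assumed encoding: 1 = ink (black) pixel, 0 = background (white) pixel.
Image = List[List[int]]


def white_streams(image: Image, horizontal: bool,
                  min_width: int = 2) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of ink-free rows or columns.

    horizontal=True scans rows (horizontal white streams);
    horizontal=False scans columns (vertical white streams).
    min_width is a hypothetical threshold that rejects narrow gaps,
    such as the spacing between characters or text lines.
    """
    if not image:
        return []

    # Projection profile: number of ink pixels in each row or column.
    if horizontal:
        profile = [sum(row) for row in image]
    else:
        profile = [sum(col) for col in zip(*image)]

    streams: List[Tuple[int, int]] = []
    start = None
    # The trailing sentinel (1) closes a run that reaches the image border.
    for i, ink in enumerate(profile + [1]):
        if ink == 0 and start is None:
            start = i                      # a blank run begins
        elif ink != 0 and start is not None:
            if i - start >= min_width:     # keep only wide enough runs
                streams.append((start, i - 1))
            start = None
    return streams


# Toy image: four ink blocks separated by a blank band in each direction.
toy = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
]

print(white_streams(toy, horizontal=True))   # [(2, 3)]
print(white_streams(toy, horizontal=False))  # [(2, 3)]
```

On a real scan, the profile would presumably be computed after binarization and deskewing, since a skewed page smears ink across what should be blank rows and columns.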