SUMMARY OF IMAGED DOCUMENTS WITHOUT OCR

Authors

CHEN FR; BLOOMBERG DS

Citation

F.R. Chen and D.S. Bloomberg, SUMMARY OF IMAGED DOCUMENTS WITHOUT OCR, Computer vision and image understanding, 70(3), 1998, pp. 307-320

Citations number

Subject categories

Computer Science, Software, Graphics, Programming

Journal title

Computer vision and image understanding

ISSN journal

1077-3142

Volume

70

Issue

3

Year of publication

1998

Pages

307 - 320

Database

ISI

SICI code

1077-3142(1998)70:3<307:SOIDWO>2.0.ZU;2-Q

Abstract

A system is presented for creating a summary indicating the contents of an imaged document. The summary is composed from selected regions extracted from the imaged document. The regions may include sentences, key phrases, headings, and figures. The extracts are identified without the use of optical character recognition. The imaged document is first processed to identify the word-bounding boxes, the reading order of words, and the location of sentence and paragraph boundaries in the text. The word-bounding boxes are grouped into equivalence classes to mimic the terms in a text document. Equivalence classes representing content words are identified, and key phrases are identified from the set of content words. Summary sentences are selected using a statistically based classifier applied to a set of discrete sentence features. Evaluation of sentence selection against a set of abstracts created by a professional abstracting company is given. (C) 1998 Academic Press.
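
The grouping of word-bounding boxes into equivalence classes can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state the matching criterion, so the sketch assumes each word image is a small binary bitmap, uses a pixel-difference (Hamming) threshold for matching, and applies a simple frequency filter to pick content-word classes. The names `group_words`, `content_classes`, `max_diff`, `min_count`, and `max_fraction` are hypothetical.

```python
from collections import Counter


def same_shape(a, b):
    """True if two bitmaps (tuples of row strings) have identical dimensions."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def hamming(a, b):
    """Count differing pixels between two bitmaps of identical dimensions."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def group_words(bitmaps, max_diff=1):
    """Greedily label each word bitmap with an equivalence-class index.

    Assumption: a word joins the first class whose representative has the
    same dimensions and differs in at most `max_diff` pixels.
    """
    representatives = []  # one representative bitmap per class
    labels = []           # class index per word, in reading order
    for bitmap in bitmaps:
        for index, rep in enumerate(representatives):
            if same_shape(bitmap, rep) and hamming(bitmap, rep) <= max_diff:
                labels.append(index)
                break
        else:
            # No existing class matched, so this word starts a new class.
            representatives.append(bitmap)
            labels.append(len(representatives) - 1)
    return labels


def content_classes(labels, min_count=2, max_fraction=0.3):
    """Pick classes that repeat but are not so frequent as to look like function words.

    Assumption: the thresholds are illustrative, not taken from the paper.
    """
    counts = Counter(labels)
    total = len(labels)
    return {
        cls
        for cls, n in counts.items()
        if n >= min_count and n / total <= max_fraction
    }
```

Because no characters are ever recognized, the class indices stand in for terms: any term-frequency statistic used for text summarization can be computed over them instead of over words.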