FAST STRING SEARCHING IN A CHARACTER LATTICE

Citation
S. Senda et al., FAST STRING SEARCHING IN A CHARACTER LATTICE, IEICE transactions on information and systems, E77D(7), 1994, pp. 846-851
Citations number
NO
Categorie Soggetti
Computer Science Information Systems
ISSN journal
09168532
Volume
E77D
Issue
7
Year of publication
1994
Pages
846 - 851
Database
ISI
SICI code
0916-8532(1994)E77D:7<846:FSSIAC>2.0.ZU;2-V
Abstract
This paper presents an algorithm for string searching in a character l attice. A character lattice, which is obtained through a character rec ognition process, is a general and flexible data structure that repres ents many hypothesized strings in a document image. In this paper, the authors propose a simple and efficient algorithm; it consists of a si ngle loop of some set-operations and scans the character lattice only once. The authors also describe two actual implementations of the algo rithm; one uses Bit-Arrays and the other a Trie. Owing to its bit para llelism, the Bit-Array approach is able to search for a single pattern faster than the Trie approach, and is easily extended to complex matc hings such as an approximate one. It is suited for document retrieval systems that need to search for a keyword as fast as possible. A hashe d compact version of the character lattice is also useful to increase the speed of the search for a single pattern. In contrast, the Trie ap proach is able to search for a large number of patterns simultaneously , and is suited for document understanding systems that need to extrac t words from the character lattice. The experimental results have show n that both approaches achieve high performance.