ITA
ENG

FAST STRING SEARCHING IN A CHARACTER LATTICE

Authors

SENDA S MINOH M IKEDA K

Citation

S. Senda et al., FAST STRING SEARCHING IN A CHARACTER LATTICE, IEICE transactions on information and systems, E77D(7), 1994, pp. 846-851

Citations number

Categorie Soggetti

Computer Science Information Systems

Journal title

IEICE transactions on information and systems → ACNP

ISSN journal

09168532

Volume

E77D

Issue

Year of publication

1994

Pages

846 - 851

Database

ISI

SICI code

0916-8532(1994)E77D:7<846:FSSIAC>2.0.ZU;2-V

Abstract

This paper presents an algorithm for string searching in a character l attice. A character lattice, which is obtained through a character rec ognition process, is a general and flexible data structure that repres ents many hypothesized strings in a document image. In this paper, the authors propose a simple and efficient algorithm; it consists of a si ngle loop of some set-operations and scans the character lattice only once. The authors also describe two actual implementations of the algo rithm; one uses Bit-Arrays and the other a Trie. Owing to its bit para llelism, the Bit-Array approach is able to search for a single pattern faster than the Trie approach, and is easily extended to complex matc hings such as an approximate one. It is suited for document retrieval systems that need to search for a keyword as fast as possible. A hashe d compact version of the character lattice is also useful to increase the speed of the search for a single pattern. In contrast, the Trie ap proach is able to search for a large number of patterns simultaneously , and is suited for document understanding systems that need to extrac t words from the character lattice. The experimental results have show n that both approaches achieve high performance.