ITA
ENG

Fast algorithms for k-word proximity search

Authors

Sadakane, K Imai, H

Citation

K. Sadakane et H. Imai, Fast algorithms for k-word proximity search, IEICE T FUN, E84A(9), 2001, pp. 2311-2318

Citations number

Categorie Soggetti

Eletrical & Eletronics Engineeing

Journal title

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES

ISSN journal

09168508 → ACNP

Volume

E84A

Issue

Year of publication

2001

Pages

2311 - 2318

Database

ISI

SICI code

0916-8508(200109)E84A:9<2311:FAFKPS>2.0.ZU;2-B

Abstract

When we search from a huge amount of documents, we often specify several ke ywords and use conjunctive queries to narrow the result of the search. Thou gh the searched documents contain all keywords, positions of the keywords a re usually not considered. As a result, the search result contains some mea ningless documents. It is therefore effective to rank documents according t o proximity of keywords in the documents. This ranking is regarded as a kin d of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conque r approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We run the algorithms on a large collection of html files and verify its effectiveness.