Fast algorithms for k-word proximity search

Citation
K. Sadakane et H. Imai, Fast algorithms for k-word proximity search, IEICE T FUN, E84A(9), 2001, pp. 2311-2318
Citations number
12
Categorie Soggetti
Eletrical & Eletronics Engineeing
Journal title
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES
ISSN journal
09168508 → ACNP
Volume
E84A
Issue
9
Year of publication
2001
Pages
2311 - 2318
Database
ISI
SICI code
0916-8508(200109)E84A:9<2311:FAFKPS>2.0.ZU;2-B
Abstract
When we search from a huge amount of documents, we often specify several ke ywords and use conjunctive queries to narrow the result of the search. Thou gh the searched documents contain all keywords, positions of the keywords a re usually not considered. As a result, the search result contains some mea ningless documents. It is therefore effective to rank documents according t o proximity of keywords in the documents. This ranking is regarded as a kin d of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conque r approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We run the algorithms on a large collection of html files and verify its effectiveness.