ITA
ENG

Binary searching with nonuniform costs and its application to text retrieval

Authors

Navarro, G Barbosa, EF Baeza-Yates, R Cunto, W Ziviani, N

Citation

G. Navarro et al., Binary searching with nonuniform costs and its application to text retrieval, ALGORITHMIC, 27(2), 2000, pp. 145-169

Citations number

Categorie Soggetti

Engineering Mathematics

Journal title

ALGORITHMICA

ISSN journal

01784617 → ACNP

Volume

Issue

Year of publication

2000

Pages

145 - 169

Database

ISI

SICI code

0178-4617(200006)27:2<145:BSWNCA>2.0.ZU;2-8

Abstract

We study the problem of minimizing the expected cost of binary searching fo r data where the access cost is not fixed and depends on the last accessed element, such as data stored in magnetic or optical disk. We present an opt imal algorithm for this problem that finds the optimal search strategy in O (n(3)) time, which is the same time complexity of the simpler classical pro blem of fixed costs. Next, we present two practical linear expected time al gorithms, under the assumption that the access cost of an element is indepe ndent of its physical position. Both practical algorithms are online, that is, they find the next element to access as the search proceeds. The first one is an approximate algorithm which minimizes the access cost disregardin g the goodness of the problem partitioning. The second one is a heuristic a lgorithm, whose quality depends on its ability to estimate the final search cost, and therefore it can be tuned by recording statistics of previous ru ns. We present an application for our algorithms related to text retrieval. Whe n a text collection is large it demands specialized indexing techniques for efficient access. One important type of index is the suffix array, where d ata access is provided through an indirect binary search on the text stored in magnetic disk or optical disk. Under this cost model we prove that the optimal algorithm cannot perform better than Omega (1/log n) times the stan dard binary search. We also prove that the approximate strategy cannot, on average, perform worse than 39% over the optimal one. We confirm the analyt ical results with simulations, showing improvements between 34% (optimal) a nd 60% (online) over standard binary search for both magnetic and optical d isks.