ITA
ENG

Optimal information gathering on the Internet with time and cost constraints

Authors

Etzioni, O Hanks, S Jiang, T Madani, O

Citation

O. Etzioni et al., Optimal information gathering on the Internet with time and cost constraints, SIAM J COMP, 29(5), 2000, pp. 1596-1620

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

SIAM JOURNAL ON COMPUTING

ISSN journal

00975397 → ACNP

Volume

Issue

Year of publication

2000

Pages

1596 - 1620

Database

ISI

SICI code

0097-5397(20000321)29:5<1596:OIGOTI>2.0.ZU;2-5

Abstract

The World Wide Web provides access to vast amounts of information, but cont ent providers are considering charging for the information and services the y supply. Thus the consumer may face the problem of balancing the benefit o f asking for information against the cost (in terms of both money and time) of acquiring it. We study information-gathering strategies that maximize t he expected value to the consumer. In our model there is a single informati on request, which has a known benefit to the consumer. To satisfy the reque st, queries can be sent simultaneously or in sequence to any of a finite se t of independent information sources. For each source we know the monetary cost of making the query, the amount of time it will take, and the probabil ity that the source will be able to provide the requested information. A po licy specifies which sources to contact at which times, and the expected va lue of the policy can be defined as some function of the likelihood that th e policy will yield an answer, the expected benefit, and the monetary cost and time delay associated with executing the policy. The problem is to nd a n expected-value-maximizing policy. We explore four variants of the objective function V: (i) V consists only o f the benefit term subject to threshold constraints on both total cost and total elapsed time, (ii) V is linear in the expected total cost of the poli cy subject to the constraint that the total elapsed time never exceeds some deadline, (iii) V is linear in the expected total elapsed time subject to the constraint that the total cost never exceeds some threshold, and (iv) V is linear in the expected total monetary cost and the expected time delay of the policy. The problems of devising an optimal querying policy for all four variants and approximating an optimal querying policy for variants (ii i) and (iv) are shown to be NP-hard. For (i), and with a mild simplifying a ssumption for (iii), we give a fully polynomial time approximation scheme. For (ii), we consider batched querying policies, and design an O(n(2)) time approximation algorithm with ratio 1/2 and a polynomial time approximation scheme for optimal single-batch policies, and an O(kn(2)) time approximati on algorithm with ratio 1/5 for optimal k-batch policies.