Optimal information gathering on the Internet with time and cost constraints

Citation
O. Etzioni et al., Optimal information gathering on the Internet with time and cost constraints, SIAM J COMP, 29(5), 2000, pp. 1596-1620
Citations number
15
Categorie Soggetti
Computer Science & Engineering
Journal title
SIAM JOURNAL ON COMPUTING
ISSN journal
00975397 → ACNP
Volume
29
Issue
5
Year of publication
2000
Pages
1596 - 1620
Database
ISI
SICI code
0097-5397(20000321)29:5<1596:OIGOTI>2.0.ZU;2-5
Abstract
The World Wide Web provides access to vast amounts of information, but cont ent providers are considering charging for the information and services the y supply. Thus the consumer may face the problem of balancing the benefit o f asking for information against the cost (in terms of both money and time) of acquiring it. We study information-gathering strategies that maximize t he expected value to the consumer. In our model there is a single informati on request, which has a known benefit to the consumer. To satisfy the reque st, queries can be sent simultaneously or in sequence to any of a finite se t of independent information sources. For each source we know the monetary cost of making the query, the amount of time it will take, and the probabil ity that the source will be able to provide the requested information. A po licy specifies which sources to contact at which times, and the expected va lue of the policy can be defined as some function of the likelihood that th e policy will yield an answer, the expected benefit, and the monetary cost and time delay associated with executing the policy. The problem is to nd a n expected-value-maximizing policy. We explore four variants of the objective function V: (i) V consists only o f the benefit term subject to threshold constraints on both total cost and total elapsed time, (ii) V is linear in the expected total cost of the poli cy subject to the constraint that the total elapsed time never exceeds some deadline, (iii) V is linear in the expected total elapsed time subject to the constraint that the total cost never exceeds some threshold, and (iv) V is linear in the expected total monetary cost and the expected time delay of the policy. The problems of devising an optimal querying policy for all four variants and approximating an optimal querying policy for variants (ii i) and (iv) are shown to be NP-hard. For (i), and with a mild simplifying a ssumption for (iii), we give a fully polynomial time approximation scheme. For (ii), we consider batched querying policies, and design an O(n(2)) time approximation algorithm with ratio 1/2 and a polynomial time approximation scheme for optimal single-batch policies, and an O(kn(2)) time approximati on algorithm with ratio 1/5 for optimal k-batch policies.