Emerging distributed query-processing systems support flexible execution st
rategies in which each query can be run using a combination of data shippin
g and query shipping. As in any distributed environment, these systems can
obtain tremendous performance and availability benefits by employing dynami
c data caching. When flexible execution and dynamic caching are combined, h
owever, a circular dependency arises: Caching occurs as a by-product of que
ry operator placement, but query operator placement decisions are based on
(cached) data location. The practical impact of this dependency is that que
ry optimization decisions that appear valid on a per-query basis can actual
ly cause suboptimal performance for all queries in the long run.
To address this problem, we developed Cache Investment-a novel approach for
integrating query optimization and data placement that looks beyond the pe
rformance of a single query. Cache Investment sometimes intentionally gener
ates a "suboptimal" plan for a particular query in the interest of effectin
g a better data placement for subsequent queries. Cache Investment can be i
ntegrated into a distributed database system without changing the internals
of the query optimizer. In this paper, we propose Cache Investment mechani
sms and policies and analyze their performance. The analysis uses results f
rom both an implementation on the SHORE storage manager and a detailed simu
lation model. Our results show that Cache Investment can significantly impr
ove the overall performance of a system and demonstrate the trade-offs amon
g various alternative policies.