All Web caches must try to keep cached pages up to date with the master copies of those pages, to avoid returning stale pages to users. In traditional distributed systems terminology, the problem of keeping cached pages up to date is called coherence. We discuss the coherence problem for Web caches and argue that coherence techniques used for distributed file system caches may not be satisfactory for Web caches. We survey techniques used by popular Web caches for maintaining coherence, including the widely used "expiration mechanism", which probably originated in CERN's proxy HTTP server. We discuss a number of problems with the existing expiration mechanism and present several extensions to it that solve these problems, reduce user wait times, and decrease the staleness of returned Web pages. We also discuss pre-fetching and replication, two more speculative techniques for keeping Web caches up to date.