We consider cluster-based network servers in which a front-end directs incoming requests to one of a number of back-ends. Specifically, we consider content-based request distribution: the front-end uses the content requested, in addition to information about the load on the back-end nodes, to choose which back-end will handle the request. Content-based request distribution can improve locality in the back-ends' main memory caches, increase secondary storage scalability by partitioning the server's database, and provide the ability to employ back-end nodes that are specialized for certain types of requests. As a specific policy for content-based request distribution, we introduce a simple, practical strategy for locality-aware request distribution (LARD). With LARD, the front-end distributes incoming requests in a manner that achieves high locality in the back-ends' main memory caches as well as load balancing. Locality is increased by dynamically subdividing the server's working set over the back-ends. Trace-based simulation results and measurements on a prototype implementation demonstrate substantial performance improvements over state-of-the-art approaches that use only load information to distribute requests. On workloads with working sets that do not fit in a single server node's main memory cache, the achieved throughput exceeds that of the state-of-the-art approach by a factor of two to four.

With content-based distribution, incoming requests must be handed off to a back-end in a manner transparent to the client, after the front-end has inspected the content of the request. To this end, we introduce an efficient TCP handoff protocol that can hand off an established TCP connection in a client-transparent manner.
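The locality-aware policy sketched above can be illustrated with a minimal dispatcher: the front-end remembers which back-end last served each target and routes repeat requests there, reassigning only when load becomes too skewed. The class name, load metric (count of active requests), and thresholds below are illustrative assumptions for the sketch, not the paper's actual parameters.

```python
T_LOW = 25    # assumed threshold: a node below this load is lightly loaded
T_HIGH = 65   # assumed threshold: a node above this load is overloaded

class LardDispatcher:
    """Sketch of a locality-aware request dispatcher (illustrative only)."""

    def __init__(self, backends):
        self.load = {b: 0 for b in backends}   # active requests per back-end
        self.server_for = {}                   # target (e.g. URL path) -> back-end

    def least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        """Choose a back-end for `target`, favoring cache locality."""
        node = self.server_for.get(target)
        if node is None:
            # First request for this target: assign it to the least loaded
            # node, dynamically partitioning the working set over back-ends.
            node = self.least_loaded()
            self.server_for[target] = node
        elif self.load[node] > T_HIGH and min(self.load.values()) < T_LOW:
            # Assigned node is overloaded while another node is nearly idle:
            # sacrifice some locality to restore load balance.
            node = self.least_loaded()
            self.server_for[target] = node
        self.load[node] += 1
        return node

    def complete(self, node):
        """Record that a request on `node` has finished."""
        self.load[node] -= 1
```

Because repeated requests for the same target reach the same back-end, that node's main memory cache stays warm for the targets it owns, while the reassignment branch keeps any one node from becoming a hot spot.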
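The paper's TCP handoff is an in-kernel protocol that moves an established connection between machines. A single-host analog of the same idea, used here purely as a sketch, is passing the connection's file descriptor from a front-end process to a back-end over a Unix-domain socket with SCM_RIGHTS: the front-end reads the request content first, then hands off the live connection, and the client never sees a reconnect. This is Unix-only and is not the paper's protocol.

```python
import array
import socket
import threading

def send_fd(chan, fd):
    # Pass an open file descriptor over a Unix-domain socket (SCM_RIGHTS).
    chan.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]))])

def recv_fd(chan):
    # Receive a file descriptor sent with send_fd().
    _, anc, _, _ = chan.recvmsg(1, socket.CMSG_SPACE(
        array.array("i", [0]).itemsize))
    return array.array("i", anc[0][2])[0]

def backend(chan):
    # "Back-end": receives the established connection and replies on it.
    conn = socket.socket(fileno=recv_fd(chan))
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    conn.close()

def frontend(listener, chan):
    conn, _ = listener.accept()
    conn.recv(4096)               # inspect request content before choosing
    send_fd(chan, conn.fileno())  # hand the live connection to the back-end
    conn.close()                  # drop the front-end's copy of the fd

left, right = socket.socketpair()   # front-end <-> back-end control channel
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=backend, args=(right,)).start()
threading.Thread(target=frontend, args=(listener, left)).start()

# Client side: one ordinary TCP connection; the handoff is invisible to it.
client = socket.create_connection(listener.getsockname())
client.sendall(b"GET /a HTTP/1.0\r\n\r\n")
chunks = []
while True:
    data = client.recv(4096)
    if not data:
        break
    chunks.append(data)
reply = b"".join(chunks)
client.close()
```

The kernel keeps the in-flight descriptor alive between sendmsg and recvmsg, so the front-end can close its copy immediately after the handoff; the real protocol achieves the analogous transfer of TCP state across machines.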