In the past decade, advances in the speed of commodity CPUs have far
outpaced advances in memory latency. Main-memory access is therefore
increasingly a performance bottleneck for many computer applications,
including database systems. In this article, we use a simple scan test to
show the severe impact of this bottleneck. The insights gained are
translated into guidelines for database architecture, in terms of both data
structures and algorithms. We discuss how vertically fragmented data
structures optimize cache performance on sequential data access. We then
focus on equi-join, typically a random-access operation, and introduce
radix algorithms for partitioned hash-join. The performance of these
algorithms is quantified using a detailed analytical model that
incorporates memory access cost. Experiments that validate this model were
performed on the Monet database system. We obtained exact statistics on
events such as TLB misses and L1 and L2 cache misses by using hardware
performance counters found in modern CPUs. Using our cost model, we show
how the carefully tuned memory access pattern of our radix algorithms makes
them perform well, a result that is confirmed by our experiments.
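
As a rough illustration of the partitioned hash-join idea mentioned above, the following is a minimal sketch, not the implementation used in Monet. It assumes integer join keys that serve as their own hash values, and rows represented as (key, payload) pairs. The function names radix_cluster and partitioned_hash_join, the bits_per_pass parameter, and the choice of two passes of two bits each are hypothetical and chosen only for this example. Both inputs are clustered on the low bits of the key in several passes, so that no single pass scatters rows over more than a small number of output regions; only clusters with the same index are then joined, each with a small hash table.

```python
from collections import defaultdict


def radix_cluster(rows, bits_per_pass):
    """Split (key, payload) rows into 2**sum(bits_per_pass) clusters.

    Cluster i ends up holding exactly the rows whose key has low bits i.
    Each pass refines every existing cluster on a further group of bits,
    so one pass never writes to more than 2**b regions at a time.
    """
    shift = sum(bits_per_pass)
    clusters = [list(rows)]
    for b in bits_per_pass:
        shift -= b                      # this pass looks at bits [shift, shift + b)
        mask = (1 << b) - 1
        refined = []
        for cluster in clusters:
            buckets = [[] for _ in range(1 << b)]
            for row in cluster:
                buckets[(row[0] >> shift) & mask].append(row)
            refined.extend(buckets)
        clusters = refined
    return clusters


def partitioned_hash_join(left, right, bits_per_pass=(2, 2)):
    """Equi-join two lists of (key, payload) rows, cluster by cluster."""
    result = []
    left_clusters = radix_cluster(left, bits_per_pass)
    right_clusters = radix_cluster(right, bits_per_pass)
    for lc, rc in zip(left_clusters, right_clusters):
        # Build a small hash table on the left cluster only.
        table = defaultdict(list)
        for key, payload in lc:
            table[key].append(payload)
        # Probe it with the matching right cluster.
        for key, payload in rc:
            for left_payload in table.get(key, ()):
                result.append((key, left_payload, payload))
    return result
```

Calling partitioned_hash_join(left, right) returns one (key, left_payload, right_payload) tuple per matching pair. In a real system the number of bits per pass would be tuned to the cache and TLB sizes of the target machine; the defaults here are arbitrary.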