Current general-purpose memory allocators do not provide sufficient speed or flexibility for modern high-performance applications. Highly tuned general-purpose allocators have per-operation costs of around one hundred cycles, while an operation in a custom memory allocator can cost just a handful of cycles. To achieve high performance, programmers often write custom memory allocators from scratch, a difficult and error-prone process.
In this paper, we present a flexible and efficient infrastructure for building memory allocators that is based on C++ templates and inheritance. This novel approach allows programmers to build custom and general-purpose allocators as "heap layers" that can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies allocator construction and results in allocators that either match or improve on the performance of heavily tuned allocators written in C, including the Kingsley allocator and the GNU obstack library. We further show that this infrastructure can be used to rapidly build a general-purpose allocator whose performance is comparable to that of the Lea allocator, one of the best uniprocessor allocators available. We thus demonstrate a clean, easy-to-use allocator interface that seamlessly combines the power and efficiency of any number of general and custom allocators within a single application.
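
To illustrate how allocators might be composed from templates and inheritance, the following is a minimal sketch and not code from the paper. The class names (MallocHeap, CountingHeap, FreelistHeap) and the member names (allocate, release, requests) are hypothetical, chosen only for this example. Each layer is a class template that inherits from its template parameter, so a stack of layers is resolved at compile time and calls between layers can be inlined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Bottom layer (hypothetical name): obtains memory from the system allocator.
class MallocHeap {
public:
    void* allocate(std::size_t sz) { return std::malloc(sz); }
    void release(void* p) { std::free(p); }
};

// Illustrative layer: counts how many requests reach it, then forwards
// each request to the layer beneath it (its base class).
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
    void* allocate(std::size_t sz) {
        ++requests_;
        return SuperHeap::allocate(sz);
    }
    std::size_t requests() const { return requests_; }
private:
    std::size_t requests_ = 0;
};

// Illustrative layer: keeps released blocks on a free list and reuses them.
// Assumption for this sketch: every request is at most ObjectSize bytes,
// so every block handed out is interchangeable.
template <std::size_t ObjectSize, class SuperHeap>
class FreelistHeap : public SuperHeap {
    struct Node { Node* next; };
public:
    void* allocate(std::size_t sz) {
        assert(sz <= ObjectSize);
        if (head_ != nullptr) {            // fast path: reuse a released block
            Node* n = head_;
            head_ = n->next;
            return n;
        }
        // Slow path: ask the layer below for a block large enough to hold
        // either an object or a free-list link.
        std::size_t block = ObjectSize < sizeof(Node) ? sizeof(Node) : ObjectSize;
        return SuperHeap::allocate(block);
    }
    void release(void* p) {
        if (p == nullptr) return;
        Node* n = static_cast<Node*>(p);   // store the link inside the block
        n->next = head_;
        head_ = n;
    }
    ~FreelistHeap() {
        // Return cached blocks to the layer below.
        while (head_ != nullptr) {
            Node* n = head_;
            head_ = n->next;
            SuperHeap::release(n);
        }
    }
private:
    Node* head_ = nullptr;
};

// Composition is a type expression; no virtual dispatch is involved.
using SmallObjectHeap = FreelistHeap<32, CountingHeap<MallocHeap>>;
```

With this composition, releasing a block and then allocating again returns the same block from the free list, and the counting layer underneath sees only the first request. Reordering or swapping layers changes the allocator's behavior without touching any layer's code.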