
Memory Allocator

(excerpted from J. M. Hellerstein, M. Stonebraker and J. Hamilton, "Architecture of a Database System". To appear in Foundations and Trends in Databases 1(2).)

The textbook presentation of DBMS memory management tends to focus entirely on the buffer pool. In practice, database systems allocate significant amounts of memory for other tasks as well. The correct management of this memory is both a programming burden and a performance issue. Selinger-style query optimization can use a great deal of memory, for example, to build up state during dynamic programming. Query operators like hashjoins and sorts allocate significant memory at runtime. Memory allocation in commercial systems is made more efficient and easier to debug via the use of a _context-based_ (or _region-based_) memory allocator.

A memory context is an in-memory data structure that maintains a list of regions of contiguous virtual memory, often called memory pools. Each region can have a small header that contains either a context label or a pointer to the context header structure.
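As a rough illustration of this structure, the following C sketch shows one possible layout for a context header and its regions; the type and field names are hypothetical rather than drawn from any particular system.

```c
#include <stddef.h>

/* Hypothetical layout of a memory context and its regions (pools). */
typedef struct MemoryRegion {
    struct MemoryContext *owner;   /* back-pointer to the context header        */
    struct MemoryRegion  *next;    /* regions are kept on a linked list         */
    size_t                size;    /* total bytes of contiguous virtual memory  */
    size_t                used;    /* bytes already handed out to callers       */
    char                  data[];  /* the region's memory itself                */
} MemoryRegion;

typedef struct MemoryContext {
    const char   *name;     /* e.g. "optimizer" or "hashjoin"                   */
    int           type;     /* advises the allocator on its growth policy       */
    MemoryRegion *regions;  /* head of the list of regions owned by the context */
} MemoryContext;
```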
The basic API for memory contexts includes calls to:

  • Create a context with a given name or type. The context type might advise the allocator how to efficiently handle memory allocation. For example, the contexts for the query optimizer grow via small increments, while contexts for hashjoins allocate their memory in a few large batches. Based on such knowledge, the allocator can choose to allocate bigger or smaller regions at a time.

  • Allocate a chunk of memory within a context. This allocation will return a pointer to memory (much like the traditional malloc() call). That memory may come from an existing region in the context. Or, if no such space exists in any region, the allocator will ask the operating system for a new region of memory, label it, and link it into the context.

  • Delete a chunk of memory within a context. This may or may not cause the context to delete the corresponding region. Deletion from memory contexts is somewhat unusual. A more typical behavior is to delete an entire context.

  • Delete a context. This first frees all of the regions associated with the context, and then deletes the context header.

  • Reset a context. This retains the context, but returns it to the state of original creation, typically by deallocating all previously-allocated regions of memory.
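Rendered as C function signatures, this API might look roughly like the following; the names are illustrative, not those of any specific DBMS.

```c
#include <stddef.h>

typedef struct MemoryContext MemoryContext;

/* Create a context; 'type' hints at the expected allocation pattern
 * (e.g. many small chunks for the optimizer, a few large batches for hashjoins). */
MemoryContext *context_create(const char *name, int type);

/* Allocate 'size' bytes out of the context's regions (malloc()-like);
 * a new region is requested from the OS if no existing region has room. */
void *context_alloc(MemoryContext *context, size_t size);

/* Return a single chunk to its context (uncommon in practice). */
void context_free_chunk(MemoryContext *context, void *chunk);

/* Free all regions owned by the context, then the context header itself. */
void context_delete(MemoryContext *context);

/* Free the regions but keep the context, returning it to its initial state. */
void context_reset(MemoryContext *context);
```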

Memory contexts provide important software engineering advantages. The most important is that they serve as a lower-level, programmer-controllable alternative to garbage collection. For example, the developers writing the optimizer can allocate memory in an optimizer context for a particular query, without worrying about how to free the memory later on. When the optimizer has picked the best plan, it can copy the plan into memory allocated from a separate executor context for the query, and then simply delete the query’s optimizer context. This saves the trouble of writing code to carefully walk all the optimizer data structures and delete their components. It also avoids tricky memory leaks that can arise from bugs in such code. This feature is very useful for the naturally “phased” behavior of query execution, where control proceeds from parser to optimizer to executor, with a number of allocations in each context followed by a context deletion.
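Using the illustrative signatures sketched earlier, the "allocate, copy the survivors, delete the context" pattern might be coded like this; every name here is hypothetical.

```c
/* Hypothetical sketch of the phased optimizer-to-executor hand-off. */
typedef struct MemoryContext MemoryContext;
typedef struct Plan Plan;

MemoryContext *context_create(const char *name, int type);
void           context_delete(MemoryContext *context);
Plan          *optimize_query(const char *sql, MemoryContext *scratch);
Plan          *copy_plan(const Plan *plan, MemoryContext *target);

Plan *plan_query(const char *sql, MemoryContext *executor_context)
{
    /* All of the optimizer's scratch state lives in its own context. */
    MemoryContext *optimizer_context = context_create("optimizer", 0);

    Plan *best = optimize_query(sql, optimizer_context);

    /* Copy only the winning plan into the longer-lived executor context. */
    Plan *result = copy_plan(best, executor_context);

    /* One call reclaims every optimizer allocation; no per-node walk,
     * no risk of leaking a forgotten substructure. */
    context_delete(optimizer_context);
    return result;
}
```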

Note that memory contexts actually provide more control than most garbage collectors, as developers can control both spatial and temporal locality of deallocation. The context mechanism itself provides the spatial control that allows the programmer to separate memory into logical units. Temporal control follows from programmers being allowed to issue context deletions when appropriate. By contrast, garbage collectors typically work on all of a program’s memory, and make their own decisions about when to run. This is one of the frustrations of attempting to write server-quality code in Java.

Memory contexts also provide performance advantages on platforms where the overhead for malloc() and free() is relatively high. In particular, memory contexts can use semantic knowledge (via the context type) of how memory will be allocated and deallocated, and may call malloc() and free() accordingly to minimize OS overheads. Some components of a database system (e.g. the parser and optimizer) allocate a large number of small objects, and then free them all at once via a context deletion. Calls to free() on many small objects are rather expensive on most platforms. A memory allocator can instead call malloc() to allocate large regions, and apportion the resulting memory to its callers. The relative lack of memory deallocations means that the compaction logic that malloc() and free() use is not needed. And when the context is deleted, only a few free() calls are required to remove the large regions.
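As a concrete (and deliberately simplified) example of this pattern, here is a self-contained C sketch of a bump-pointer region allocator: chunks are carved out of large malloc()'d regions, individual chunks are never freed, and deleting the context issues only one free() per region. Alignment, oversized requests, and the per-chunk delete path of a real allocator are omitted.

```c
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define REGION_SIZE (64 * 1024)    /* each region is one large malloc() */

typedef struct Region {
    struct Region *next;
    size_t         used;           /* bump-pointer offset into data[]   */
    char           data[REGION_SIZE];
} Region;

typedef struct Context {
    const char *name;
    Region     *regions;           /* most recently added region first  */
} Context;

static Context *context_create(const char *name)
{
    Context *ctx = malloc(sizeof *ctx);
    ctx->name = name;
    ctx->regions = NULL;
    return ctx;
}

/* Allocate from the current region; grab a fresh region when it is full.
 * Assumes size <= REGION_SIZE and ignores alignment for brevity. */
static void *context_alloc(Context *ctx, size_t size)
{
    Region *r = ctx->regions;
    if (r == NULL || r->used + size > REGION_SIZE) {
        r = malloc(sizeof *r);     /* one big OS-level allocation */
        r->used = 0;
        r->next = ctx->regions;
        ctx->regions = r;
    }
    void *chunk = r->data + r->used;
    r->used += size;
    return chunk;
}

/* Deleting the context frees a handful of regions, not thousands of chunks. */
static void context_delete(Context *ctx)
{
    Region *r = ctx->regions;
    while (r != NULL) {
        Region *next = r->next;
        free(r);
        r = next;
    }
    free(ctx);
}

int main(void)
{
    Context *parse_ctx = context_create("parser");

    /* Many small allocations, e.g. parse-tree nodes, with no matching frees. */
    for (int i = 0; i < 10000; i++) {
        char *node = context_alloc(parse_ctx, 48);
        memset(node, 0, 48);
    }

    context_delete(parse_ctx);     /* a few free() calls reclaim everything */
    printf("done\n");
    return 0;
}
```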
