Skip to content

Latest commit

 

History

History
157 lines (114 loc) · 4.8 KB

README.md

File metadata and controls

157 lines (114 loc) · 4.8 KB

μgc

License

μgc is a single-header garbage collector library. It is designed to be embedded in a programming language's runtime.

Features

  • Fully incremental using tri-color marking.
  • No external dependencies.
  • No memory allocation.
  • Overhead of 2 pointers per object.
  • Thoroughly tested: unit test and property-based test

Usage

Building

Put ugc.h into your project.

Write in one C file:

#define UGC_IMPLEMENTATION
#include "ugc.h"

Integrating with your runtime

All heap allocated objects must include ugc_header_t, usually as the first member:

struct my_heap_obj_s
{
	ugc_header_t header;

	// Other fields
	type_info_t type;
	void* external_ref;
	// ...
};

Initialize a garbage collector with:

ugc_t gc;

ugc_init(&gc, scan_fn, free_fn);

// Each gc instance has a userdata field.
gc.userdata = language_runtime;

ugc_init requires two callback functions: scan_fn and free_fn.

scan_fn is a function that will be called to help μgc trace an object's reference and the root set. It should behave as follow:

static void
scan_gc_obj(ugc_t* gc, ugc_header_t* header)
{
	if(header == NULL) // Scan the root set
	{
		// ugc_visit needs to be called on each pointer in the stack and global
		// environment

		struct my_language_runtime_s* runtime = gc->userdata;

		// Scan the stack
		for(unsigned int i = 0; i < runtime->stack_len; ++i)
		{
			if(runtime->stack[i]) { ugc_visit(gc, runtime->stack[i]); }
		}

		// If the language is similar to Lua where the global environment is a
		// first-class object, call ugc_visit on the object instead of its fields
		ugc_visit(gc, runtime->global);
	}
	else // Scan the given object
	{
		// ugc_visit needs to be called on each external reference of this
		// object.

		struct my_heap_obj_s* obj = (struct my_heap_obj_s*)header;
		if(obj->external_ref) { ugc_visit(gc, obj->external_ref); }
	}
}

The second parameter of ugc_visit must not be NULL and must point to a ugc_header_t.

free_fn will be called when μgc has determined that a language's object is garbage. It should release an object's resources:

static void
free_gc_obj(ugc_t* gc, ugc_header_t* header)
{
	struct my_language_runtime_s* runtime = gc->userdata;
	runtime->free(header);
}

When a new object is allocated, it needs to be registered with μgc using:

ugc_register(gc, new_object);

Whenever an object receives a reference to another (i.e: src.field = dst), μgc must be informed:

ugc_write_barrier(gc, direction, src, dst);

There are two types of write barriers: "forward" and "backward".

LuaJIT wiki states the following about "backward" barrier:

This is moving the barrier "back", because the object has to be reprocessed later on. This is beneficial for container objects, because they usually receive several stores in succession. This avoids a barrier for the next objects that are stored into it (which are likely white, too).

And "forward" barrier:

This moves the barrier "forward", because it implicitly drives the GC forward. This works best for objects that only receive isolated stores.

In the above language example, stores to the stack/local variables do not require a write barrier but stores to global variables do.

Controlling garbage collection

μgc does not start collection automatically because there are many factors (e.g: heap size, number of objects, time limit...) that need to be considered. It is best left to the language implementer to decide. Thus, it provides two functions to control the garbage collector:

ugc_step(gc) performs a single atomic step (e.g: mark/free one object, scan the root set). One typically calls it several times per allocation or regularly after a trigger. "Push/pop GC pause" function usually found in some language's API can be implemented by simply maintaining a counter and not calling ugc_step if it is non-zero. The current state of the GC is stored in ugc_t::state. One can use that to give different speeds to different phases.

ugc_collect(gc) finishes the current collection cycle. This means:

  • If the GC has alread started, it will return once the current cycle ends.
  • If the GC is idle (ugc_t::state == UGC_IDLE), it will start a cycle and finish it.

Therefore, if the GC is already in the UGC_SWEEP phase, any new garbage will be left to the next cycle. One needs to pay attention to this when calling gc_collect in case of emergency (e.g: malloc returns NULL). The first call to gc_collect may reclaim some but not all memory. It is possible for malloc to return NULL again. gc_collect should be called a second time. The runtime should only panic if malloc still fails.