The m_trie
library is a general-purpose implementation of the prefix
tree data structure for the C99 language. The data structure stores
key/value pairs, where key is an array of bytes and value is an
arbitrary pointer to the associated data.
The following example inserts all command-line arguments into the trie and lets the user search for their presence:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <m_trie.h>
int
main(int argc, char* argv[])
{
m_trie tr;
char inp[256];
int i;
int ret;
m_trie_init(&tr, m_trie_hash_alphabet, 0);
for (i = 1; i < argc; i++)
m_trie_insert(&tr, (uint8_t*)argv[i], (uint32_t)strlen(argv[i]), NULL);
while (1) {
scanf("%s", inp);
if (strcmp(inp, "!") == 0)
break;
ret = m_trie_search(&tr, (uint8_t*)inp, (uint32_t)strlen(inp), NULL);
printf("%s\n", ret == M_TRIE_OK ? "yes" : "no");
}
return EXIT_SUCCESS;
}
Compile with:
cc -Wall -Wextra -Werror -std=c99 example.c -lmtrie
Please note that this example lacks error-checking for the sake of
brevity. All production code should check for return values of all
m_trie
functions.
More examples can be found in the examples/
folder.
Before calling any other function, the m_trie
object needs to be
initialised via the m_trie_init
function. This function does not
allocate the m_trie
object itself, which is left to the user of the
library.
The initialisation function expects three arguments: a pointer to a
m_trie
instance, pointer to a hash function (see below) and a flags
field that is an ORed combination of the following constants:
Inserting an identical key twice will result in the previously associated data to be overwritten. Such behaviour is prevented by default.
Trigger the garbage-collecting procedure after each removal request - no that this can have a significant impact on the performance of the data structure.
Call the free(3)
function on all value data pointers once the associated
key's removal is requested.
To free all resources held by the data structure, call the m_trie_free
function.
Prototypes:
int m_trie_init(m_trie* tr, int16_t (*hash)(uint8_t), uint8_t flags);
int m_trie_free(m_trie* tr);
In order to insert a new key/value pair into the prefix tree, the
m_trie_insert
function can be used. Once key/value pairs are inserted
into the data structure, it is possible to query the m_trie
for presence
of certain key with the m_trie_search
function. The stored value is
returned via the last argument of the function, while the return value of
the function is used to report potential errors.
Prototypes:
int m_trie_insert(m_trie* tr, uint8_t* key, uint32_t len, void* val);
int m_trie_search(m_trie* tr, uint8_t* key, uint32_t len, void** val);
The library offers two functions to remove keys (an their associated
values): m_trie_remove
, a direct counterpart to the m_trie_insert
function. It accepts a key and it's length and marks the corresponding
data node for future removal. It is important to note that the
m_trie_remove
function does not perform deallocation of memory and
simply marks node as if it did not contain any data. This decision was
taken due to performance reasons, where cascading freeing of memory could
pose significant memory delays and therefore uncontrollable jitter in
certain scenarios.
The last argument of the m_trie_remove
function is it's mode: either 0
for normal or 1
for prefix. If the prefix mode is selected, the function
removes all keys that start with the specified key. The key itself might
not indicate a previously inserted key, e.g. it is possible to call the
m_trie_remove
function in the prefix mode with key "ca"
and it would
delete keys such as "california"
and "carpool"
, even though "ca"
itself was not in the trie.
The second removal function, m_trie_remove_all
, simply marks all already
inserted keys to be non-existent, so that an immediate call to the
m_trie_search
function with any argument would not succeed. It is still
possible to use the m_trie
instance afterwards, without needing to call
m_trie_init
. All internal structures remain allocated, which can serve
as a performance feature in certain scenarios.
Prototypes:
int m_trie_remove(m_trie* tr, uint8_t* key, uint32_t len, uint8_t pfix);
int m_trie_remove_all(m_trie* tr);
As mentioned above, none of the removal functions actually deallocate
resources, in order to have a more predictable run-time. The
garbage-collection can be performed by the m_trie_trim
function that
traverses the data structure and frees the unused memory blocks. The
performance aspect of this procedure depends on how often this function is
called and may be significant.
Prototype:
int m_trie_trim(m_trie* tr);
The m_trie
data structure needs a user-supplied hash function to
navigate the levels of the tree. Each m_trie
hash function takes a
char
value as input and produces a short
value. In case that the byte
on the input is not valid, a correct m_trie
hash function must return
-1
. The function must uniquely map all valid byte inputs onto the
interval 0 .. (n-1)
, where n
is the number of valid input bytes. Each
m_trie
hash function therefore must be a minimal perfect hash
function.
The library identifies a list hash functions that are ubiquitous
and general enough to be included in the library, such as the identity
function m_trie_hash_identity
or a hash function that supports only the
lower-case letters of English alphabet m_trie_hash_lower_alphabet
.
Prototypes:
int16_t m_trie_hash_identity(uint8_t key);
int16_t m_trie_hash_alphabet(uint8_t key);
int16_t m_trie_hash_digits(uint8_t key);
int16_t m_trie_hash_base64(uint8_t key);
int16_t m_trie_hash_alphanumeric(uint8_t key);
int16_t m_trie_hash_lower_alphabet(uint8_t key);
int16_t m_trie_hash_upper_alphabet(uint8_t key);
Users of the library are welcomed to create custom hash functions that
obey the rules described above. An example of a hash function that accepts
only characters '0'
and '1'
:
int16_t
hash_01(uint8_t c)
{
if (c == 48) return 0;
if (c == 49) return 1;
return -1;
}
The table below makes an assumption that the hashing function operates in both constant time and constant space.
Function | Time | Space |
---|---|---|
m_trie_init |
O(1) |
O(1) |
m_trie_free |
O(n) |
O(k) |
m_trie_insert |
O(k) |
O(k) |
m_trie_search |
O(k) |
O(1) |
m_trie_remove |
O(k) |
O(1) |
m_trie_remove (pfix) |
O(n) |
O(k) |
m_trie_remove_all |
O(n) |
O(k) |
m_trie_trim |
O(n) |
O(k) |
m_trie_hash_* |
O(1) |
O(1) |
Where:
k
is the length of the longest keyn
is the overall number of allocated trie nodes
Each function that is part of the public API of the library has its own UNIX manual page. All pages belong to the section 3 of the manual catalogue and are automatically installed with the library.
The implementation is verified by a set of tests that each performs a list
of operations - insertion, removal and search of keys - combined in ways
that covers aim to cover all edge cases. The testing framework consists of
two files: test.c
that implements the data structure operations and
test.sh
that serves as orchestration and verification of test results.
The process can be invoked by triggering the test
target of the
library's Makefile
.
The library should compile under all C99 environments and standards-compliant compilers. This means that the library is expected to compile an work under Linux, FreeBSD, OpenBSD, NetBSD, macOS, Windows, Haiku, Plan9 and more. Please report any problems regarding a particular platform to the author.
The library is known to compile with Clang, GCC, and MSVC compilers.
The m_trie
library has zero dependencies and requires only a
C99-compatible compiler.
In order to compile, test and install the library run the following 4 steps (some of which may require superuser privileges):
$ make
$ make test
$ make install
$ make clean
The m_trie
library is licensed under the terms of the 2-clause BSD
license. For more information please consult the LICENSE
file. In case you need a different license, feel free to contact the
author.
Daniel Lovasko ([email protected])