aho-corasick

Aho-Corasick parallel string search, using interleaved arrays.

ACISM is an implementation of Aho-Corasick parallel string search, using an Interleaved State-transition Matrix. It combines the fastest possible Aho-Corasick implementation with the smallest possible data structure (!).

FEATURES

Fast. No hashing, no tree traversal; just a straight look-up equivalent to matrix[state, input-byte] per input character.
Tiny. On average, the whole data structure (mostly the array) takes about 2-3 bytes per input pattern byte. The original set of pattern strings can be reverse-generated from the machine.
Shareable. The state machine contains no pointers, so it can be compiled once, then memory-mapped by many processes.
Searches byte vectors, not null-terminated strings. Suitable for searching machine code as much as searching text.
DoS-proof. Search time is linear in the length of the input, whatever the patterns. Well, that's an attribute of Aho-Corasick, so no real points for that.
Stream-ready. The state can be saved between calls to search data.
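
To make the "Fast", "byte vectors" and "Stream-ready" points concrete, here is a minimal sketch of a table-driven Aho-Corasick matcher. It is not the ACISM code: it uses a plain dense next[state][byte] matrix rather than the interleaved one, it takes null-terminated patterns for brevity, and the names Matcher, matcher_build and matcher_more are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

enum { MAX_STATES = 64, ALPHA = 256 };   /* illustrative fixed limits */

typedef struct {
    int next[MAX_STATES][ALPHA];  /* dense matrix: next[state][byte] */
    int hits[MAX_STATES];         /* patterns ending at this state, suffixes included */
    int nstates;
} Matcher;

/* Build the automaton from npats null-terminated patterns (hypothetical helper). */
static void matcher_build(Matcher *m, const char **pats, int npats)
{
    int fail[MAX_STATES] = {0};
    int queue[MAX_STATES], head = 0, tail = 0;

    memset(m, 0, sizeof *m);
    m->nstates = 1;                       /* state 0 is the root */

    /* Step 1: build the trie. 0 means "no edge yet"; no trie edge targets the root. */
    for (int p = 0; p < npats; p++) {
        int s = 0;
        for (const unsigned char *c = (const unsigned char *)pats[p]; *c; c++) {
            if (!m->next[s][*c]) {
                assert(m->nstates < MAX_STATES);
                m->next[s][*c] = m->nstates++;
            }
            s = m->next[s][*c];
        }
        m->hits[s]++;
    }

    /* Step 2: breadth-first pass. Every missing edge is filled in from the
       failure state, so the scan loop never has to follow failure links. */
    for (int b = 0; b < ALPHA; b++)
        if (m->next[0][b]) queue[tail++] = m->next[0][b];
    while (head < tail) {
        int s = queue[head++];
        m->hits[s] += m->hits[fail[s]];
        for (int b = 0; b < ALPHA; b++) {
            int t = m->next[s][b];
            if (t) { fail[t] = m->next[fail[s]][b]; queue[tail++] = t; }
            else   m->next[s][b] = m->next[fail[s]][b];
        }
    }
}

/* Scan one chunk of bytes (embedded zero bytes are fine). *state carries the
   automaton state from one call to the next. Returns the number of matches
   that end inside this chunk. */
static int matcher_more(const Matcher *m, const void *buf, size_t len, int *state)
{
    const unsigned char *p = buf;
    int found = 0, s = *state;
    for (size_t i = 0; i < len; i++) {
        s = m->next[s][p[i]];             /* one look-up per input byte */
        found += m->hits[s];
    }
    *state = s;
    return found;
}
```

With the patterns "he", "she", "his" and "hers", feeding "us" and then "hers" through matcher_more with the same state variable (starting at 0) reports three matches in the second call, the same total as scanning "ushers" in one call. The dense matrix here costs 1 KB per state with 4-byte entries; shrinking that without giving up the single look-up is what the interleaving is for.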

DOCUMENTATION

The GoogleDocs description is at http://goo.gl/lE6zG. I originally called it "psearch", but found that name was overused by other authors.

LICENSE

LGPL v3

GETTING STARTED

Download the source and type "gmake". "gmake install" exports lib/libacism.a, include/acism.h and bin/acism_x. "acism_x.c" is a good example of calling acism_create and acism_scan/acism_more.

(If you're interested in the GNUmakefile and rules.mk, check my blog posts on non-recursive make, at mischasan.wordpress.com.)

HISTORY

The interleaved-array approach was tried and discarded in the late 70's, because the compile time was O(n^2). acism_create beats the problem with a "hint" array that tracks the restart points for searches. That, plus discarding the original idea of how to get maximal density, resulted in the tiny-fast win-win.
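
For readers who have not met interleaved arrays, the sketch below shows the general idea in its textbook "row displacement" form. Each state's sparse row of transitions is slid along one shared array until it fits into the gaps left by earlier rows, and each slot records the symbol that owns it so a look-up can reject a neighbour's entry. Everything here is an assumption made for illustration: Cell, Packed, pack_row and lookup are invented names, the reserved header slot is just one way to keep row offsets unique, and the single "lowest possibly-free slot" hint is a simplification, not the hint array that acism_create actually uses.

```c
#include <assert.h>
#include <string.h>

enum { NSYMS = 8, MAX_CELLS = 256 };      /* tiny alphabet, fixed capacity */

/* One slot of the shared array. sym is the input symbol that owns the slot,
   or -1 for the header slot that marks the start of a row. */
typedef struct { int used, sym, next; } Cell;

typedef struct {
    Cell cell[MAX_CELLS];
    int  hint;                            /* lowest index that may still be free */
} Packed;

/* Place one sparse row (n transitions, syms[i] -> nexts[i]) with first-fit
   and return its base offset. The table must start zero-filled. */
static int pack_row(Packed *t, const int *syms, const int *nexts, int n)
{
    /* Advance the hint past the full prefix once, so later probes
       do not rescan the array from index 0. */
    while (t->hint < MAX_CELLS && t->cell[t->hint].used)
        t->hint++;

    for (int base = t->hint; ; base++) {
        int i;
        assert(base + NSYMS < MAX_CELLS); /* sketch: no growth, just fail */
        if (t->cell[base].used)
            continue;                     /* header slot is taken */
        for (i = 0; i < n && !t->cell[base + 1 + syms[i]].used; i++)
            ;
        if (i < n)
            continue;                     /* collision: try the next offset */
        t->cell[base] = (Cell){ 1, -1, 0 };
        for (i = 0; i < n; i++)
            t->cell[base + 1 + syms[i]] = (Cell){ 1, syms[i], nexts[i] };
        return base;
    }
}

/* Follow sym from the row at base. Returns -1 when the slot is empty
   or belongs to another row. */
static int lookup(const Packed *t, int base, int sym)
{
    const Cell *c = &t->cell[base + 1 + sym];
    return (c->used && c->sym == sym) ? c->next : -1;
}
```

Packing a row with transitions on symbols 0 and 3 into an empty table, and then a row with a transition on symbol 2, gives bases 0 and 2: the second row sits in the gaps of the first. A look-up still costs one array access and one comparison. Without a hint, every placement would restart its search from the beginning of the array, which is where the quadratic compile time comes from.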

ACKNOWLEDGEMENTS

I'd like to thank Mike Shannon, who wanted to see a machine built to make best use of L1/L2 cache. The change to do that doubled performance on hardware with a much larger cache than the matrix. Go figure.

