Martin Trenkmann edited this page Nov 16, 2023 · 5 revisions

Hi there! 👋

My name is Martin and I am the author of NGRAMS.

I am a backend developer and like to build software that deals with large amounts of data. I do things in C++ because it's fast, at least at runtime, not necessarily at development time. I also like web programming to some extent, thanks to TypeScript. If I had to name my most important software development principle, it would be Keep It Simple. I received a master's degree in computer science from the Bauhaus-Universität Weimar (Germany) in 2012.

Email me at [email protected]


NGRAMS is my third implementation of a search engine of this kind.

2019 - today: ngrams.dev
Dataset: Google Books Ngram Dataset v3
Size: 23 TB compressed, ~230 TB uncompressed
Backend: C++20 for core app, uWebSockets for REST API server
This thing was released in mid-April 2023.

2015 - today: phrasefinder.io
Dataset: Google Books Ngram Dataset v2
Size: 7 TB compressed, ~70 TB uncompressed
Backend: C++14 for core app, Boost Beast for REST API server
This thing will be discontinued by the end of 2023.

2007 - 2013: netspeak.org
Dataset: Web 1T 5-gram Version 1
Size: 25 GB compressed, ~75 GB uncompressed
Backend: C++03 (later C++11) for core app, JNI, Java Servlet for REST API server
This thing started off as my bachelor thesis.