Working on LLM inference systems, KV cache compression, and kernel-level optimizations (TurboQuant).
Pinned repositories

- llama-cpp-turboquant (fork of ggml-org/llama.cpp): LLM inference in C/C++
- elm327_obd_for_mac: I'm crazy and trying to make a ForScan OBD reader work on my Mac.
- vllm-swift: vLLM Metal plugin powered by mlx-swift, for high-performance LLM inference on Apple Silicon