I'm interested in AI, with a focus on inference and post-training of language models, vision-language models, and generative diffusion models. Follow me on Twitter @sumo43_ for updates on whatever I'm working on.
- Demo: Object Detection Demo on X
- Description:
A fast PaliGemma inference engine running on an RTX 4090. I built an object detection demo using the SigLIP-224px vision encoder that runs in real time at 20 fps.
- Description:
RobotArena is an ELO-based 🤖 robot-action model benchmark, built in collaboration with SkunkworksAI, that lets you explore and evaluate robot-action models directly in your browser.
Role: LLM Inference Engineer
Overview:
At Brium AI, I worked on accelerating inference for large language models across diverse GPU architectures. My role focused on optimizing the inference stack—from runtime systems to compilers—for long-context LLM applications. This work led to significant improvements in throughput and latency, particularly on AMD’s MI210 and MI300 GPUs.
Read more: Brium AI Blog Post
Role: ML Engineer
Overview:
At RunPod, I built an in-house inference engine that supports low-latency workloads with speculative decoding. I also collaborated closely with customers to deploy AI models effectively on the RunPod stack.
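For readers unfamiliar with speculative decoding, here is a minimal toy sketch of the greedy variant: a cheap draft model proposes a block of tokens, and the target model verifies them, accepting the longest agreeing prefix so the output is identical to target-only decoding. The `target` and `draft` functions are illustrative stand-ins, not RunPod's actual engine.

```python
def speculative_decode(target, draft, prefix, num_tokens, block_size=4):
    """Greedy speculative decoding sketch.

    `target` and `draft` are stand-in functions mapping a token sequence to
    the next token. Output always matches target-only greedy decoding.
    """
    out = list(prefix)
    while len(out) - len(prefix) < num_tokens:
        # Draft model proposes block_size tokens autoregressively.
        proposed = []
        ctx = list(out)
        for _ in range(block_size):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # Target verifies: accept proposals until the first disagreement,
        # then emit the target's own token there, keeping output unchanged.
        for t in proposed:
            expected = target(out)
            if t == expected:
                out.append(t)
            else:
                out.append(expected)
                break
            if len(out) - len(prefix) >= num_tokens:
                break
    return out[len(prefix):]
```

The latency win comes from the verification step: in a real engine the target model scores all proposed tokens in a single batched forward pass, so each accepted draft token saves one full target-model decode step.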
- Twitter: @sumo43_
- Email: [email protected]