I'm interested in AI, with a focus on inference and post-training of language models, vision-language models, and generative diffusion models. Follow me on Twitter @sumo43_ for updates on whatever I'm working on.
- Demo: Object Detection Demo on X
- Description:
A fast PaliGemma inference engine running on an RTX 4090. I built an object detection demo using the SigLIP-224px vision encoder that runs in real time at 20 fps.
- Description:
RobotArena is an ELO-based 🤖 robot-action model benchmark, built in collaboration with SkunkworksAI, that lets you explore and evaluate robot-action models directly in your browser.
Role: LLM Inference Engineer
Overview:
At Brium AI, I worked on accelerating inference for large language models across diverse GPU architectures. My role focused on optimizing the inference stack—from runtime systems to compilers—for long-context LLM applications. This work led to significant improvements in throughput and latency, particularly on AMD’s MI210 and MI300 GPUs.
Read more: Brium AI Blog Post
Role: ML Engineer
Overview:
At RunPod, I built an in-house inference engine that supports low-latency workloads with speculative decoding. I also collaborated closely with customers to deploy AI models effectively on the RunPod stack.
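For readers unfamiliar with speculative decoding, here is a minimal toy sketch of the greedy variant: a cheap draft model proposes a block of tokens, and the target model verifies them, accepting the longest agreeing prefix so the output is identical to target-only decoding. The `target` and `draft` functions are illustrative stand-ins, not RunPod's actual engine.

```python
def speculative_decode(target, draft, prefix, num_tokens, block_size=4):
    """Greedy speculative decoding sketch.

    `target` and `draft` are stand-in functions mapping a token sequence to
    the next token. Output always matches target-only greedy decoding.
    """
    out = list(prefix)
    while len(out) - len(prefix) < num_tokens:
        # Draft model proposes block_size tokens autoregressively.
        proposed = []
        ctx = list(out)
        for _ in range(block_size):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # Target verifies: accept proposals until the first disagreement,
        # then emit the target's own token there, keeping output unchanged.
        for t in proposed:
            expected = target(out)
            if t == expected:
                out.append(t)
            else:
                out.append(expected)
                break
            if len(out) - len(prefix) >= num_tokens:
                break
    return out[len(prefix):]
```

The latency win comes from the verification step: in a real engine the target model scores all proposed tokens in a single batched forward pass, so each accepted draft token saves one full target-model decode step.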
- Twitter: @sumo43_
- Email: [email protected]