From 6451cbdde6c4d2fda700b0bcb8a8b5ef1af278c4 Mon Sep 17 00:00:00 2001
From: Isaac Ong
Date: Mon, 1 Jul 2024 10:38:25 -0700
Subject: [PATCH] Move demo up

---
 blog/2024-07-01-routellm.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/blog/2024-07-01-routellm.md b/blog/2024-07-01-routellm.md
index d8e7d94a..656a2867 100644
--- a/blog/2024-07-01-routellm.md
+++ b/blog/2024-07-01-routellm.md
@@ -15,6 +15,10 @@ LLM routing offers a solution to this, where each query is first processed by a
 
 To tackle this, we present **RouteLLM**, a principled framework for LLM routing based on preference data. We formalize the problem of LLM routing and explore augmentation techniques to improve router performance. We trained four different routers using public data from Chatbot Arena and demonstrate that they can significantly reduce costs without compromising quality, with **cost reductions of over 85% on MT Bench, 45% on MMLU, and 35% on GSM8K** as compared to using only GPT-4, while still achieving 95% of GPT-4’s performance. We also publicly release all our code and datasets, including a new [open-source framework](https://github.com/lm-sys/RouteLLM) for serving and evaluating LLM routers.
 
+## Demo
+
+We have built a temporary [demo](https://816388d8af31950a69.gradio.live) where you can experiment with our matrix factorization and causal LLM routers by seeing which model your messages are routed to. Both routers have been calibrated so that approximately 50% of calls are routed to GPT-4. Please try it out!
+
 ## Routing Setup
 
 In our routing setup, we focus on the case where there are two models: a stronger, more expensive model, and a weaker but cheaper model. Given this setup, our objective is to minimize costs while achieving high quality by routing between both models.
@@ -90,10 +94,6 @@ Based on this research, we have created an open-source framework for serving and
 
 We are excited to see what you build on top of this! Please let us know if you face any issues or have any suggestions. For the full details, please refer to our [arXiv](https://arxiv.org/abs/2406.18665) paper.
 
-## Demo
-
-We have built a temporary [demo](https://0c83f754b05f4a2208.gradio.live) where you can experiment with our augmented matrix factorization and causal LLM routers by seeing which model your messages are routed to. Both routers have been calibrated so that approximately 20% of calls are routed to GPT-4. Please try them out!
-
 ## Acknowledgements
 
 We are grateful to Tyler Griggs for his valuable feedback on this post.