An API for chatting with multiple LLMs.
LLMEngine currently supports:
- Ollama
- GroqAPI
Using Ollama, you can run multiple LLMs locally on your device. Some of the models supported through Ollama are:
- Llama3
- phi3
- gemma2b
Currently, mixtral8x7b is supported through GroqAPI.
To run this engine, use this command:

```bash
python main.py
```
You can chat with the supported LLMs through this API endpoint:

http://localhost:8000/api/v1/prompt
In the request body, pass a `prompt` and a `model_type`. For example:

```json
{
  "prompt": "Fastest planet in the world",
  "model_type": "mixtral-8x7b-32768"
}
```
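As a minimal sketch, here is one way to call this endpoint from Python. It assumes the endpoint accepts a JSON POST body and that the third-party `requests` package is installed; neither is confirmed by this README.

```python
# Sketch only: assumes the server is running locally and the
# /api/v1/prompt endpoint accepts a JSON POST body.
import requests

response = requests.post(
    "http://localhost:8000/api/v1/prompt",
    json={
        "prompt": "Fastest planet in the world",
        "model_type": "mixtral-8x7b-32768",
    },
)
print(response.json())
```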
Retrieval Augmented Generation (RAG) is also supported in LLMEngine.
Hit this endpoint:

http://localhost:8000/api/v1/rag

In the request body, pass `model_type`, `query`, and `file`.
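Since a file is uploaded alongside the other fields, a plausible way to call this endpoint is a multipart/form-data POST. The sketch below assumes that, reuses the `model_type` value from the earlier example, and uses a hypothetical `document.pdf` as the uploaded file; all three are assumptions, not confirmed by this README.

```python
# Sketch only: assumes /api/v1/rag accepts a multipart/form-data POST
# with model_type and query as form fields and the document under "file".
import requests

with open("document.pdf", "rb") as f:  # hypothetical input file
    response = requests.post(
        "http://localhost:8000/api/v1/rag",
        data={
            "model_type": "mixtral-8x7b-32768",
            "query": "Summarize the key points of this document.",
        },
        files={"file": f},
    )
print(response.json())
```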
- Set up support for more models.
- Shift llama3 support to GroqAPI, as it requires a lot of compute.
- Add SelfCorrectiveRAG support in the API.
- Add support for different file types.
- Add WebsiteLoader support.