A simple web interface for LLaSA using ExLlamaV2, with an OpenAI-compatible FastAPI server.
Clone the repo:

```sh
git clone https://github.com/zuellni/llasa-webui
cd llasa-webui
```

Create a conda/mamba/python env:
```sh
conda create -n llasa-webui python=3.12
conda activate llasa-webui
```

Install dependencies, ignoring any xcodec2 errors:
```sh
pip install -r requirements.txt
pip install xcodec2 --no-deps
```

If you want to use torch+cu126, keep in mind that you'll need to compile exllamav2 and (optionally) flash-attn, and for python=3.13 you may also need to compile sentencepiece.
Start the server:

```sh
python server.py --model <path or repo id>
```

You can use the HF models or EXL2 quants from here. Add `--cache q4 --dtype bf16` for less VRAM usage.
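Since the server is OpenAI-compatible, it can be queried like any OpenAI-style endpoint. Below is a minimal client sketch using only the standard library; the host, port, route, and the `model`/`voice` values are all assumptions for illustration — check `python server.py --help` and the web UI for the actual ones:

```python
import json
from urllib import request

# Hypothetical endpoint and port - verify against `python server.py --help`.
URL = "http://127.0.0.1:8000/v1/audio/speech"

# OpenAI-style speech request body; field values are placeholders (assumptions).
payload = {
    "model": "llasa",              # placeholder model name
    "input": "Hello from LLaSA!",  # text to synthesize
    "voice": "default",            # placeholder voice id
}

# Build the POST request without sending it.
req = request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send once the server is running:
# with request.urlopen(req) as resp:
#     open("out.wav", "wb").write(resp.read())
```

The request is constructed but not sent, so you can adapt the payload and route to whatever the server actually exposes before wiring it into a script.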
