A fork of oobabooga/text-generation-webui with additional features and improvements.
- Web UI for running large language models
- Support for multiple model backends (llama.cpp, ExLlamaV2, Transformers, etc.)
- OpenAI-compatible API
- Extensions system
- Portable releases for Windows, Linux, and macOS
Download the latest release from the Releases page for your platform:
textgen-windows-cuda.zip— Windows with NVIDIA GPUtextgen-linux-cuda.zip— Linux with NVIDIA GPUtextgen-cpu.zip— CPU-only (all platforms)
Extract and run start.bat (Windows) or start.sh (Linux/macOS).
Prerequisites:
- Python 3.11+
- CUDA 12.1+ (for GPU support)
git clone https://github.com/your-org/textgen
cd textgen
pip install -r requirements.txt
python server.py# Start with default settings
python server.py
# Start with specific model
python server.py --model my-model
# Enable API and listen on all interfaces (my usual setup)
python server.py --api --listen --port 5001
# Enable API
python server.py --api
# Listen on all interfaces
python server.py --listen- Improved build pipeline with GitHub Actions
- Additional IK (ik_llama.cpp) backend support
- Various bug fixes and performance improvements
- API default port changed to 5001 to avoid conflict with other local services
See .github/workflows/ for build configurations.
# Install build dependencies
pip install -r requirements.txt
# Run the server
python server.py --verboseThe OpenAI-compatible API is available at http://localhost:5001/v1 when started with --api.
Note: This fork defaults to port 5001 instead of the upstream 5000, which tends to conflict with AirPlay on macOS and other local services.
See the API documentation for details.
Pull requests are welcome. Please use the provided PR template and follow the existing code style.
AGPL-3.0 — see LICENSE for details.