A Docker-based solution that downloads fine-tuned models from Hugging Face, converts them to GGUF format, and deploys them with Ollama as an API. It is configured out of the box for Estonian Grammar Error Correction (GEC) using the Llama-3.1-8B model.
- Model download from Hugging Face
- Conversion from safetensors to GGUF format
- Model quantization options
- Ollama API integration with model preloading
- CPU and GPU deployment profiles
The project is pre-configured for the Estonian GEC model. Edit `.env` if needed:
```env
HF_MODEL_NAME=tartuNLP/Llama-3.1-8B-est-gec-july-2025
MODEL_NAME=gec
QUANTIZATION=full
```
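To trade some quality for a smaller download, the same file can point at a quantized build. A hypothetical alternative configuration (the model name here is arbitrary; the available quantization levels are listed further down):

```env
HF_MODEL_NAME=tartuNLP/Llama-3.1-8B-est-gec-july-2025
MODEL_NAME=gec-q4
QUANTIZATION=q4_k_m
# HF_TOKEN=...   # only needed for private models
```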
For CPU deployment:

```bash
COMPOSE_PROFILES=cpu docker compose up --build
```

For GPU deployment:

```bash
COMPOSE_PROFILES=gpu docker compose up --build
```
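To run the stack in the background and watch the download and conversion progress, the usual Compose flags apply:

```bash
# Start detached, then follow logs until the model is preloaded
COMPOSE_PROFILES=gpu docker compose up --build -d
COMPOSE_PROFILES=gpu docker compose logs -f
```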
For CPU deployment:

```bash
docker build -t gec-ollama-api-cpu \
  --build-arg HF_MODEL_NAME=tartuNLP/Llama-3.1-8B-est-gec-july-2025 \
  --build-arg MODEL_NAME=gec \
  --build-arg QUANTIZATION=full \
  -f Dockerfile.cpu \
  .

docker run -d \
  --name gec-ollama-api-cpu \
  -p 11434:11434 \
  gec-ollama-api-cpu
```
For GPU deployment:

```bash
docker build -t gec-ollama-api-gpu \
  --build-arg HF_MODEL_NAME=tartuNLP/Llama-3.1-8B-est-gec-july-2025 \
  --build-arg MODEL_NAME=gec \
  --build-arg QUANTIZATION=full \
  -f Dockerfile \
  .

docker run -d \
  --name gec-ollama-api-gpu \
  --gpus all \
  -p 11434:11434 \
  gec-ollama-api-gpu
```
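Because the model is downloaded and converted at image build time, the build itself can take a while. Once a container is up, a quick liveness check (using only the documented port):

```bash
# Container status and startup logs
docker ps --filter name=gec-ollama-api
docker logs -f gec-ollama-api-gpu   # or gec-ollama-api-cpu

# Ollama answers on the root path once it is ready
curl http://localhost:11434/
```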
Once the container is running, test the API:

```bash
# Check available models
curl http://localhost:11434/api/tags

# Test the GEC model with Estonian text
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gec",
    "prompt": "### Instruction:\nReply with a corrected version of the input essay in Estonian with all grammatical and spelling errors fixed. If there are no errors, reply with a copy of the original essay.\n\n### Input:\nMul on kaks koer ja üks kass\n\n### Response:\n",
    "stream": false,
    "options": {
      "temperature": 0.7,
      "top_p": 0.9,
      "top_k": 50,
      "num_predict": 100
    }
  }'
```
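With `"stream": false` the response comes back as a single JSON object, so with `jq` installed the corrected text can be extracted from the standard `response` field:

```bash
# Reuses the instruction prompt from the request above
curl -s -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "gec", "prompt": "### Instruction:\n...\n### Response:\n", "stream": false}' \
  | jq -r '.response'
```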
| Variable | Description | Default | Required |
|---|---|---|---|
| `HF_MODEL_NAME` | Hugging Face model name | `tartuNLP/Llama-3.1-8B-est-gec-july-2025` | Yes |
| `HF_TOKEN` | Hugging Face token (for private models) | - | No |
| `MODEL_NAME` | Name for the converted model | `gec` | No |
| `QUANTIZATION` | Quantization level | `full` | No |
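For a private model, `HF_TOKEN` must be available when the image is built. Assuming the Dockerfiles forward it as a build argument (adjust to however `scripts/download_and_convert.sh` actually consumes it), a sketch with a placeholder repository name:

```bash
# your-org/your-private-model is a placeholder for a private repository
docker build -t gec-ollama-api-gpu \
  --build-arg HF_MODEL_NAME=your-org/your-private-model \
  --build-arg HF_TOKEN="$HF_TOKEN" \
  -f Dockerfile \
  .
```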
Choose the right balance between model size and quality:
- `full`, `orig` - No quantization (keeps original precision - larger file, better quality)
- `f16`, `f32` - Specific float precision
- `q2_k` - Smallest size, lowest quality
- `q3_k_s`, `q3_k_m`, `q3_k_l` - Small size, low quality
- `q4_0`, `q4_1` - Medium size, good quality
- `q4_k_s`, `q4_k_m` - Balanced size and quality
- `q5_0`, `q5_1`, `q5_k_s`, `q5_k_m` - Larger size, better quality
- `q6_k` - Large size, high quality
- `q8_0` - Large, very high quality

Note: This project defaults to `full` precision for best GEC quality.
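For example, building a smaller CPU image with the balanced `q4_k_m` level only changes the `QUANTIZATION` build argument relative to the commands above:

```bash
docker build -t gec-ollama-api-cpu-q4 \
  --build-arg HF_MODEL_NAME=tartuNLP/Llama-3.1-8B-est-gec-july-2025 \
  --build-arg MODEL_NAME=gec \
  --build-arg QUANTIZATION=q4_k_m \
  -f Dockerfile.cpu \
  .
```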
The service exposes the standard Ollama API on port `11434`:

- `GET /api/tags` - List available models
- `POST /api/generate` - Text generation
- `POST /api/show` - Show model info
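For example, `POST /api/show` returns the Modelfile, parameters, and template of the deployed model (older Ollama versions expect the key `"name"` instead of `"model"`):

```bash
curl http://localhost:11434/api/show \
  -d '{"model": "gec"}'
```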
The deployed model is specifically fine-tuned for Estonian Grammar Error Correction:
- Model: `gec`
- Language: Estonian
- Task: Grammar Error Correction
- Input Format: Instruction-based prompting (see the helper sketch below)
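Putting this together, a small helper script can wrap arbitrary Estonian text in the instruction template from the test request above (a sketch; `jq` handles the JSON escaping of the prompt):

```bash
#!/usr/bin/env bash
# correct.sh - send Estonian text through the GEC model and print the correction
# Usage: ./correct.sh "Mul on kaks koer ja üks kass"
text="$1"
prompt=$'### Instruction:\nReply with a corrected version of the input essay in Estonian with all grammatical and spelling errors fixed. If there are no errors, reply with a copy of the original essay.\n\n### Input:\n'"${text}"$'\n\n### Response:\n'

# Build the request body safely, send it, then extract the "response" field
jq -n --arg prompt "$prompt" '{model: "gec", prompt: $prompt, stream: false}' \
  | curl -s -X POST http://localhost:11434/api/generate \
      -H "Content-Type: application/json" -d @- \
  | jq -r '.response'
```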
```
gec-ollama-api/
├── .env                        # Configuration file
├── .gitignore                  # Git ignore rules
├── README.md                   # This file
├── docker-compose.yml          # Docker Compose configuration
├── Dockerfile                  # GPU Docker build
├── Dockerfile.cpu              # CPU Docker build
├── ollama/                     # Ollama configuration
│   ├── Modelfile.template      # Model template
│   ├── setup_model.sh          # Model setup script
│   └── startup.sh              # Container startup script
└── scripts/                    # Build scripts
    └── download_and_convert.sh # Model conversion script
```