This notebook lets you run the Qwen 2.5 Coder (32B) language model on Google Colab using Ollama. The setup exposes a public endpoint through ngrok so you can interact with the model from anywhere.
- **Google Account**
  - You need a Google account to use Google Colab
  - Visit [Google Colab](https://colab.research.google.com)
- **ngrok Account**
  - Free account required for public endpoint access
  - Sign up at ngrok.com
  - Get your authtoken from dashboard.ngrok.com/get-started/your-authtoken
- Open the notebook in Google Colab
- Select `Runtime` -> `Change runtime type`
- Set "Hardware accelerator" to `GPU`
- Click `Save`
- Click the 🔑 (key) icon in the left sidebar to open "Secrets"
- Click "Add new secret"
- Set Name as: `authtoken`
- Set Value as: your ngrok authtoken
- Click "Add"
- Run all cells in order
- Wait for the model to download (approximately 20GB)
- The notebook will display your public URL when ready
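Under the hood, the setup cells do something along these lines. This is only a minimal sketch that assumes the notebook uses pyngrok to open the tunnel; the actual cells may differ.

```python
# Minimal sketch (assumption: the notebook uses pyngrok, installed via `pip install pyngrok`).
# Reads the ngrok authtoken from Colab Secrets and opens a tunnel to the local Ollama server.
from google.colab import userdata  # Colab's Secrets API
from pyngrok import ngrok

ngrok.set_auth_token(userdata.get("authtoken"))  # the secret you added above
tunnel = ngrok.connect(11434, "http")            # 11434 is Ollama's default port
print("Public URL:", tunnel.public_url)
```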
- GPU: T4 or better (provided by Google Colab)
- Storage: At least 25GB free space (for model download)
- RAM: 12GB or more (provided by Google Colab)
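If you want to confirm the runtime meets these requirements before starting the download, a quick check like the hypothetical snippet below can be run in a Colab cell.

```python
# Hypothetical sanity check of the Colab runtime (not part of the notebook itself).
import shutil
import subprocess

# GPU: prints the attached GPU (e.g. Tesla T4); fails if no GPU runtime is selected.
subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv"],
    check=True,
)

# Storage: the model download needs roughly 20-25GB of free space.
total, used, free = shutil.disk_usage("/")
print(f"Free disk space: {free / 1e9:.1f} GB")
```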
- Model: qwen2.5-coder:32b
- Size: ~20GB
- Type: Code-specialized language model
- Provider: Qwen (Alibaba)
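The download performed during setup is roughly equivalent to pulling the model with the Ollama CLI, as in the sketch below (it assumes the ollama binary is already installed and `ollama serve` is running in the Colab VM).

```python
# Sketch: pull the model with the Ollama CLI (assumes ollama is installed and serving).
import subprocess

subprocess.run(["ollama", "pull", "qwen2.5-coder:32b"], check=True)
```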
Once the notebook is running, you can access the model through:
- The provided ngrok URL in the notebook output
- Any Ollama-compatible client by pointing it to the ngrok URL
- Direct HTTP requests to the API endpoints
- GET `/api/tags` - List available models
- POST `/api/generate` - Generate text
- POST `/api/chat` - Chat with the model
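For example, listing the models available behind the tunnel from Python might look like this; `https://your-ngrok-url` is a placeholder for the URL printed by the notebook.

```python
# List models served behind the ngrok tunnel via Ollama's /api/tags endpoint.
import requests

BASE_URL = "https://your-ngrok-url"  # placeholder: use the URL from the notebook output

resp = requests.get(f"{BASE_URL}/api/tags", timeout=30)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```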
To generate text with curl:

```bash
curl -X POST https://your-ngrok-url/api/generate \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen2.5-coder:32b", "prompt": "Write a hello world program in Python"}'
```
You can also connect any Ollama-compatible client, such as CodeGPT, by pointing it to the ngrok URL.
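A minimal Python call to the chat endpoint could look like the sketch below; again, the URL is a placeholder.

```python
# Chat with the model through Ollama's /api/chat endpoint.
import requests

resp = requests.post(
    "https://your-ngrok-url/api/chat",  # placeholder: use the URL from the notebook output
    json={
        "model": "qwen2.5-coder:32b",
        "messages": [{"role": "user", "content": "Write a hello world program in Python"}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```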
- **Session Duration**
  - Google Colab sessions have a limited runtime (usually 12 hours)
  - Save any important outputs before the session ends
  - The ngrok URL will change each time you restart the notebook
- **Resource Usage**
  - Monitor GPU memory usage in Colab
  - The model requires significant GPU memory
  - Close other GPU-intensive notebooks
- **Security**
  - The ngrok URL is publicly accessible
  - Anyone with the URL can access your model
  - Consider adding authentication if needed (see the sketch after this list)
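One possible way to restrict access is HTTP basic auth on the ngrok tunnel itself. The snippet below is only a sketch: it assumes the notebook uses pyngrok with an ngrok v3 agent (which accepts a `basic_auth` tunnel option), and the credentials are placeholders.

```python
# Sketch: protect the tunnel with HTTP basic auth (assumes pyngrok + an ngrok v3 agent;
# the basic_auth option and the credentials below are illustrative placeholders).
from pyngrok import ngrok

tunnel = ngrok.connect(11434, "http", basic_auth=["user:a-strong-password"])
print("Protected URL:", tunnel.public_url)
```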
- **GPU Not Available**
  - Ensure you've selected a GPU runtime in Colab
  - Check whether you've hit Colab's GPU usage limits
  - Try disconnecting and reconnecting to the runtime
- **Model Download Issues**
  - Check your internet connection
  - Ensure enough free storage space
  - Try restarting the runtime
- **ngrok Connection Issues**
  - Verify your authtoken is correct
  - Check if you've hit ngrok's connection limits
  - Ensure no firewall restrictions (see the connectivity check below)
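To narrow down whether Ollama itself or the ngrok tunnel is failing, a quick check from inside the notebook can help; the public URL is a placeholder.

```python
# Check the local Ollama server first, then the public ngrok URL.
import requests

for name, url in [
    ("local Ollama", "http://localhost:11434"),
    ("ngrok tunnel", "https://your-ngrok-url"),  # placeholder: use the notebook's URL
]:
    try:
        requests.get(f"{url}/api/tags", timeout=10).raise_for_status()
        print(f"{name}: OK")
    except requests.RequestException as exc:
        print(f"{name}: FAILED ({exc})")
```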
Feel free to open issues or submit pull requests for improvements.
This project is provided as-is under the MIT License. The Qwen model has its own license terms that should be consulted separately.
This is not an official Qwen or Ollama product. Use at your own risk and responsibility.