Local Offline GitHub Copilot alternative

Setup instructions for the first open-source LLM that can replace GPT-4 and Claude 3 Opus for coding

Technology Stack

  • Continue.dev Visual Studio Code extension for chat and autocomplete

  • Ollama to run the model on GPU / CPU (tested with an Nvidia RTX A5000 GPU with 24 GB of VRAM and no integrated graphics), and to provide a network API that serves the Continue.dev clients.

  • A 4-bit quantized (Q4_0), 11.7-gigabyte build of Codestral (22.2 billion parameters) by the French🥐 company Mistral AI, which runs fully on a GPU with 24 GB of VRAM without any help from the CPU.

  • Tested on the Windows 10 x86_64 (version 1903) operating system
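Before setting anything up, it's worth confirming that the server's GPU actually has room for the roughly 12 GB of model weights plus the 32k context. A minimal check (assuming the Nvidia driver is installed, so nvidia-smi is on the PATH):

REM Show the GPU name plus total and currently free VRAM.
REM The Q4_0 Codestral weights alone need roughly 12 GB of free VRAM.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv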

Client Setup (end users)

  1. Install VS Code, either this exact version (VSCodeUserSetup-x64-1.90.2.exe) or newer, on every client computer on the internal network that will use the AI services. You can download it from https://update.code.visualstudio.com/1.90.2/win32-x64-user/stable, which is a version of VS Code that supports the provided version of the Continue extension.

  2. Disable VS Code auto update and disable auto update of extensions (you don't want your setup to suddenly break).

Disable Auto Update VS Code

  3. Every client also needs the Continue VS Code extension for its operating system + CPU architecture. Use the file continue-win32-x64-0.8.46.vsix, which you can download from https://github.com/continuedev/continue/releases/tag/v0.8.46-vscode (scroll all the way down to Assets). The .vsix file can then be installed globally as an extension in VS Code, as if you had installed it from the internet (Extensions tab -> ... -> Install from VSIX), or from the command line as sketched just below.
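If you prefer a terminal over the Extensions tab, the same .vsix can also be installed with VS Code's command-line interface (a sketch, assuming the code command is on the PATH and you run it from the folder containing the downloaded file):

REM Install the offline Continue extension package into VS Code.
code --install-extension continue-win32-x64-0.8.46.vsix
REM Confirm it now appears in the list of installed extensions.
code --list-extensions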

  4. Drag the newly added extension tab to the right side of the window, as the extension asks you to.

  5. On the Welcome to Continue page, choose Local models and click Continue. Ignore the Failed to connect error that then keeps popping up.

  6. Click Continue again.

  7. Click the Configure gear icon ⚙ in the bottom right of the Continue extension window. That will open a config.json for you to edit.

  8. Change it to look like this (replace the 127.0.0.1 IP with your actual server's IP):

{
  "models": [
    {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral:22b",
      "apiBase": "http://127.0.0.1:11434/",
      "systemMessage": "Do exactly as the user says / means",
      "contextLength": 32000
    }
  ],
  "tabAutocompleteOptions": {
    "disable": true
  },
  "allowAnonymousTelemetry": false
}

If you're using La Plateforme (Mistral's hosted API) because you don't have enough VRAM (or because of license issues), this is the correct JSON configuration (replace the placeholder with your real API key):

{
  "models": [
    {
      "title": "Codestral",
      "model": "codestral-latest",
      "contextLength": 32000,
      "apiKey": "tVXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
      "provider": "mistral"
    }
  ],
  "tabAutocompleteOptions": {
    "disable": true
  },
  "allowAnonymousTelemetry": false
}
  9. Save the JSON file, and then reload the VS Code window. Now you should be able to chat with Continue in its extension window (as long as the server is running on the internal network).
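If the chat stays stuck on Failed to connect, a quick sanity check is to hit the same apiBase that config.json points at. A sketch using curl (bundled with Windows 10; replace 127.0.0.1 with your actual server's IP):

REM Should return a JSON list of the models the server has pulled, including codestral:22b.
curl http://127.0.0.1:11434/api/tags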

Client Usage

  1. Open VS Code in a specific project and press the F1 key. That opens VS Code's command palette. Start typing Continue: to see what the AI can do for you while you're programming.

  2. Select any length of text / code.

  3. Run the VS Code command (F1 -> type) Continue: Add Highlighted Code to Context, then type your prompt to ask the AI.

  4. Example conversation in the chat interface:

Create realistic tree

Realistic Looking Tree

  5. Ingest huge prompts of up to 32,000 tokens (a 400+ line readme within seconds).

  6. Do some code completion by pressing CTRL+I at any moment while writing code. It'll ask you for instructions and then add code where your cursor currently is. The simplest instruction you can give it is: complete here.

CTRL+I autocomplete

  7. Multi-turn conversations: Codestral keeps context alive better than any LLM I know; it managed to win a game of 20 questions!

Very good multi-turn conversations 1

Very good multi-turn conversations 2

  8. Select files as context

Select files as context

  9. Generate working UML. Ask it to produce PlantUML (planttext.com) syntax.

Server Setup

I installed OllamaSetup.exe for Windows on an online (internet-connected) Windows 11 computer, after downloading it from https://github.com/ollama/ollama/releases/tag/v0.3.13

I ran the command ollama pull codestral:22b to download https://ollama.com/library/codestral, which is 12 GB in size (it's quantized). It downloads into the C:/Users/YOUR_USERNAME/.ollama folder.
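For reference, the pull and a quick verification on the online computer look roughly like this (assuming Ollama is already installed there):

REM Download the quantized Codestral build (about 12 GB) into C:\Users\YOUR_USERNAME\.ollama
ollama pull codestral:22b
REM Confirm the model is now available locally.
ollama list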

I then followed this guide: https://ollama.com/blog/continue-code-assistant, which essentially says that the command to run the server is ollama run codestral:22b.

That, however, will only listen on localhost, so instead just run this batch file: Run Ollama.
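For context: a plain ollama run only answers on localhost because the server binds to 127.0.0.1 by default, and Ollama reads the OLLAMA_HOST environment variable to change that. A hypothetical sketch of such a batch file (not necessarily identical to the Run Ollama file linked above):

@echo off
REM Hypothetical sketch: expose the Ollama API on all network interfaces instead of only localhost.
REM Quit the Ollama tray app first if something is already listening on port 11434.
set OLLAMA_HOST=0.0.0.0:11434
ollama serve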

When Continue requests to use the model, it'll automatically be loaded.

Of course, you also have to install Ollama on the offline computer, and copy the pulled model from the .ollama folder on the online computer to the C:/Users/YOUR_USERNAME/.ollama folder on the offline computer (create it if it doesn't exist).

Before deleting any existing files in the .ollama folder on your offline computer, stop Ollama from running:

Quit Ollama
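With Ollama stopped, the copy itself can be as simple as this hypothetical example (assuming the online computer's .ollama folder was carried over on a USB drive mounted as E:):

REM Copy the pulled model (blobs and manifests), including subfolders, into the offline user's .ollama folder.
robocopy "E:\.ollama" "C:\Users\%USERNAME%\.ollama" /E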

You should be able to process long context lengths within about 10 seconds (for example, an entire readme of a large project). If you can't: shut down Ollama from the system tray, close all other apps (or restart the PC), and launch it again. You need to make sure you launch it when enough VRAM is available so that the entire model gets loaded onto the GPU.

If some of the model gets offloaded to the CPU, prompt ingestion becomes roughly 40x slower (I'm not sure this is always the cause; it sometimes just gets stuck for no reason until Ollama is restarted).
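To check whether the whole model actually landed on the GPU, these commands can help once a prompt has been sent (assuming nvidia-smi is available on the server):

REM The PROCESSOR column should read "100% GPU"; a CPU/GPU split means part of the model spilled to system RAM.
ollama ps
REM Cross-check how much VRAM is actually in use.
nvidia-smi --query-gpu=memory.used,memory.free --format=csv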

Disable ollama.exe from running on startup (in the Startup apps tab of Windows Task Manager), so that our batch file can run on user login instead.

Run on startup
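One simple way to run the batch file on user login (a sketch; your setup may do this differently, and the source path here is just a placeholder) is to put a copy of it in the per-user Startup folder:

REM %APPDATA%\...\Startup is the per-user Startup folder (the same place shell:startup opens).
REM Replace the source path with wherever you saved the batch file.
copy "C:\path\to\Run Ollama.bat" "%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\"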

Known Issues

  • Only one prompt can be handled at a time (more an issue of insufficient hardware). That's what https://github.com/BigBIueWhale/ollama_load_balancer is for.

  • The model sometimes gets loaded onto the CPU automatically when GPU memory is too full.

  • When chatting, the cancel button doesn't always cancel (maybe it never actually does). That means that if the LLM wants to give you a long response, you'll have to wait a while before you can ask again, or you can kill Ollama and start it again, as sketched below.
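When a cancel doesn't take and you just want the server back, restarting Ollama might look like this (a sketch; these are the typical process names on Windows, so check Task Manager if they differ on your install):

REM Force-quit the Ollama tray app and the server process.
taskkill /F /IM "ollama app.exe"
taskkill /F /IM ollama.exe
REM Then start it again, e.g. by re-running the Run Ollama batch file.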
