Thanks to lu-zero, who gave me a good idea: starting the Rust backend for the LocalAI project with Burn.
The default backends of the LocalAI project support the following acceleration options:
- CPU
  - OpenBLAS
- GPU
  - CUDA/cuBLAS - Nvidia
  - hipBLAS - AMD
  - clBLAS - AMD/Intel
  - Metal - Apple Silicon
So the requirement here is a backend that supports both CPU and GPU, and covers as many of these acceleration options as possible.
According to Burn's Supported Platforms table, the burn-ndarray backend does not support GPU, while mainstream AI frameworks rely on GPU acceleration. So it is not a good choice as the default backend.
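To make the comparison concrete, here is a minimal sketch of how Burn abstracts over backends, assuming a Burn release around 0.9 (module paths and tensor-creation APIs differ between releases). Code written against the `Backend` trait runs unchanged on any backend, so picking a default is mostly a question of device coverage:

```rust
use burn::tensor::{backend::Backend, Tensor};
use burn_ndarray::NdArrayBackend;

// Generic over the backend: the same code runs on CPU or GPU backends.
fn scale<B: Backend>(x: Tensor<B, 2>, factor: f64) -> Tensor<B, 2> {
    x.mul_scalar(factor)
}

fn main() {
    // burn-ndarray is CPU-only, which is exactly why it cannot be the default.
    type B = NdArrayBackend<f32>;
    let x = Tensor::<B, 2>::ones([2, 3]);
    let y = scale(x, 2.0);
    println!("{:?}", y.to_data());
}
```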
The Burn Torch backend is based on the tch-rs crate, which offers a Rust interface to the PyTorch C++ API. It supports:
- CPU
- CUDA
- MPS
- Vulkan
As you can see, all of these GPU acceleration targets are already supported by the LocalAI project.
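As a rough sketch (again assuming a Burn release around 0.9, where the crate exposes `TchBackend` and `TchDevice`; names differ in other releases), switching devices on the Torch backend is a one-line change:

```rust
use burn::tensor::Tensor;
use burn_tch::{TchBackend, TchDevice};

type B = TchBackend<f32>;

fn main() {
    // One device enum covers every target LocalAI cares about:
    // TchDevice::Cpu, TchDevice::Cuda(0), TchDevice::Mps, TchDevice::Vulkan.
    let device = TchDevice::Cuda(0);
    let x = Tensor::<B, 2>::ones_device([2, 3], &device);
    println!("{:?}", x.to_data());
}
```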
The Burn WGPU backend is built on wgpu, and it supports:
- Vulkan
- Metal
- DX11/12
- OpenGL
- WebGPU
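For comparison, here is the same minimal check against the WGPU backend (same Burn-version caveat; `AutoGraphicsApi` lets wgpu choose among Vulkan, Metal, DX12, and OpenGL at runtime):

```rust
use burn::tensor::Tensor;
use burn_wgpu::{AutoGraphicsApi, WgpuBackend, WgpuDevice};

// f32 elements, i32 indices; the graphics API is selected automatically.
type B = WgpuBackend<AutoGraphicsApi, f32, i32>;

fn main() {
    let device = WgpuDevice::default();
    let x = Tensor::<B, 2>::ones_device([2, 3], &device);
    println!("{:?}", x.to_data());
}
```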
So the WGPU backend covers all of the GPU acceleration the LocalAI project needs, but not CPU acceleration. However, I believe WASM support could become a useful feature, and more and more LLMs need a GPU to get acceptable performance. Although LoRA and QLoRA reduce the computing resources LLMs consume, we still need GPU acceleration for them. There are also some open issues with the WGPU backend:
- Gadersd/stable-diffusion-xl-burn#2
- Gadersd/stable-diffusion-burn#7
- Gadersd/stable-diffusion-burn#5
- Gadersd/whisper-burn#19
Given the above, the Burn Torch backend is a good choice as the default backend for the LocalAI project.
Although we do not have a benchmark here to measure the performance of the WGPU backend, we can implement the Torch backend first and revisit the decision in future development cycles if a better backend emerges, as sketched below. There are also some examples of using Burn; please check the current project on GitHub.
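Because everything is written against the `Backend` trait, swapping the default later is cheap. Here is a hypothetical sketch of gating the choice behind Cargo features; the `tch` and `wgpu` feature names are made up for illustration and are not LocalAI's actual configuration:

```rust
// A single type alias decides the backend for the whole crate. The `tch` and
// `wgpu` feature names below are hypothetical, not LocalAI's actual setup.
#[cfg(feature = "tch")]
pub type DefaultBackend = burn_tch::TchBackend<f32>;

#[cfg(feature = "wgpu")]
pub type DefaultBackend =
    burn_wgpu::WgpuBackend<burn_wgpu::AutoGraphicsApi, f32, i32>;

// Everything else is written against burn::tensor::backend::Backend, so
// changing the default in a later development cycle only touches this alias.
```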
- Xmind address
- Build of LocalAI