This script takes a GGUF model, extracts its weights, and converts them to a NumPy array and a PyTorch tensor.
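Under the hood, the idea is roughly the following, a minimal sketch using the gguf library (not the exact script; model.gguf is a placeholder file name, and quantized tensors come back as raw block bytes unless dequantized first):

import numpy as np
import torch
from gguf import GGUFReader

reader = GGUFReader("model.gguf")     # placeholder path to a GGUF model
tensor = reader.tensors[0]            # first weight tensor stored in the file
weights = np.array(tensor.data)      # tensor data as a NumPy array
np.save("llama-weight.npy", weights)  # save the NumPy file
torch.save(torch.from_numpy(weights), "PyTorchTensor.pt")  # save as a PyTorch tensor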
First of all, you'll need a GGUF model.
You can get one from Hugging Face or use a program such as LM Studio to download one.
Then, make sure you have Python installed and download the .py script from this repository.
You can set up the process in two ways:
- Go to the llama.cpp repository and either build it yourself or download a precompiled release.
To build it, you'll need CMake, and if you have an NVIDIA GPU and want to use it, make sure you have the CUDA Toolkit installed too. Follow the build guide that best fits your system.
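For example, a typical CPU-only build from inside the llama.cpp folder looks like this (these are the standard commands from the llama.cpp build docs; check them for GPU-specific options):
cmake -B build
cmake --build build --config Release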
Then install the necessary Python dependencies with the following commands in a CMD:
cd llama.cpp
pip install -r requirements.txt
Or, if that fails or you're using a newer version of Python:
cd llama.cpp
py -m pip install -r requirements.txt
Then, place the script and the GGUF model in llama.cpp\gguf-py\gguf
so the script picks up the gguf library it needs for the process.
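To quickly verify that the gguf package is importable from that folder, you can run this one-line sanity check (not part of the script):
py -c "import gguf; print(gguf.__file__)"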
- Or just install these Python libraries in a CMD: torch, numpy, sentencepiece, pyyaml and gguf:
pip install torch numpy sentencepiece pyyaml gguf
Or, if that fails or you're using a newer version of Python:
py -m pip install torch numpy sentencepiece pyyaml gguf
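You can confirm that everything installed correctly with a one-line import check (note that pyyaml is imported as yaml):
py -c "import torch, numpy, sentencepiece, yaml, gguf; print('all imports OK')"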
Make sure the script and the GGUF model are in the same folder.
Run the script gguftopytorch.py in the terminal using the py or python command, either in a CMD or in an IDE like Visual Studio Code with the Python Extension, to save the GGUF weights as a PyTorch tensor.
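For example (assuming the script takes no arguments and looks for the GGUF model in its own folder):
py gguftopytorch.py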
If the script runs successfully, it will generate two files in the same folder:
- llama-weight.npy: NumPy file with the GGUF LLaMA model's weights.
- PyTorchTensor.pt: PyTorch file with the NumPy array converted into a tensor.
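To confirm the outputs, you can load both files back with a small sanity check:

import numpy as np
import torch

weights = np.load("llama-weight.npy")    # NumPy array written by the script
tensor = torch.load("PyTorchTensor.pt")  # PyTorch tensor written by the script
print(weights.shape, weights.dtype)
print(tensor.shape, tensor.dtype)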
Tested with:
- Programming Language: Python (3.13.2)
- Libraries:
- torch (2.6.0)
- numpy (2.2.0)
- sentencepiece (0.2.0)
- pyyaml (6.0.2)
- gguf (0.14.0)
- Other:
- CMake (3.31.5)
- CUDA Toolkit (12.8)
- VS Code Python Extension
- Recommended IDE: VS Code