This document covers commonly encountered problems and their solutions when using `ilab`. It also includes a section on fine-tuning and troubleshooting your model to optimize the quality of its responses.
## `ilab data generate --endpoint-url` with llama-cpp fails with `openai.InternalServerError: Service Unavailable`

llama-cpp does not support batching, which is enabled by default with remote endpoints. To resolve this error, disable batching using `--batch-size=0`. See this issue.
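For example, a run against a llama-cpp server with batching disabled might look like the following sketch; the endpoint URL is a placeholder for your own server, not something `ilab` prescribes:

```shell
# Generate against a llama-cpp endpoint with batching disabled
# (http://localhost:8000/v1 is a placeholder URL)
ilab data generate --endpoint-url http://localhost:8000/v1 --batch-size=0
```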
## `ilab data generate` runs for several hours on a Mac M-series

If you notice `ilab data generate` running for several hours or more on a Mac M-series, first check the available memory on your system (see Activity Monitor for details). If there is less than 8 GB of RAM available before serving a model, check whether you can free up some memory.
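If you prefer the terminal over Activity Monitor, standard macOS tools give a rough picture of available memory; this is just one way to check and is not something `ilab` requires:

```shell
# Total physical RAM installed, in bytes
sysctl hw.memsize

# Snapshot of current memory usage (free, active, and wired pages)
vm_stat
```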
If this does not improve generation times, check out this discussion. The suggestion there is to tweak macOS's GPU memory limit. By default the limit is around 60%-70% of your total available RAM, which is expressed as 0:

```shell
sudo sysctl iogpu.wired_limit_mb
iogpu.wired_limit_mb = 0
```
You can set it to any value, although it is advisable to leave 4-6 GB of RAM for macOS itself.
For example, on an M1 with 16 GB of RAM, bumping the limit to 12 GB allowed the `ilab data generate` command to finish in less than an hour, where it previously took several hours:

```shell
sudo sysctl iogpu.wired_limit_mb=12288
```
Once done, make sure to reset the limit back to 0, which is the default.
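For example, to restore the default right away rather than waiting for a reboot:

```shell
# Reset the GPU wired-memory limit to the macOS default
sudo sysctl iogpu.wired_limit_mb=0
```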
Note: This value will reset to the default after the machine reboots.
## Improving the quality of your model's responses

If you are looking to optimize the quality of the outputs generated by the model, there are a number of steps and parameters at various stages of the CLI workflow that you can leverage. Some of these are discussed in the following sections.

It is important to note that improved response quality comes at a cost: increased compute requirements, longer runtimes, or both. The steps described here give you the best chance of improving the quality of your model's responses, but cannot guarantee an improvement.
Composing and contributing effective and impactful skills is an iterative process. The typical workflow looks something like this (a shell sketch of one pass follows the list):

- Compose skill examples.
- Run the `ilab data generate` command.
- Examine the generated examples based on the supplied skill (found in the `generated` folder).
- If the generated examples are not satisfactory in quality, edit the skill examples.
- Repeat the process until you are satisfied with the generated data.
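One pass through this loop from the shell might look like the following sketch; the `generated` folder is the output location mentioned above:

```shell
# Regenerate synthetic data from the current skill examples
ilab data generate

# List the newest output files to review before deciding whether to edit the skill
ls -lt generated/ | head
```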
When you do edit your skill examples, two approaches tend to help:

- Increase the number of examples in your skill YAML file. The more examples the model has to go on, the faster it will be able to generate synthetic data. The generated data will also be better if the input contains a wider range of examples.
- Improve the quality of the provided examples. Review the examples you supply and see whether they can be rephrased to align better with what you hope to see the model generate. This improves the chances of the model generating better-quality synthetic data in large quantities.
### Data generation

The data generation step is executed via the `ilab data generate` command and is responsible for generating synthetic data. This forms the basis for what the model will end up learning.

NOTE: The data produced by the generation step is only used within the user's local workflow to train the model and help the user fine-tune their skill examples. A separate data generation process is conducted on the backend once a user's skill is actually merged into the taxonomy repository.
Options that can improve the generated data include the following; a combined example follows the list.

- Increase the number of instructions generated by passing the `--num-instructions` flag to the `ilab data generate` command, for example `ilab data generate --num-instructions 1000`. This generates 1000 points of synthetic data based on your provided examples. The greater the number of instructions generated, the better the model will be trained (within reasonable limits).
- Use a better model via `--model`. Larger models can lead to better data generation, though this option requires familiarity with the available models and which ones suit your needs. That could mean using a model with more parameters than the default InstructLab `merlinite-7b-lab` model, such as `Mixtral-8x7B-Instruct-v0.1`, or using an unquantized version of `merlinite-7b-lab`. For example: `ilab serve --model-path models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf` followed by `ilab data generate --model models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf`.
- Set the number of CPU cores used to generate data via `--num-cpus`. This defaults to 10, but increasing the value could potentially lead to better generated data. For example: `ilab data generate --num-cpus 15`.
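Putting these options together, a sketch of a higher-effort generation run might look like this; the model path and values are illustrative, taken from the examples above:

```shell
# Serve a larger teacher model (leave this running in a separate terminal)
ilab serve --model-path models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

# Generate more data points, with more CPU workers, against that model
ilab data generate \
  --model models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf \
  --num-instructions 1000 \
  --num-cpus 15
```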
### Training

The training step is run with the `ilab model train` command, and trains the model on the synthetic data that was generated. The output of this step is a set of adapter files with the general format `adapters-xxx.npz`, where `xxx` is a number. These adapter files represent snapshots of the model's trained state and are periodically written to disk.
Options that can improve training include the following; a combined example follows the list.

- Increase the number of training iterations via the `--iters` flag. A larger number of iterations usually means a better-trained model. NOTE: Diminishing returns might kick in around 300 or so iterations, and increasing the iteration count comes at the cost of waiting longer for training to complete.
- Pick the adapter file with the lowest validation loss. The training process generates and persists adapter files periodically, and the terminal output reports the validation loss associated with each adapter. The frequency of adapter file generation is controlled by the `--save-every` flag; for example, `ilab model train --save-every 10` outputs an adapter file every 10th iteration.
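Combining the two, a sketch of a longer training run with frequent checkpoints (values are illustrative, taken from the notes above):

```shell
# Train for more iterations and write an adapter file every 10 iterations;
# afterwards, pick the adapters-*.npz whose reported validation loss is lowest
ilab model train --iters 300 --save-every 10
```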