
tests : add integration test for lora adapters #8957

Merged — 2 commits merged into ggerganov:master from testing/add-test-for-hot-lora on Aug 18, 2024

Conversation

@ltoniazzi (Contributor) commented Aug 9, 2024

Context

Partially addresses issue #8708 by adding a test that:

  1. Downloads reduced-size LLMs for several architectures in .safetensors format (each base model overfitted to some text), along with a corresponding adapter (overfitted to another text).
  2. Converts the base models and LoRA adapters to .gguf.
  3. Runs inference with llama-cli (--lora), asserting that the overfitted texts are returned as completions (see the sketch after this list).
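A minimal sketch of that flow, assuming illustrative paths, model names, and output filenames (the real logic lives in tests/test-lora-conversion-inference.sh; the convert-script flags are the standard ones from the repo):

    #!/bin/bash
    set -e

    MODELS_REPO=lora-tests   # local clone of the HF repo holding the test models
    model_name=llama         # illustrative; the script loops over several architectures
    size_matrix=64           # hidden size of the reduced model
    EXPECTED_LORA_TEXT="<text the adapter was overfitted to>"   # illustrative placeholder

    # 2. Convert the safetensors base model and its adapter to .gguf
    python convert_hf_to_gguf.py $MODELS_REPO/$model_name/size=$size_matrix/base \
        --outtype f32
    python convert_lora_to_gguf.py $MODELS_REPO/$model_name/size=$size_matrix/lora \
        --base $MODELS_REPO/$model_name/size=$size_matrix/base --outtype f32

    # 3. Run inference with the adapter hot-loaded and assert the overfitted completion
    OUTPUT=$(llama-cli -m $MODELS_REPO/$model_name/size=$size_matrix/base/Base-F32.gguf \
        --lora $MODELS_REPO/$model_name/size=$size_matrix/lora/Lora-F32-LoRA.gguf \
        -p "I see a little silhouetto" -n 50 --temp 0 --seed 42)
    echo "$OUTPUT" | grep -q "$EXPECTED_LORA_TEXT"   # fails the script (set -e) on mismatch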

TODOs

  • Figure out why the LoRA adapter for Gemma-2 is not working (a mix of the last layer equalling the first layer, and setting temp=0).
  • Add Llama-3 (working correctly).
  • Add Phi-3 (working correctly).
  • Make the models smaller (size 32? Now 64, giving ~50–70 MB). (Getting input-size issues in gguf with Llama at size 32.)
  • Add assertions.
  • Figure out how .sh files work in the Actions. The script needs llama-cli, a Python env and git lfs. To be done in a separate PR, following .github/workflows/server.yml.
  • The models are currently downloaded from a personal HF repo, so they need to be moved to ggml-org. (Models generated with the code here.) Created the repo lora-tests.
Current output

./tests/test-lora-conversion-inference.sh --verbose:

[Screenshot: verbose output of the test script]

PR checks

@github-actions bot added the label testing (Everything test related) Aug 9, 2024
@mofosyne added the labels Review Complexity : Low (Trivial changes to code that most beginner devs, or those who want a break, can tackle; e.g. a UI fix) and bugfix (fixes an issue or bug) Aug 10, 2024
@ltoniazzi changed the title [WIP] Add test for --lora flag → [WIP] Add integration test for lora adapters Aug 10, 2024
@ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 9b4f2ff to 35d04e7 on August 10, 2024 13:15
@ltoniazzi (Contributor, Author) commented Aug 10, 2024

@ngxson I've done it for Llama-3 and Gemma-2.

  • ✅ Llama-3 is working fine (see the image in this PR description)!

  • ❌ Gemma-2: I'm still struggling to make the adapter work. I'll have to look into it a bit more.

In the meantime, I'm planning to do Phi-3 next. Which other architectures are popular in llama.cpp, so I can add a couple more?


echo "Running llama-cli with exported lora for $model_name with size $size_matrix..."
OUTPUT_LORA_MERGED=$(llama-cli -m $MODELS_REPO/$model_name/size=$size_matrix/base/Base-F32-lora-merged.gguf \
-p "I see a little silhouetto" -n 50 --seed 42)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're missing --temp 0 for consistent result.
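A sketch of the same command with the suggested flag added (--temp 0 selects greedy sampling, so the completion is deterministic and the assertion on the overfitted text is reproducible across runs):

    OUTPUT_LORA_MERGED=$(llama-cli -m $MODELS_REPO/$model_name/size=$size_matrix/base/Base-F32-lora-merged.gguf \
        -p "I see a little silhouetto" -n 50 --temp 0 --seed 42)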

@ltoniazzi (Contributor, Author) commented:

@ngxson Phi-3 is also working fine, and I added assertions in the .sh test file (sketched below). I think we can finish off this PR by:

  • Setting it up to run correctly in the Actions (not sure if it needs more setup).
  • Moving the models to the ggml HF repo.

For Gemma-2, I'll try to figure out where the bug is this week (it could very well be in my code that reduces the size of the model) and add it as a todo in issue #8708.
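For reference, an assertion in the .sh test file could look something like the following (a minimal sketch; the variable names and expected string are illustrative, not necessarily those in the actual script):

    # Pass if the completion contains the text the adapter was overfitted to
    EXPECTED_LORA="<text the adapter was overfitted to>"   # illustrative placeholder
    if echo "$OUTPUT_LORA_HOT" | grep -q "$EXPECTED_LORA"; then
        echo "PASS: lora output contains the expected text"
    else
        echo "FAIL: lora output does not contain the expected text"
        exit 1
    fi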

@ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 9aa01b7 to 56a8992 on August 12, 2024 10:05
@ltoniazzi changed the title [WIP] Add integration test for lora adapters → Add integration test for lora adapters Aug 13, 2024
@ltoniazzi force-pushed the testing/add-test-for-hot-lora branch 4 times, most recently from 5ebdefc to c19e4fa on August 16, 2024 07:09
@ltoniazzi marked this pull request as ready for review August 16, 2024 07:09
@ltoniazzi (Contributor, Author) commented:

@ngxson @mofosyne I fixed the bug and added the Gemma-2 model to the tests. (The bug was in my code, and stems from the fact that Gemma's embedding weights equal the lm_head weights, which I think can cause some issues when converting adapters; I'll open an issue about it this weekend.)
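(For illustration, the weight tying can be seen by dumping the tensor names of the converted base model — a sketch, assuming the gguf_dump.py helper under gguf-py/scripts and illustrative paths; the script name and location may differ by version:

    # Gemma-2 ties the token embedding to the lm_head, so the converted GGUF
    # has no separate output.weight tensor; an adapter targeting lm_head then
    # needs special handling during conversion.
    python gguf-py/scripts/gguf_dump.py \
        $MODELS_REPO/gemma2/size=64/base/Base-F32.gguf | grep -E "token_embd|output"
)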

Anyway, to complete this PR I still need to:

  • See if the new test script runs in GitHub Actions (not sure if it needs more setup). How do I trigger an Actions run?
  • Move the models to the ggml HF repo. Whom should I contact to do this?

@ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from c19e4fa to 32ed624 on August 16, 2024 07:23
@ggerganov (Owner) left a comment:
Approved the PR so that the CI can run when you push new commits.

I'll add you to https://huggingface.co/ggml-org - let me know your HF username

@ltoniazzi (Contributor, Author) commented:
> I'll add you to https://huggingface.co/ggml-org - let me know your HF username

@ggerganov Thanks! My HF username is ltoniazzi.

@ltoniazzi (Contributor, Author) commented:
@ggerganov Should I create a new model repo in ggml-org called reduced-models-for-testing?

I'm not sure what a good name is, as I also want to make sure people don't think it's related to reducing size via quantization (these "reduced" models keep the original architecture, apart from shrinking the hidden_size parameter).

@ggerganov (Owner) commented:

> Should I create a new model repo in ggml-org called reduced-models-for-testing?

Yes, make a new repo. You can call it lora-tests to avoid confusion.

@ngxson (Collaborator) commented Aug 16, 2024

> See if the new test script runs in GitHub Actions (not sure if it needs more setup). How do I trigger an Actions run?

I'm not sure if you will need special permissions to create a new GitHub workflow. The code for the CI can be based on .github/workflows/server.yml.

But I think I can do this later on. I'll need to see what the condition for triggering the workflow should be, because I think it's not useful to run the workflow on all commits.

> Move the models to the ggml HF repo.

I see you now have access to ggml-org. Ideally, I think each model should live in a dedicated repo (since you're also uploading safetensors files, not just GGUF). The models can be prefixed, for example test-llama or test-gemma. The small models can then also be used in other tests, not just the lora test.

For adapters, you can also upload the PEFT safetensors files, one repo per adapter, with the prefix lora-test-*.

@ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 32ed624 to 163f43f on August 16, 2024 09:57
@ltoniazzi (Contributor, Author) commented Aug 16, 2024

@ngxson Thanks, but before refactoring the repo, can we make the current test run successfully?

I added the test to the Make/CMake files so it runs locally with make test, but it fails in the build because it runs the Python scripts that convert the base models/adapters, and it seems the build environment does not have the Python dependencies.
What is the best approach to fix this?

@ltoniazzi (Contributor, Author) commented:

> (since you're also uploading safetensors files, not just GGUF)

@ngxson I am uploading only safetensors, as the test includes the conversion to GGUF of both the base model and the adapter.

@ngxson (Collaborator) commented Aug 16, 2024

make test is only used for ctest, so I prefer not to add the test there. The lora test is an e2e test, so it's better for it to have its own workflow.

@ltoniazzi (Contributor, Author) commented:

@ngxson If you point me to a base workflow that I can mimic, I can have a go at it. Or do you prefer to merge the new test script alone and create the new workflow separately?

@ngxson (Collaborator) commented Aug 16, 2024

I think it's OK to merge this PR as-is; please tell me when the models are uploaded to Hugging Face so I can run the test locally.

If you still want to try out the workflow:

> The code for the CI can be based on .github/workflows/server.yml

The server workflow already has Python and cmake, so you can take it as the template.
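In that spirit, the job steps such a workflow would run might look roughly like this (a sketch of the shell commands only, assuming the workflow mirrors server.yml's Python/cmake setup; the build-directory layout is illustrative):

    # Steps a lora-test CI job would run (sketch):
    git lfs install                          # the test models are stored with Git LFS
    pip install -r requirements.txt          # Python deps for the convert scripts
    cmake -B build && cmake --build build --target llama-cli
    export PATH="$PWD/build/bin:$PATH"       # so the test script can find llama-cli
    ./tests/test-lora-conversion-inference.sh --verbose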

@ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 163f43f to 9a8f050 on August 16, 2024 10:49
@ltoniazzi (Contributor, Author) commented Aug 16, 2024

> please tell me when the models are uploaded to Hugging Face

@ngxson They are uploaded; you can run the script locally:

./tests/test-lora-conversion-inference.sh --verbose

OK, let's merge this; then I'll have a go at the workflow in a separate PR, and we can discuss the models' folder structure in the meantime.

@ltoniazzi mentioned this pull request Aug 16, 2024
@ltoniazzi (Contributor, Author) commented:

@ngxson For after you merge this, I've prepared the workflow PR here: #9058.

@ngxson changed the title Add integration test for lora adapters → tests: add integration test for lora adapters Aug 18, 2024
@ngxson changed the title tests: add integration test for lora adapters → tests : add integration test for lora adapters Aug 18, 2024
@ngxson (Collaborator) commented Aug 18, 2024

I made some minor changes in this commit: d6f7b8f

I ran it locally and confirmed that it works, so it's good to merge now. Thanks for taking the time on this.

@ngxson merged commit 2339a0b into ggerganov:master Aug 18, 2024
8 checks passed
@ltoniazzi deleted the testing/add-test-for-hot-lora branch on August 19, 2024 15:13