
Make load test more generic for other LM tasks #53

Open · wants to merge 14 commits into base: main
Conversation

@drewrip (Contributor) commented Jul 19, 2024

In order to better use llm-load-test to evaluate embedding tasks, I had to chop the code up a bit. I don't think this necessarily should or needs to be merged, but I wanted to open it for discussion and to publicize this code as I work on it.

@drewrip (Contributor, Author) commented Aug 1, 2024

@dagrayvid Definitely no rush on this, but whenever you have a second could you take a look at this PR and let me know what you think in terms of where this should/could go? It restructures the current llm-load-test model quite a bit to make the embedding work fit, so it might not match the vision for llm-load-test. Happy to keep this as a fork otherwise :)

Dockerfile (review context):

RUN git switch $GIT_BRANCH
RUN pip3 install -r requirements.txt

CMD python3 load_test.py -c $LLM_LOAD_TEST_CONFIG -log $LLM_LOAD_TEST_LOG_LEVEL
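
(For context, a hedged sketch of how this image might be built and run; the image tag, and the assumption that GIT_BRANCH is declared as a build ARG, are illustrative and not part of the PR:)

# Hypothetical invocation; assumes the Dockerfile declares ARG GIT_BRANCH
docker build --build-arg GIT_BRANCH=main -t llm-load-test .
docker run \
  -e LLM_LOAD_TEST_CONFIG=config.yaml \
  -e LLM_LOAD_TEST_LOG_LEVEL=info \
  llm-load-test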
Review comment (Contributor):

Can you add an ENV here for the output files?
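
(For illustration only, a hedged sketch of what such ENV lines could look like; LLM_LOAD_TEST_OUTPUT_DIR is a hypothetical name, and as the reply below notes, the tool actually reads the output path from its config file:)

# Hypothetical defaults; each can be overridden at `docker run` time with -e
ENV LLM_LOAD_TEST_CONFIG=config.yaml
ENV LLM_LOAD_TEST_LOG_LEVEL=info
ENV LLM_LOAD_TEST_OUTPUT_DIR=/output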

drewrip (Author) replied:

Not 100% sure what you mean, but the path for the output files should be included in the llm-load-test config file.
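
(For reference, a hedged sketch of the output section of an llm-load-test config file; the exact key names may differ between versions:)

# Output settings; values shown are examples only
output:
  format: json        # serialization format for results
  dir: "/tmp"         # directory the results file is written to
  file: "output.json" # results filename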

@dagrayvid (Collaborator) commented:
@ccamacho @drewrip for now, I'm thinking that testing embedding models is out of scope for llm-load-test, and this should remain a separate fork. One reason is that it requires making the output processing "pluggable", which has otherwise been unnecessary so far. Another reason is that I think a tool made specifically for embedding tasks would probably need a different type of dataset.

Does that make sense? Open to more discussion on this.
