Our unit tests ensure that the model loads and inference runs successfully, but they cannot tell whether the output is reasonable or just garbage. Currently we have to run the examples manually whenever we land a major feature or fix, which is a bit annoying.
To address this, I think we could send the output to OpenAI's ChatGPT API to check whether it's reasonable. I will cover the cost of the tokens, but will use `github.triggering_actor` so that only developers with write access can trigger the corresponding workflow.
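A rough sketch of what that check could look like, using the OpenAI Python client. The model name, prompt wording, and the pass/fail protocol here are illustrative assumptions, not anything decided in this issue:

```python
# Sketch only: grade a local model's completion with the OpenAI API.
# The model name and the YES/NO protocol are placeholder choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def looks_reasonable(prompt: str, local_output: str) -> bool:
    """Ask an OpenAI model whether the local model's output is coherent."""
    judge = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": (
                    "Given the prompt below, answer only YES or NO: is the "
                    f"response coherent and on-topic?\n\nPrompt: {prompt}\n\n"
                    f"Response: {local_output}"
                ),
            }
        ],
    )
    return judge.choices[0].message.content.strip().upper().startswith("YES")
```

The workflow would then fail the job whenever this check returns `False`.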
We could also hardcode the expected responses in the unit tests. For example, in this test it generates two completions of "Question. what is a cat?\nAnswer:" and asserts that they are the same. We could assert the exact response too.
Of course, this would only work with temp=0 and a specific model (even a specific quantisation), but it might save a few OpenAI calls!
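A minimal sketch of that kind of golden-output assertion. The `generate` helper and the expected string are hypothetical placeholders, assuming a pinned model and quantisation:

```python
# Sketch only: "golden" assertion against a pinned model + quantisation.
# `generate` and EXPECTED_OUTPUT are hypothetical placeholders.
EXPECTED_OUTPUT = " A cat is a small domesticated feline."  # recorded from a known-good run


def test_exact_completion():
    prompt = "Question. what is a cat?\nAnswer:"
    first = generate(prompt, temperature=0.0, seed=42)
    second = generate(prompt, temperature=0.0, seed=42)
    assert first == second           # deterministic under temp=0
    assert first == EXPECTED_OUTPUT  # exact golden string
```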
> We could also hardcode the expected responses in the unit tests.
Yes, I'd also like to save tokens wherever this approach works! I'll only consider using the OpenAI API when necessary.
We cannot run all the tests in CI; we should verify them all locally.
I tend to view things a bit differently. The workflows and unit tests are responsible for reducing risk when we merge PRs. As long as the workflows pass, that should be equivalent to saying that terrible behavior won't appear if we merge the PR.
However, because of the GPU backends, it's indeed hard for us to cover all the cases in the workflows. I can provide a machine with an Nvidia GPU running Linux for the workflows, but I have no idea about Windows yet. :)
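For the gating side, a hedged sketch of what the `github.triggering_actor` check could look like in a manually dispatched workflow. The workflow name, runner labels, script path, and allowlist are all placeholders; an alternative would be to query the collaborator-permission API instead of hardcoding usernames:

```yaml
# Sketch only: gate an expensive job on who triggered the run.
# The runner labels, allowlist, and script path are placeholders.
name: output-sanity-check
on: workflow_dispatch

jobs:
  judge-output:
    # Only run when a listed maintainer dispatched the workflow.
    if: contains(fromJSON('["maintainer-a", "maintainer-b"]'), github.triggering_actor)
    runs-on: [self-hosted, linux, gpu]  # placeholder labels for the Nvidia machine
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/judge_output.py  # hypothetical script calling the OpenAI API
```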