Explore the new tool released by Microsoft for evaluating LLMs.
Brief description:
It covers a wide range of LLMs and evaluation datasets, spanning diverse tasks, evaluation protocols, adversarial prompt attacks, and prompt engineering techniques. As a holistic library, it also provides several analysis tools for interpreting results. It is designed in a modular fashion, allowing users to build evaluation pipelines for custom projects.
So, I think we should check which techniques it uses to evaluate models, which datasets and tasks it supports, and which analysis tools it offers for interpreting results.
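To make the "modular evaluation pipeline" idea concrete, here is a minimal sketch of how such a pipeline is typically structured: a model callable, a dataset, and a pluggable metric. All names here are illustrative assumptions, not PromptBench's actual API.

```python
# Illustrative sketch of a modular LLM evaluation pipeline.
# These classes/functions are hypothetical, not PromptBench's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    prompt: str
    label: str


def exact_match(prediction: str, label: str) -> bool:
    """A simple pluggable metric: case-insensitive exact match."""
    return prediction.strip().lower() == label.strip().lower()


def evaluate(model: Callable[[str], str],
             dataset: List[Example],
             metric: Callable[[str, str], bool] = exact_match) -> float:
    """Run the model over each example and return accuracy under the metric."""
    correct = sum(metric(model(ex.prompt), ex.label) for ex in dataset)
    return correct / len(dataset)


# Toy model and dataset to exercise the pipeline end to end.
toy_model = lambda prompt: "positive" if "great" in prompt else "negative"
data = [Example("This movie is great!", "positive"),
        Example("Terrible plot.", "negative")]
print(evaluate(toy_model, data))
```

Swapping the metric, dataset, or model is just passing a different object, which is the kind of composability the description attributes to the library.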
GitHub link: promptbench