Is there any plan to support reason in LLMMetrics and EvaluationResult? #1813
Comments
@parkerzf you are right, and we currently do have something like https://docs.ragas.io/en/stable/howtos/applications/_metrics_llm_calls to help with this. Can you check and see if it works for your use case?
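For context, the linked how-to is about inspecting the LLM calls behind each metric score. A minimal sketch of that workflow follows; it assumes a ragas version where `EvaluationResult` exposes a `.traces` attribute, as that page describes, and attribute names may differ across releases:

```python
from datasets import load_dataset

from ragas import EvaluationDataset, evaluate
from ragas.metrics._aspect_critic import harmfulness

# Load a small evaluation dataset (the same one used later in this thread).
dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")
eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])

results = evaluate(eval_dataset, metrics=[harmfulness])

# One trace per evaluated sample (assumed): each trace records the prompts
# sent to the LLM and its raw outputs for every step of the metric.
print(results.traces[0])
```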
Hey @jjmachan, thanks for the reply! I think it is very close to what I am looking for. Ideally, I would like to use it as follows:

```python
from datasets import load_dataset

from ragas import EvaluationDataset, evaluate
from ragas.metrics._aspect_critic import harmfulness

dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")
eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])

results = evaluate(eval_dataset[:5], metrics=[harmfulness])
results.to_pandas(including_trace=True)
```

The output dataframe contains the following columns: [screenshot of the expected columns, not captured here]. What is the ETA of this new feature? I would like to try it out.
Closing after 8 days of waiting for the additional info requested.
hey @parkerzf, I don't think we will add it to the pandas dataframe. It would be redundant, and I don't think we can show multiple steps there for metrics where the LLM call is just one part of the process, like faithfulness or context recall. What we can do this for is aspect critic, since the verdict and the reason are the only fields it shows. To get that today, you can write a simple function. I can write it up for you if you want.
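For reference, a minimal sketch of what such a helper could look like. The trace layout assumed here (`results.traces` holding one entry per sample, keyed by metric name, with the final LLM output carrying `reason` and `verdict` fields) is an assumption based on the linked how-to and AspectCritic's output schema, and may differ across ragas versions:

```python
def to_pandas_with_reason(results, metric_name: str = "harmfulness"):
    """Attach each sample's AspectCritic reason to the scores dataframe.

    Hypothetical helper: assumes `results.traces` has one trace per
    evaluated sample, keyed by metric name, and that the last LLM call
    for the metric outputs an object with `reason` and `verdict` fields.
    """
    df = results.to_pandas()
    reasons = []
    for trace in results.traces:
        # Trace entries are assumed to be keyed by metric name.
        metric_trace = trace[metric_name]
        # Take the output of the last LLM call, assumed to be the final
        # verdict with its accompanying reason.
        last_call = list(metric_trace.values())[-1]
        output = last_call["output"]
        reasons.append(getattr(output, "reason", None))
    df[f"{metric_name}_reason"] = reasons
    return df
```

Calling `to_pandas_with_reason(results)` would then yield the regular scores dataframe with an extra `harmfulness_reason` column, which is roughly the `including_trace=True` behaviour asked for above.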
- [x] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
For evaluation explainability, it would be very valuable to also include the reason in the result, so that we can better understand how the LLM metrics make their decisions and improve the metrics accordingly.
Is there any plan to support this? Thanks!
Code Examples
NA
Additional context
Anything else you want to share with us?