
Is there any plan to support reason in LLMMetrics and EvaluationResult? #1813

Open

parkerzf opened this issue Jan 6, 2025 · 4 comments

Labels: question (Further information is requested)

parkerzf commented Jan 6, 2025

[x] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
For evaluation explainability, it would be very valuable to also include the reason in the result, so that we can better understand how LLM metrics make their decisions and improve the metrics accordingly.

Is there any plan to support it? Thanks!

Code Examples
NA

Additional context
Anything else you want to share with us?

parkerzf added the question label Jan 6, 2025
jjmachan (Member) commented Jan 7, 2025

@parkerzf you are right, and we currently do have something like https://docs.ragas.io/en/stable/howtos/applications/_metrics_llm_calls to help with this. Can you check and see if it works for your use case?
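
(As a rough sketch of the same idea, not taken from that how-to: if your ragas version's evaluate() accepts LangChain callbacks, which the docs use for LangSmith tracing, a small handler can capture the prompts and raw outputs the metrics send to the judge LLM, and those outputs include the reason behind each verdict. The handler below and the assumption that callbacks are forwarded to every metric call should be verified against your ragas version:

from langchain_core.callbacks import BaseCallbackHandler


class CaptureLLMCalls(BaseCallbackHandler):
    """Record the prompts and raw completions the metrics send to the judge LLM."""

    def __init__(self):
        self.prompts = []
        self.outputs = []

    # completion-style models fire this
    def on_llm_start(self, serialized, prompts, **kwargs):
        self.prompts.append(prompts)

    # chat models fire this instead of on_llm_start
    def on_chat_model_start(self, serialized, messages, **kwargs):
        self.prompts.append(messages)

    def on_llm_end(self, response, **kwargs):
        # response is an LLMResult; keep the generated text, which for
        # aspect-critic-style metrics contains the verdict and the reason
        self.outputs.append([g.text for gen in response.generations for g in gen])


handler = CaptureLLMCalls()
# results = evaluate(your_dataset, metrics=[harmfulness], callbacks=[handler])
# then inspect handler.prompts / handler.outputs

)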

parkerzf (Author) commented Jan 7, 2025

Hey @jjmachan, thanks for the reply! I think it is very close to what I am looking for.

Ideally, I would like to use it as follows:

from datasets import load_dataset
from ragas import EvaluationDataset
from ragas import evaluate
from ragas.metrics._aspect_critic import harmfulness

dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")


eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])

results = evaluate(eval_dataset[:5], metrics=[harmfulness])
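# `including_trace=True` is the proposed option (it does not exist yet) for
# returning each metric's reason alongside its score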
results.to_pandas(including_trace=True)

The output dataframe would then contain the following columns:
["user_input", "response", "harmfulness", "harmfulness_reason"]

What is the ETA of this new feature? I would like to try it out.

sahusiddharth added the waiting 🤖 label Jan 22, 2025
github-actions bot commented Jan 22, 2025

Closing after 8 days of waiting for the additional info requested.

github-actions bot removed the waiting 🤖 label Jan 22, 2025
jjmachan reopened this Jan 22, 2025
jjmachan (Member) commented
Hey @parkerzf, I don't think we will add it to the pandas dataframe: it would be redundant, and I don't think we can show multiple steps there, for example for faithfulness, or for metrics where the LLM is just one part of the process, like context recall.

What we can do this for is aspect critic, since the verdict and the reason are the only fields it shows. But in order to do that you can write a simple function; I can write it up for you if you want.
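
(As a rough illustration of what such a simple function could look like, here is a standalone sketch of the aspect-critic pattern using the OpenAI client directly, not ragas internals; the prompt wording, model name, and helper names are assumptions:

import json

import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITIC_PROMPT = """You are judging a response against the following aspect:
{definition}

Question: {user_input}
Response: {response}

Reply in JSON with keys "verdict" (0 or 1) and "reason" (one sentence)."""


def critique_with_reason(user_input, response, definition):
    # ask the judge LLM for a verdict plus the reasoning behind it
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # example model
        messages=[{
            "role": "user",
            "content": CRITIC_PROMPT.format(
                definition=definition, user_input=user_input, response=response
            ),
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(completion.choices[0].message.content)


def critique_dataset(samples, definition, name="harmfulness"):
    # build the dataframe layout proposed above: <name> and <name>_reason columns
    rows = []
    for s in samples:
        result = critique_with_reason(s["user_input"], s["response"], definition)
        rows.append({
            "user_input": s["user_input"],
            "response": s["response"],
            name: result["verdict"],
            f"{name}_reason": result["reason"],
        })
    return pd.DataFrame(rows)

Using ragas' own AspectCritic prompt and LLM would keep the verdicts consistent with the reported scores; this only shows the general shape of the post-processing function.)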
