Evaluate RAG project #9

Aisuko · 2024-06-20T04:35:42Z · Maintainer

As discussed, we need a new project to evaluate the RAG pipeline. Here is an article on the topic: https://nayakpplaban.medium.com/evaluate-rag-pipeline-response-using-python-5dcbbe9a60c5

Related issues

[feature]: evaluation tools for rag result kimchima#38

Micost · 2024-06-20T16:30:13Z · Maintainer

I just read the article. I think the evaluation methods it mentions are easy to use. However, I have two questions about evaluation:

1. Should we evaluate the result of each step, or evaluate end to end?

According to the article, it is easier to pinpoint issues when the RAG pipeline is broken into multiple steps. But each step has a different purpose, so different evaluation methods apply to different steps. Without a unified way to measure the accuracy of each step, per-step scores would not help us find the weakest link in the whole chain. So, in my opinion, we should focus on end-to-end evaluation.
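To make "end to end" concrete, here is a minimal Python sketch, assuming the whole pipeline can be called as a function from question to answer. `evaluate_end_to_end`, `rag_pipeline`, and the `exact_match` placeholder metric are hypothetical names for illustration, not existing project code.

```python
from typing import Callable, Sequence


def evaluate_end_to_end(
    rag_pipeline: Callable[[str], str],
    questions: Sequence[str],
    references: Sequence[str],
    score: Callable[[str, str], float],
) -> float:
    """Treat the whole RAG chain as a black box: question in, answer out.

    Only the final answer is scored against a reference, so retrieval and
    generation are judged together by a single metric.
    """
    if len(questions) != len(references):
        raise ValueError("questions and references must have the same length")
    if not questions:
        raise ValueError("need at least one example")
    scores = [score(rag_pipeline(q), ref) for q, ref in zip(questions, references)]
    return sum(scores) / len(scores)


def exact_match(prediction: str, reference: str) -> float:
    # Hypothetical placeholder metric; swap in ROUGE, BERTScore, etc.
    return float(prediction.strip().lower() == reference.strip().lower())


if __name__ == "__main__":
    # Hypothetical stand-in for the real retriever + LLM pipeline.
    canned = {"capital of france?": "Paris", "2 + 2?": "4"}

    def fake_pipeline(question: str) -> str:
        return canned.get(question, "")

    print(evaluate_end_to_end(fake_pipeline, ["capital of france?", "2 + 2?"], ["paris", "5"], exact_match))
```

Because the metric is passed in as a parameter, the same harness can be reused whichever metrics we settle on.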

2. Which metrics should we use?

None of these metrics seems to fit our chat conversations particularly well. Maybe we should compute ROUGE, BERTScore, and perplexity, plus the relevance score, as the final result. Please let me know if there are better metrics or if you have any thoughts on this topic.
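For reference, ROUGE-L scores the longest common subsequence (LCS) of tokens shared by a generated answer and a reference answer. Below is a minimal pure-Python sketch, assuming simple lowercase whitespace tokenization; library implementations such as the `rouge-score` package add stemming and better tokenization, so their numbers will differ somewhat. The function names are illustrative only.

```python
def _lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, O(len(a) * len(b)) time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 on lowercase whitespace tokens (no stemming, no punctuation handling)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = _lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```

Note that ROUGE (like BERTScore) needs a reference answer for every question, which is one reason it fits open-ended chat less naturally.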

@Aisuko

4 replies

Aisuko Jun 21, 2024
Maintainer Author

1. Normally, we focus on end-to-end testing in the test project; per-step concerns should be handled during the training process.
2. We need to do some research and write a proposal for it.

Aisuko Jun 21, 2024
Maintainer Author

I found an article about LLM monitoring and observability, and I agree with it. So here is my idea: can we focus on perplexity, cosine similarity (which we already have), and sentiment analysis? @Micost
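A minimal sketch of the two numeric metrics, assuming the model server can return per-token log-probabilities (natural log) for a generated answer, and that the vectors come from whatever embedding model the existing cosine-similarity check uses. The function names are hypothetical; sentiment analysis is left out because it needs a trained classifier.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the generated tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a . b / (|a| |b|), e.g. for question vs. answer embeddings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical values: probability 0.5 per token mathematically gives perplexity 2.
    print(perplexity([math.log(0.5)] * 4))
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

Lower perplexity means the model was more confident in its own output, which is not the same as being correct, so it is best read alongside a relevance signal such as cosine similarity.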

Micost Jun 21, 2024
Maintainer

Agreed. One more thing about the evaluation: is there a benchmark that is widely used for LLMs? I think we need something related to semantic search.

Aisuko Jun 21, 2024
Maintainer Author

Normally, we refer to the Open LLM Leaderboard; its About page lists the benchmark tasks. Another useful project, llm-autoeval, may provide clearer information about these tasks. @Micost

SkywardAI