I just read the article. I think the evaluation methods it describes are easy to use. However, I have two questions about evaluation:

1. Should we evaluate the result of each step, or the pipeline end to end? According to the article, it is easier to pinpoint issues when RAG is broken into multiple steps. But each step has a different purpose, so different evaluation methods apply to different steps. Without a unified way to measure the accuracy of every step, per-step scores would not help us find the weakest link in the whole chain. So, in my opinion, we had better focus on end-to-end evaluation.
2. Which metrics should we use? None of these metrics seems like the best fit for our chat conversations. Maybe we should compute ROUGE, BERTScore, and perplexity, plus the relevance score, and combine them into a final result. Please let me know if there are better metrics or if you have any thoughts on this topic.
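A minimal sketch of the end-to-end option, assuming we have a small set of chat questions, each with a human-written reference answer and the answer our chatbot actually produced. The `samples` format and the helpers `lcs_length`, `rouge_l_f1`, and `evaluate_end_to_end` are hypothetical names for illustration, not something from the article. It computes ROUGE-L F1 (longest-common-subsequence overlap) with the standard library only; BERTScore and perplexity need a language model, so they are left out here.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming length of the longest common subsequence of two token lists
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    # Naive lowercased whitespace tokenization (assumption: no stemming or punctuation handling)
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate_end_to_end(samples: list[dict]) -> float:
    # Hypothetical format: each sample holds the final chatbot "answer" and a human "reference"
    if not samples:
        return 0.0
    scores = [rouge_l_f1(s["reference"], s["answer"]) for s in samples]
    return sum(scores) / len(scores)


samples = [
    {"question": "Where did the cat sit?",
     "reference": "the cat sat on the mat",
     "answer": "the cat is on the mat"},
]
print(evaluate_end_to_end(samples))  # → 0.8333333333333334
```

Because it only looks at the final answer, this treats the whole RAG chain as a black box. A maintained package such as Google's `rouge-score` applies its own tokenization and optional stemming, so its numbers may differ slightly from this naive version.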
-
As we discussed, we need a new project to evaluate the RAG pipeline. Here is an article on this topic: https://nayakpplaban.medium.com/evaluate-rag-pipeline-response-using-python-5dcbbe9a60c5