
[Question]: Best way to integrate testing and monitoring tools, something like Agenta #3504

Open
alexff77 opened this issue Nov 19, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@alexff77

Describe your problem

Hi,
first I want to say that you guys are building a useful tool!
Great job!
We’re exploring ways to enhance our workflows with RAGFlow and have a few questions about integrating it with tools like Agenta and improving observability:

  1. Automating Testing of Retrieval and Model Replies
     We aim to automate testing for both retrieval results and model-generated replies. Specifically:
     We can write custom code against the RAGFlow API to validate retrieval results and responses (a rough sketch of what we mean follows after this list).
     Is there a recommended or built-in approach in RAGFlow to perform such automated tests?

  2. Iterating on Prompts and Parameters
     We're interested in integrating prompt management and evaluation tools, such as Agenta, to improve iteration and experimentation.
     Does RAGFlow support or provide guidance for integrating with tools like Agenta for:
     • Creating and versioning prompts?
     • Running evaluations on prompts and analyzing results?
     • Connecting prompt outcomes to observability for performance monitoring?
  3. Monitoring and Logging Improvements
     To enhance observability and debugging, have you considered adopting the OpenTelemetry standard for logging (see OpenTelemetry's documentation)? A rough illustration of what we have in mind also follows after this list.
     Does RAGFlow currently support, or have plans to support, OpenTelemetry for structured logging?
     What options are available for tracking logs and metrics across retrieval, generation, and overall system performance?
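
To make point 1 concrete, here is roughly the kind of automated check we have in mind. Please treat it purely as a sketch: the base URL, endpoint path, payload fields, and response shape below are placeholders we made up for illustration, not taken from the RAGFlow API reference.

```python
# Rough sketch of an automated retrieval check against a RAGFlow-style HTTP API.
# NOTE: the URL, endpoint path, payload fields, and response shape are
# placeholders/assumptions, not the documented RAGFlow API -- adjust to the real spec.
import requests

RAGFLOW_URL = "http://localhost:9380"   # placeholder base URL
API_KEY = "YOUR_API_KEY"                # placeholder API token


def retrieve(question: str, dataset_ids: list[str]) -> list[dict]:
    """Ask the server which chunks it would retrieve for `question`."""
    resp = requests.post(
        f"{RAGFLOW_URL}/api/v1/retrieval",          # hypothetical endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"question": question, "dataset_ids": dataset_ids},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", {}).get("chunks", [])


def test_expected_document_is_retrieved():
    """Fail if the document we consider relevant is missing from the results."""
    chunks = retrieve("What is our refund policy?", ["<dataset-id>"])
    retrieved_docs = {c.get("document_id") for c in chunks}
    assert "<expected-doc-id>" in retrieved_docs
```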
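And for point 3, a minimal illustration of the kind of OpenTelemetry tracing we are imagining around the retrieval and generation steps. This is illustrative only and is not existing RAGFlow code; it assumes the opentelemetry-sdk package and a placeholder pipeline.

```python
# Minimal OpenTelemetry tracing sketch for a retrieval + generation pipeline.
# Illustrative only: not existing RAGFlow code.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Export spans to the console; in production this would point at an OTLP collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("rag.pipeline")


def answer(question: str) -> str:
    with tracer.start_as_current_span("rag.answer") as span:
        span.set_attribute("rag.question", question)

        with tracer.start_as_current_span("rag.retrieval") as retrieval_span:
            chunks = ["...retrieved chunks..."]          # placeholder retrieval call
            retrieval_span.set_attribute("rag.chunk_count", len(chunks))

        with tracer.start_as_current_span("rag.generation") as gen_span:
            reply = "...model reply..."                  # placeholder LLM call
            gen_span.set_attribute("rag.reply_length", len(reply))

        return reply


print(answer("What is our refund policy?"))
```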

We are happy to contribute to this if that would be helpful.
Your input is much appreciated!
Looking forward to your insights and suggestions!

alexff77 added the question label on Nov 19, 2024
@KevinHuSh
Collaborator

KevinHuSh commented Nov 20, 2024

    1. For the time being, we use rag/benchmark.py to test retrieval relevance. Our result on the test set of microsoft/ms_marco v1.1 is as below:
       • {'ndcg@10': 0.49, 'map@5': 0.37, 'mrr@10': 0.40}
       • The result of jina-colbert v2 is as below:
         [image: jina-colbert v2 benchmark results]
    2. We don't know much about Agenta, which seems interesting. We're going to do some research on it.
    3. That's a great suggestion. We're going to do some research.
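
(For reference, the metrics quoted above can be computed from a ranked result list roughly as follows. This is a generic sketch of mrr@10 / ndcg@10 with binary relevance, not the exact code in rag/benchmark.py.)

```python
# Generic sketch of mrr@10 and ndcg@10 (binary relevance) over a ranked list
# of retrieved document ids -- not the exact code in rag/benchmark.py,
# just the underlying formulas.
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant hit within the top k (0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: DCG of the hits divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Example: the only relevant document sits at rank 2.
print(mrr_at_k(["d7", "d3", "d9"], {"d3"}))   # 0.5
print(ndcg_at_k(["d7", "d3", "d9"], {"d3"}))  # ~0.63
```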

@alexff77
Author

Thank you for the info, very helpful!
Another question, about the "feedback" feature:
Currently it's available inside the app for each reply; are you planning to expose it through the API as well?

Thank you!

@bingzhenpan

    1. For the time being, we use rag/benchmark.py to test retrieval relevance. Our result on the test set of microsoft/ms_marco v1.1 is as below:
       • {'ndcg@10': 0.49, 'map@5': 0.37, 'mrr@10': 0.40}

I cannot reproduce this test result. Could you please provide the complete command used to run the benchmark.py script for this test?
