
[Question]: Best way to integrate testing and monitoring tools, something like Agenta #3504

Open
alexff77 opened this issue Nov 19, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@alexff77

Describe your problem

Hi,
first I want to say that you guys are building a useful tool!
Great job!
We’re exploring ways to enhance our workflows with RAGFlow and have a few questions about integrating it with tools like Agenta and improving observability:

  1. Automating Testing of Retrieval and Model Replies
     We aim to automate testing for both retrieval results and model-generated replies. Specifically:
     We can write custom code against the RAGFlow API to validate retrieval results and responses (a rough sketch of what we mean follows after this list).
     Is there a recommended or built-in approach in RAGFlow to perform such automated tests?

  2. Iterating on Prompts and Parameters
     We're interested in integrating prompt management and evaluation tools, such as Agenta, to improve iteration and experimentation.
     Does RAGFlow support or provide guidance for integrating with tools like Agenta for:
     • Creating and versioning prompts?
     • Running evaluations on prompts and analyzing results?
     • Connecting prompt outcomes to observability for performance monitoring?
  3. Monitoring and Logging Improvements
     To enhance observability and debugging, have you considered adopting the OpenTelemetry standard for logging (see OpenTelemetry's documentation)? A rough illustration of what we have in mind also follows after this list.
     Does RAGFlow currently support, or have plans to support, OpenTelemetry for structured logging?
     What options are available for tracking logs and metrics across retrieval, generation, and overall system performance?
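
To make point 1 concrete, here is roughly the kind of automated check we have in mind. Please treat it purely as a sketch: the base URL, endpoint path, payload fields, and response shape below are placeholders we made up for illustration, not taken from the RAGFlow API reference.

```python
# Rough sketch of an automated retrieval check against a RAGFlow-style HTTP API.
# NOTE: the URL, endpoint path, payload fields, and response shape are
# placeholders/assumptions, not the documented RAGFlow API -- adjust to the real spec.
import requests

RAGFLOW_URL = "http://localhost:9380"   # placeholder base URL
API_KEY = "YOUR_API_KEY"                # placeholder API token


def retrieve(question: str, dataset_ids: list[str]) -> list[dict]:
    """Ask the server which chunks it would retrieve for `question`."""
    resp = requests.post(
        f"{RAGFLOW_URL}/api/v1/retrieval",          # hypothetical endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"question": question, "dataset_ids": dataset_ids},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", {}).get("chunks", [])


def test_expected_document_is_retrieved():
    """Fail if the document we consider relevant is missing from the results."""
    chunks = retrieve("What is our refund policy?", ["<dataset-id>"])
    retrieved_docs = {c.get("document_id") for c in chunks}
    assert "<expected-doc-id>" in retrieved_docs
```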
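And for point 3, a minimal illustration of the kind of OpenTelemetry tracing we are imagining around the retrieval and generation steps. This is illustrative only and is not existing RAGFlow code; it assumes the opentelemetry-sdk package and a placeholder pipeline.

```python
# Minimal OpenTelemetry tracing sketch for a retrieval + generation pipeline.
# Illustrative only: not existing RAGFlow code.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Export spans to the console; in production this would point at an OTLP collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("rag.pipeline")


def answer(question: str) -> str:
    with tracer.start_as_current_span("rag.answer") as span:
        span.set_attribute("rag.question", question)

        with tracer.start_as_current_span("rag.retrieval") as retrieval_span:
            chunks = ["...retrieved chunks..."]          # placeholder retrieval call
            retrieval_span.set_attribute("rag.chunk_count", len(chunks))

        with tracer.start_as_current_span("rag.generation") as gen_span:
            reply = "...model reply..."                  # placeholder LLM call
            gen_span.set_attribute("rag.reply_length", len(reply))

        return reply


print(answer("What is our refund policy?"))
```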

We are happy to contribute to this if that would be helpful.
Your input is much appreciated!
Looking forward to your insights and suggestions!

alexff77 added the question label on Nov 19, 2024
@KevinHuSh
Collaborator

KevinHuSh commented Nov 20, 2024

    1. For the time being, we use rag/benchmark.py to test retrieval relevance. Our result on the test set of microsoft/ms_marco v1.1 is as below:
       • {'ndcg@10': 0.49, 'map@5': 0.37, 'mrr@10': 0.40}
       • The result of jina-colbert v2 is as below:
         [image: jina-colbert v2 benchmark results]
    2. We don't know much about Agenta, which seems interesting. We're going to do some research on it.
    3. That's a great suggestion. We're going to do some research.
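
(For reference, the metrics quoted above can be computed from a ranked result list roughly as follows. This is a generic sketch of mrr@10 / ndcg@10 with binary relevance, not the exact code in rag/benchmark.py.)

```python
# Generic sketch of mrr@10 and ndcg@10 (binary relevance) over a ranked list
# of retrieved document ids -- not the exact code in rag/benchmark.py,
# just the underlying formulas.
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant hit within the top k (0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: DCG of the hits divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Example: the only relevant document sits at rank 2.
print(mrr_at_k(["d7", "d3", "d9"], {"d3"}))   # 0.5
print(ndcg_at_k(["d7", "d3", "d9"], {"d3"}))  # ~0.63
```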

@alexff77
Author

Thank you for the info, very helpful!
Another question, about the "feedback" feature:
Currently it's available inside the app for each reply; are you planning to expose it through the API as well?

Thank you!

@bingzhenpan

    1. For the time being, we use rag/benchmark.py to test retrieval relevance. Our result on the test set of microsoft/ms_marco v1.1 is as below:
       • {'ndcg@10': 0.49, 'map@5': 0.37, 'mrr@10': 0.40}

I cannot reproduce this test result. Could you please provide the complete command used to run the benchmark.py script for this test?
