Feature: Vector search recall accurate rate observation #493

Open
634750802 opened this issue Dec 11, 2024 · 1 comment

Comments

@634750802
Collaborator

634750802 commented Dec 11, 2024

  1. Filter Vector Search Calls

    • Identify and filter vector search calls that utilize a vector index.
    • Apply a sampling rate (e.g., 1 out of every 10 calls) to selectively choose the vector search calls for further analysis.
  2. Generate and Execute New Vector Search Calls

    • Create several new vector search calls based on the filtered data.
    • Configure these new search calls with:
      • A larger LIMIT (topN): Increase the number of results returned to better analyze the search performance.
      • TiKV (full scan): Run the queries on TiKV as a full scan rather than through the vector index, so that no relevant row is missed.
  3. Perform Calculations and Save Data

    • For each sampled vector search call, compute the recall rate and collect the related information from the search results.
    • Save the following information into a specified database for future analysis:
      • Text: The input text or query.
      • Limit: The specified result limit (topN).
      • Type: The type of operation, either a vector-based or a graph-based search.
      • Embedding: The vector representation of the input query.
      • Recall Accuracy Rate: The share of the expected chunks that also appear in the chunks the search actually returned.
      • Chunks Metadata: Metadata about the chunks (fragments) of data retrieved during the search.
      • Expected Chunks Metadata: Metadata about the chunks the search was expected to return, kept for comparison with the actual results.
      • Knowledge Base ID: The identifier for the relevant knowledge base being searched.
      • Timestamp: The time the observation data was recorded.
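
A minimal Python sketch of the three steps above, under stated assumptions. Every name here (`should_sample`, `recall_rate`, `RecallObservation`, `observe`, and the `indexed_search` / `full_scan_search` callables) is hypothetical and not taken from the project. The sketch also assumes that each chunk is a dict with an `id` key and that recall is the fraction of full-scan chunks that the indexed search also returned. How the larger LIMIT feeds into that figure is left to the caller.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Sequence

Chunk = Dict[str, Any]  # assumption: chunk metadata is a dict with an "id" key
SearchFn = Callable[[Sequence[float], int], List[Chunk]]  # (embedding, limit) -> chunks


def should_sample(rate: float = 0.1, rng: Optional[random.Random] = None) -> bool:
    """Step 1: keep roughly `rate` of the calls (0.1 is about 1 in 10)."""
    source = rng if rng is not None else random
    return source.random() < rate


def recall_rate(actual_ids: Sequence[Any], expected_ids: Sequence[Any]) -> float:
    """Fraction of the expected chunk IDs that are present in the actual results."""
    expected = set(expected_ids)
    if not expected:
        return 1.0  # assumption: nothing to recall counts as perfect recall
    return len(expected & set(actual_ids)) / len(expected)


@dataclass
class RecallObservation:
    """Step 3: one row to save, mirroring the fields listed above."""
    text: str
    limit: int
    type: str  # "vector" or "graph"
    embedding: List[float]
    recall_rate: float
    chunks_metadata: List[Chunk]
    expected_chunks_metadata: List[Chunk]
    knowledge_base_id: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def observe(
    text: str,
    embedding: Sequence[float],
    limit: int,
    knowledge_base_id: int,
    indexed_search: SearchFn,
    full_scan_search: SearchFn,
    search_type: str = "vector",
) -> RecallObservation:
    """Step 2 and 3: rerun the query both ways and build the observation."""
    chunks = indexed_search(embedding, limit)      # goes through the vector index
    expected = full_scan_search(embedding, limit)  # exact results from a full scan
    return RecallObservation(
        text=text,
        limit=limit,
        type=search_type,
        embedding=list(embedding),
        recall_rate=recall_rate(
            [c["id"] for c in chunks], [c["id"] for c in expected]
        ),
        chunks_metadata=chunks,
        expected_chunks_metadata=expected,
        knowledge_base_id=knowledge_base_id,
    )
```

A caller would guard the extra work with `if should_sample(0.1): row = observe(...)` and then write `row` to whichever database is chosen for the observations.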
@Mini256
Member

Mini256 commented Dec 12, 2024

@Icemap Does Ragas have a corresponding metric, and should we also include it as part of the evaluation?
