
default_transforms() Missing cosine_similarity Transform for Docs with Token Count 101–500 #1932

Open
rgrizzo-linksmt opened this issue Feb 20, 2025 · 3 comments
Labels
bug Something isn't working module-testsetgen Module testset generation

Comments

@rgrizzo-linksmt

At default.py#L157, you forgot to add the cosine_sim_builder transform to the list of default transforms. Although you instantiated the cosine_sim_builder object, you did not include it in Parallel(cosine_sim_builder, ner_overlap_sim), as you did in:

[default.py#L126](https://github.com/explodinggradients/ragas/blob/2bc29a2b8358ddb6b167fdf7ab0518ad9371463c/src/ragas/testset/transforms/default.py#L126C13-L126C59).

This omission might impact the number of relationships created in the knowledge graph.

Ragas version: ragas==0.2.13
Python version: Python 3.10.12
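To illustrate why the omission can reduce the number of relationships, here is a minimal sketch that is not ragas code. It uses two hypothetical relationship builders: a cosine-similarity builder and an entity-overlap builder, where the overlap builder uses Jaccard similarity as an assumed stand-in. The builders run over three toy nodes whose embeddings and entity sets are invented purely for illustration. The function names, thresholds, and data are all assumptions. The sketch only shows that leaving out the cosine builder drops every edge that only that builder would have created.

```python
import math
from itertools import combinations

# Hypothetical stand-ins for the two relationship builders; these are NOT
# ragas's actual builder classes or APIs.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_relationships(nodes: dict, threshold: float) -> set:
    """Link every node pair whose embeddings have cosine similarity >= threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if cosine(nodes[a]["embedding"], nodes[b]["embedding"]) >= threshold
    }


def overlap_relationships(nodes: dict, threshold: float) -> set:
    """Link every node pair whose entity sets have Jaccard overlap > threshold."""
    rels = set()
    for a, b in combinations(sorted(nodes), 2):
        ea, eb = nodes[a]["entities"], nodes[b]["entities"]
        union = ea | eb
        if union and len(ea & eb) / len(union) > threshold:
            rels.add((a, b))
    return rels


# Toy nodes, invented for illustration only.
nodes = {
    "doc1": {"embedding": [1.0, 0.0], "entities": {"ragas"}},
    "doc2": {"embedding": [0.9, 0.1], "entities": {"python"}},
    "doc3": {"embedding": [0.0, 1.0], "entities": {"ragas"}},
}

overlap_only = overlap_relationships(nodes, threshold=0.01)  # like the current branch
both = overlap_only | cosine_relationships(nodes, threshold=0.7)  # like the fixed branch
print(len(overlap_only), len(both))  # 1 2
```

In this toy case, doc1 and doc2 share no entities but have nearly identical embeddings. The pair is therefore linked only when the cosine builder actually runs.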

@rgrizzo-linksmt rgrizzo-linksmt added the bug Something isn't working label Feb 20, 2025
@dosubot dosubot bot added the module-testsetgen Module testset generation label Feb 20, 2025
@Vidit-Ostwal
Contributor

Can you share the problem you are facing?

@rgrizzo-linksmt
Author

rgrizzo-linksmt commented Feb 24, 2025

The default_transforms function defined in src/ragas/testset/transforms/default.py does not handle the transforms correctly for documents with 101-500 tokens.

The code selects the transform configuration based on the documents' token counts. When the "101-500" token-count bin reaches the first quartile (Q1), several transforms are instantiated, cosine_sim_builder among them. Although cosine_sim_builder is correctly instantiated (line 139), it is not included in the list of transforms that is actually returned (line 153).
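The branch selection described above can be sketched roughly as follows. This is a simplified model, not the ragas implementation. Token counts come from a hypothetical whitespace-based count_tokens helper rather than a real tokenizer. Several details are assumptions: the 0-100 and 501-100000 bin edges, the order in which branches are checked, and the reading of "first quartile" as "at least 25% of the documents".

```python
from collections import Counter

# Only the "101-500" bin comes from the issue; the other edges are assumptions.
BIN_RANGES = [(0, 100), (101, 500), (501, 100_000)]


def count_tokens(text: str) -> int:
    """Hypothetical token counter: a whitespace split, not a real tokenizer."""
    return len(text.split())


def bin_fractions(documents: list[str]) -> dict[str, float]:
    """Fraction of documents whose token count falls into each bin."""
    if not documents:
        raise ValueError("no documents to bin")
    counts: Counter = Counter()
    for doc in documents:
        n = count_tokens(doc)
        for start, end in BIN_RANGES:
            if start <= n <= end:
                counts[f"{start}-{end}"] += 1
                break  # each document lands in at most one bin
    return {f"{s}-{e}": counts[f"{s}-{e}"] / len(documents) for s, e in BIN_RANGES}


def choose_branch(documents: list[str], threshold: float = 0.25) -> str:
    """Pick a transform configuration; 0.25 is the assumed Q1 threshold."""
    fractions = bin_fractions(documents)
    if fractions["501-100000"] >= threshold:
        return "long-document transforms"
    if fractions["101-500"] >= threshold:
        return "101-500 transforms"  # the branch where cosine_sim_builder is dropped
    raise ValueError("documents are too short for this sketch's branches")


docs = ["word " * 200] * 3 + ["word " * 20]
print(bin_fractions(docs))  # {'0-100': 0.25, '101-500': 0.75, '501-100000': 0.0}
print(choose_branch(docs))  # 101-500 transforms
```

With three of four documents in the 101-500 bin, this sketch takes the branch the issue describes.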

It appears that cosine_sim_builder was unintentionally omitted from the returned transforms list. The intended behavior should probably mirror the branch at line 120, where cosine_sim_builder is both instantiated and returned together with ner_overlap_sim. As written, the code instantiates cosine_sim_builder but then discards it, so no cosine-similarity relationships are built in this branch. This omission might impact the number of relationships created in the knowledge graph.

I hope this helps you understand the problem I'm facing!

@Vidit-Ostwal
Contributor

Vidit-Ostwal commented Feb 24, 2025

Okay, got it.
In src/ragas/testset/transforms/default.py, line 157 should be Parallel(cosine_sim_builder, ner_overlap_sim) instead of ner_overlap_sim alone.

This makes sense.

@rgrizzo-linksmt I think you should raise a PR for this.
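Here is a minimal sketch of the proposed change, using hypothetical placeholder classes instead of the real ragas transforms. In this sketch, Transform only carries a name and Parallel merely groups its arguments. other_extractors stands in for the earlier steps of the 101-500 branch. The flatten helper lists every transform that would actually run. That makes it easy to check that cosine_sim_builder is returned once the last entry becomes Parallel(cosine_sim_builder, ner_overlap_sim).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """Hypothetical placeholder for a ragas transform; it only carries a name."""
    name: str


class Parallel:
    """Hypothetical stand-in for ragas's Parallel: groups transforms run together."""

    def __init__(self, *transforms):
        self.transforms = list(transforms)


def flatten(transforms: list) -> list[str]:
    """Names of every transform that would run, expanding Parallel groups."""
    names: list[str] = []
    for t in transforms:
        if isinstance(t, Parallel):
            names.extend(flatten(t.transforms))
        else:
            names.append(t.name)
    return names


def transforms_101_500(fixed: bool) -> list:
    """Model of the list returned by the 101-500 branch, before and after the fix."""
    other_extractors = Transform("other_extractors")  # placeholder for earlier steps
    cosine_sim_builder = Transform("cosine_sim_builder")
    ner_overlap_sim = Transform("ner_overlap_sim")
    # Current code returns only ner_overlap_sim; the proposed fix groups both builders.
    last = Parallel(cosine_sim_builder, ner_overlap_sim) if fixed else ner_overlap_sim
    return [other_extractors, last]


print(flatten(transforms_101_500(fixed=False)))  # ['other_extractors', 'ner_overlap_sim']
print(flatten(transforms_101_500(fixed=True)))
# ['other_extractors', 'cosine_sim_builder', 'ner_overlap_sim']
```

A regression test could use the same flattening idea against the real transform list. It would assert that every instantiated relationship builder appears in what default_transforms returns.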
