Added Synonym insertion #160

Merged
11 commits merged on Oct 28, 2021

Conversation

vukosim
Contributor

@vukosim vukosim commented Jul 25, 2021

No description provided.

@kaustubhdhole
Collaborator

This is already implemented and is on the verge of merging: #51

@vukosim
Contributor Author

vukosim commented Jul 26, 2021

@JosephSefara please see

@JosephSefara
Contributor

@kaustubhdhole This implementation is different from #51.

@kaustubhdhole
Collaborator

Thank you for the clarification @JosephSefara. I think these changes look good to me.

@kaustubhdhole kaustubhdhole self-requested a review August 26, 2021 00:02
@kaustubhdhole
Collaborator

Please pull main into your branch once.

@kaustubhdhole
Collaborator

Okay, I just have one comment: do you think it might be better to write synonyms beside the original words so that the sentence is still well-formed?

https://github.com/dsfsi/textaugment
"""
tasks = [TaskType.TEXT_CLASSIFICATION, TaskType.TEXT_TO_TEXT_GENERATION]
languages = ["en"]
Collaborator

Also, please add keywords here.

Collaborator

I would also recommend adding the robustness evaluation for your PR so that it can be added to the leaderboard.

Contributor

  • keywords added
  • README and test.json contain the results of the robustness evaluation for:
    • Text Classification
    • Text Generation

@kaustubhdhole
Collaborator

@vukosim @JosephSefara ping!

@kaustubhdhole
Collaborator

Also, one more thing: do you think it would be better to include the synonym within a bracket?

@JosephSefara
Contributor

Okay, I just have one comment: do you think it might be better to write synonyms beside the original words so that the sentence is still well-formed?

@kaustubhdhole, I don't understand your question, but we are inserting a synonym next to its original word. Augmentation is sometimes about adding noise to the sentence, so the sentence might not be well-formed but still retains its original context. E.g., stopword removal (#268) removes stop words, so the resulting sentence might not be well-formed either.
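
For readers following the thread, the behaviour described above (a synonym placed directly after its original word) can be illustrated with a minimal sketch. This is not the code from the PR: the function name `insert_synonyms`, the `prob` and `seed` parameters, and the toy synonym table are all assumptions made for illustration, and the table stands in for whatever lexical resource the actual transformation uses.

```python
import random

# Hypothetical toy synonym table, used only for this illustration.
TOY_SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad"],
}


def insert_synonyms(sentence, synonyms=TOY_SYNONYMS, prob=1.0, seed=0):
    """Return the sentence with a synonym inserted right after each known word.

    Each word that has an entry in `synonyms` is followed by one randomly
    chosen synonym with probability `prob`. The original word is always kept.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    out = []
    for word in sentence.split():
        out.append(word)  # keep the original word
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))  # synonym goes next to it
    return " ".join(out)


print(insert_synonyms("the happy dog"))  # the happy glad dog
```

The output keeps every original token, which is why the sentence retains its context even when it no longer reads as well-formed.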

@JosephSefara
Contributor

Also, one more thing: do you think it would be better to include the synonym within a bracket?

@kaustubhdhole
That can be done, but it is not a good idea, since most people clean their text before augmentation.

  • Text cleaning includes removal of special characters including brackets.

Why remove special characters?

  • They do not carry any meaning and they create noise depending on the task being done.
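
The argument against brackets can be made concrete with a small sketch. It assumes a common cleaning step that replaces every character other than letters, digits, and whitespace with a space; the helper `clean_text` and its regex are hypothetical, not taken from the PR or from any particular library.

```python
import re


def clean_text(text):
    """Hypothetical cleaning step: drop special characters, collapse spaces."""
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())


bracketed = "the happy (glad) dog"
plain = "the happy glad dog"

# After cleaning, the bracketed and unbracketed variants are identical.
print(clean_text(bracketed) == clean_text(plain))  # True
```

Under that assumption, bracket markers added during augmentation would not survive a typical preprocessing pipeline, so they carry no information downstream.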

@kaustubhdhole kaustubhdhole merged commit 6fecd62 into GEM-benchmark:main Oct 28, 2021