add gender randomizer #229
Author name: Tabitha Sugumar
Author email: __
Author Affiliation: __
Thanks for your changes, @tk-sugumar. Please add your email and affiliation.
Added!

## Examples of this transformation

Because this is a randomized transformation, in both the selection of gender and selection of name, test examples are impossible -- the output for a single sentence is expected to be different in each successive run. Instead I've provided some example sentences and outputs for reference.
I believe you can add a default seed as an argument to the __init__ of your GenderRandomizer transformation so that it generates consistent results, which you can then include as test cases in your test.json.
Quite a few of the PRs use this approach for test cases.
See for example:
https://github.com/GEM-benchmark/NL-Augmenter/pull/164/files
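A minimal sketch of the seeding idea suggested above, assuming a simplified stand-in class rather than the actual GenderRandomizer code from this PR. The class name `SeededNamePicker`, its `names` list, and its `pick` method are hypothetical; only the idea of a default `seed` argument in `__init__` comes from the comment.

```python
import random


class SeededNamePicker:
    """Hypothetical stand-in for a randomized transformation."""

    def __init__(self, seed=0):
        # A private RNG seeded once in __init__ makes runs reproducible
        # without touching the global random state.
        self.rng = random.Random(seed)
        self.names = ["Alice", "Bob", "Carol", "David"]  # illustrative only

    def pick(self):
        return self.rng.choice(self.names)


# Two instances built with the same seed produce the same sequence,
# so their outputs can be recorded as fixed test cases.
a = SeededNamePicker(seed=0)
b = SeededNamePicker(seed=0)
run_a = [a.pick() for _ in range(5)]
run_b = [b.pick() for _ in range(5)]
```

With this pattern, expected outputs recorded once under the default seed stay valid across runs, while a user who wants different output can pass another seed.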
Thanks Timothy! When I tried this, the same name was predicted for each sentence, so for use as intended I think the user would have to modify the code after downloading. Should I still go ahead and do this?
Hi Timothy, I added the seed in the initializer. The same name does get predicted each time, though; I hope that's OK! Test cases are also added in test.json.
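One possible cause of the repeated name, offered as an assumption since the PR code is not shown here, is resetting the seed on every call rather than once. The sketch below contrasts the two; `NAMES`, `pick_reseeding`, and `PickOnce` are hypothetical names used only for illustration.

```python
import random

NAMES = ["Alice", "Bob", "Carol", "David", "Erin", "Frank"]  # illustrative only


def pick_reseeding(seed=0):
    # Resetting the seed on every call restarts the random sequence,
    # so every call returns the same name.
    random.seed(seed)
    return random.choice(NAMES)


class PickOnce:
    def __init__(self, seed=0):
        # Seeding once lets the sequence advance between calls, so the
        # picks can differ per sentence yet stay reproducible per run.
        self.rng = random.Random(seed)

    def pick(self):
        return self.rng.choice(NAMES)


reseeded = [pick_reseeding() for _ in range(10)]

picker = PickOnce()
advancing = [picker.pick() for _ in range(10)]

# A fresh instance with the same seed replays the same sequence.
replay_picker = PickOnce()
replayed = [replay_picker.pick() for _ in range(10)]
```

If the seed is set once per instance, the test cases in test.json remain stable as long as the sentences are processed in the same order.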
Author Affiliation: Elsevier

## What type of a transformation is this?
This transformation changes names in English texts, randomizing selection so there's an even chance of male and female names. It modifies pronouns to match the selected name.
Please add an acknowledgement that names are not deterministic identifiers of someone's pronouns/gender :)
Added!
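To make the described behavior concrete, here is a toy sketch of swapping a name and matching pronouns with an even chance of either name list. It is not the PR's implementation, which relies on spaCy and coreferee; the name lists, the `randomize` function, and the regex-based pronoun handling are all assumptions for illustration. As noted in the review, names are not deterministic identifiers of pronouns or gender, so the lists below are only a convention of the sketch.

```python
import random
import re

# Illustrative name lists; the PR's actual name data is not shown here.
NAMES = {"female": ["Maria", "Aisha"], "male": ["John", "Wei"]}

# Source pronoun -> grammatical role. "her" is omitted because it is
# ambiguous (object or possessive) without part-of-speech information,
# and "his" is treated as a possessive determiner only.
ROLE = {"he": "subj", "she": "subj", "him": "obj", "his": "poss"}
FORMS = {
    "female": {"subj": "she", "obj": "her", "poss": "her"},
    "male": {"subj": "he", "obj": "him", "poss": "his"},
}


def randomize(text, old_name, rng):
    """Replace old_name and the listed pronouns; assumes one person per text."""
    gender = rng.choice(["female", "male"])  # even chance of either list
    new_name = rng.choice(NAMES[gender])

    def swap(match):
        word = match.group(0)
        new = FORMS[gender][ROLE[word.lower()]]
        # Preserve sentence-initial capitalization.
        return new.capitalize() if word[0].isupper() else new

    text = re.sub(r"\b(he|she|him|his)\b", swap, text, flags=re.IGNORECASE)
    text = re.sub(rf"\b{re.escape(old_name)}\b", new_name, text)
    return text, gender


out, gender = randomize("Tom said he lost his keys.", "Tom", random.Random(0))
```

A real implementation needs coreference resolution to decide which pronouns refer to which name, which is why this sketch assumes a single person per text.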
Randomizes names in text for a 50/50 gender breakdown. Handles pronouns.
"""
nlp = spacy.load("en_core_web_sm", disable=["lemmatizer"])
nlp.add_pipe("coreferee")
You might want to use spacy like this.
Modified as given in example
class GenderRandomizer(SentenceOperation):
    tasks = [TaskType.TEXT_TO_TEXT_GENERATION]
    languages = ["en"]
Please add some keywords here.
Added
…itialization, added tests to text.json
## What tasks does it intend to benefit?
This is intended to avoid gender bias in natural language processing models. Run this transformation on text data prior to using it to train a model.

## Previous Work
Importantly, please add a Data and Code Provenance section to your transformation. Also, it seems you've added about 109 files, which are hard to evaluate. I would suggest moving these into a separate pip project and then adding it to the requirements.txt.
Thanks! I've expanded on the data and code provenance, and put the description in a Data and Code Provenance section in the Readme.
On the 109 files: most of them come from the coreferee directory. This already exists as a library installable by pip, but when I was working on this it was only installable in Python 3.8, and the current version requires Python 3.9. Since these transformations are required to be compatible with Python 3.7, I downloaded it here to make it installable in Python 3.7.
Hi @tk-sugumar, it won't be a good idea to merge all of these into the repository. It would be better to make a pip library out of it in a separate repository and call only the relevant parts here. @AbinayaM02, thoughts?
Agreed. As @kaustubhdhole mentioned, you should install the library (specify it in the requirements.txt) and use it for your transformation, @tk-sugumar. You can check whether the library works fine with Python 3.7.