fix: load detoxify model from state dict and upgrade transformers version #180

Merged

oyangz merged 3 commits into aws:main from oyangz:update_detoxify

Feb 7, 2024

Contributor

oyangz commented Jan 30, 2024

Issue #, if available:
https://tiny.amazon.com/f6f228ty/issuamazissuRAI7

We need to update transformers to >= v4.36.0 because of security vulnerabilities, but the latest detoxify release (v0.5.1) requires transformers v4.22.1. Additionally, detoxify's current model-loading method fails with transformers > v4.30.0; see issue.

Description of changes:

To resolve the dependency conflict and model loading issue, this PR:

Loads the unbiased detoxify model directly from the state dict file, removing the dependency on the detoxify package. The loading method is based on detoxify's load_checkpoint, with modifications to address the issue above.
Upgrades the transformers version to ^4.36.0 to resolve the security vulnerabilities. (This causes the bertscore metric to output slightly different values: bertscore output differs between transformers < v4.24.0 and >= v4.24.0, similar to what is described in this issue.)
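
To illustrate the dependency conflict, here is a minimal sketch of how a caret constraint such as ^4.36.0 is conventionally read (>= 4.36.0 and < 5.0.0), and why the transformers v4.22.1 required by detoxify v0.5.1 cannot satisfy it. The helper names are hypothetical, and the parser only handles plain MAJOR.MINOR.PATCH versions with a nonzero major; it is not meant to reproduce any particular resolver.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'v4.36.0' or '4.36.0' into a comparable (major, minor, patch) tuple."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return (major, minor, patch)


def satisfies_caret(candidate: str, base: str) -> bool:
    """Check candidate against ^base, i.e. base <= candidate < next major version."""
    lower = parse_version(base)
    upper = (lower[0] + 1, 0, 0)  # assumes a nonzero major version
    return lower <= parse_version(candidate) < upper


# The version detoxify v0.5.1 requires, checked against the new constraint.
print(satisfies_caret("4.22.1", "4.36.0"))
# A later 4.x release, checked against the same constraint.
print(satisfies_caret("4.36.2", "4.36.0"))
```

Because no single transformers version can meet both requirements, dropping the detoxify package is what makes the upgrade possible.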

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

oyangz and others added 3 commits

January 30, 2024 00:27


          fix: load detoxify model from state dict and upgrade transformers version

58e1dd2


          Merge branch 'main' into update_detoxify

ab7aaee


          update bertscore values for integ tests

438b8d5

danielezhu reviewed


src/fmeval/eval_algorithms/helper_models/helper_model.py

                   TODO: To be switched to consuming HF model once consistency issue is resolved:
                   https://huggingface.co/unitary/unbiased-toxic-roberta. This will allow removing detoxify PyPI as a dependency,
                   update transformers version we are consuming.
                   """
-                  DETOXIFY_MODEL_TYPE = "unbiased"
+                  UNBIASED_MODEL_URL = (
+                      "https://github.com/unitaryai/detoxify/releases/download/v0.3-alpha/toxic_debiased-c7548aa0.ckpt"

Contributor

danielezhu Jan 31, 2024

Should we add this file directly to the fmeval repo so that we don't rely on the detoxify repo? This isn't a major concern (non-blocking).

Contributor

malhotra18 Feb 4, 2024

Let's discuss this with the science team first, to see whether we need to check on it with legal.

danielezhu approved these changes

danielezhu reviewed

src/fmeval/eval_algorithms/helper_models/helper_model.py

+                              state_dict=state_dict["state_dict"],
+                              local_files_only=False,
+                          )
+                          .to("cpu")

Contributor

danielezhu Jan 31, 2024

I think this is fine for now, but we should ideally identify whether any GPUs exist, and if so, place the model on the GPU instead.

Contributor

franluca Feb 6, 2024

+1
Also, if map_location is used, I don't think you need the .to('cpu') at the end.
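
The GPU suggestion in this thread could look roughly like the sketch below. It is not the code in this PR: pick_device is a hypothetical helper, and the ImportError guard exists only so the snippet runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Guard only so this sketch runs without PyTorch installed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# With PyTorch installed, the checkpoint could then be loaded directly onto
# that device, e.g. torch.load(checkpoint_path, map_location=device), where
# checkpoint_path is a hypothetical local path to the downloaded .ckpt file.
print(device)
```

Passing the chosen device as map_location places the loaded tensors there at load time, which is the reason a trailing .to("cpu") may be redundant.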

malhotra18 approved these changes


franluca reviewed


Contributor

franluca left a comment

Please see comments on batch mode.

src/fmeval/eval_algorithms/helper_models/helper_model.py

                       :param text_input: list of text inputs for the model
                       :returns: dict with keys as score name and value being list of scores for text inputs
                       """
-                      return self._model(text_input)
+                      inputs = self._tokenizer(text_input, return_tensors="pt", truncation=True, padding=True).to(self._model.device)
+                      scores = torch.sigmoid(self._model(**inputs)[0]).cpu().detach().numpy()

Contributor

franluca Feb 6, 2024

It's unclear whether this is supposed to work in a batch call. Why do you select self._model(**inputs)[0]? Are we assuming text_input is a list with only one string?

Contributor Author

oyangz Feb 6, 2024

self._model(**inputs) returns an object of class SequenceClassifierOutput, and index [0] selects the tensor containing the model output values. That tensor can be two-dimensional, so batching is still supported here; our helper model unit test with multiple string inputs also passes.

This follows the detoxify repo's predict method.

Contributor

franluca Feb 7, 2024

Ok thanks! Please add some inline comments for future reference :)
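
As a rough illustration of the batching behaviour discussed in this thread, the sketch below uses plain Python lists in place of the output tensor. It assumes the output has shape (batch_size, num_labels); the three score names are a shortened, illustrative stand-in for what DetoxifyHelperModel.get_score_names() returns, and scores_by_name is a hypothetical helper rather than the method in this PR.

```python
import math
from typing import Dict, List

# Illustrative stand-in for DetoxifyHelperModel.get_score_names().
SCORE_NAMES = ["toxicity", "severe_toxicity", "obscene"]


def sigmoid(x: float) -> float:
    """Map a raw model output (logit) to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def scores_by_name(logits: List[List[float]]) -> Dict[str, List[float]]:
    """Turn a (batch_size, num_labels) matrix into one list of scores per name."""
    return {
        name: [sigmoid(row[i]) for row in logits]  # column i, one entry per input
        for i, name in enumerate(SCORE_NAMES)
    }


# Two inputs in the batch, three labels each.
batch_logits = [[0.0, -2.0, 2.0], [1.0, 0.0, -1.0]]
result = scores_by_name(batch_logits)
print(result["toxicity"])
```

Each score name maps to a list with one entry per input string, so selecting the output tensor with [0] does not limit the call to a single input.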

src/fmeval/eval_algorithms/helper_models/helper_model.py

+                      for i, cla in enumerate(DetoxifyHelperModel.get_score_names()):
+                          results[cla] = (
+                              scores[0][i]
+                              if isinstance(text_input, str)

Contributor

franluca Feb 6, 2024

Indeed, according to the signature, text_input should be a List. I don't think this method works if text_input is a list with more than one string (because of line 144).

Contributor Author

oyangz Feb 6, 2024

See comment above.

test/integration/test_summarization_accuracy.py

@@ -42,7 +42,7 @@ def test_evaluate_sample(self, integration_tests_dir):
                               elif eval_score.name == ROUGE_SCORE:
                                   assert eval_score.value == approx(0.250, abs=ABS_TOL)
                               elif eval_score.name == BERT_SCORE:
-                                  assert eval_score.value == approx(0.734, abs=ABS_TOL)
+                                  assert eval_score.value == approx(0.748, abs=ABS_TOL)

Contributor

franluca Feb 6, 2024

why?

Contributor Author

oyangz Feb 6, 2024

Upgrading the transformers version to ^4.36.0 caused the bertscore values to change slightly.

I did some testing and found that bertscore output differs between transformers < v4.24.0 and >= v4.24.0 (previously we were using v4.22.1). This is similar to what was observed in this GitHub issue.

I wasn't able to find the root cause of the change, but this seems to happen occasionally after pytorch/transformers upgrades; see previous issue.
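
To show why the expected value in the integration test had to change, here is a small sketch of an absolute-tolerance comparison. The ABS_TOL value is hypothetical (the real constant is defined in the test suite), the observed score is a stand-in, and math.isclose is used here in place of the approx call from the test.

```python
import math

ABS_TOL = 1e-3  # hypothetical value, for illustration only


def within_abs_tol(actual: float, expected: float, abs_tol: float = ABS_TOL) -> bool:
    """Absolute-tolerance check: |actual - expected| <= abs_tol."""
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)


old_expected = 0.734  # expected bertscore before the upgrade (from the diff)
new_expected = 0.748  # expected bertscore after the upgrade (from the diff)
observed = 0.748      # stand-in for the score produced after the upgrade

print(within_abs_tol(observed, old_expected))
print(within_abs_tol(observed, new_expected))
```

The two expected values differ by 0.014, so under a tolerance of this size the old assertion cannot pass once the score shifts.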

franluca approved these changes


oyangz merged commit a523f3d into aws:main

3 checks passed

danielezhu mentioned this pull request

fix: Fix example notebook unit tests #188

Merged

keerthanvasist mentioned this pull request

fmeval is incompatible to properly install on (new) SageMaker Studio Python kernel #164

Closed
