[Re] When Does Label Smoothing Help? #75
Comments
Thanks for your submission; we'll assign an editor soon. |
@koustuvsinha @gdetor Can either of you edit this submission in machine learning? |
@rougier I can handle this. |
@gdetor Thanks! |
@gdetor thank you for agreeing to handle this submission! Is there anything we can do to move this submission forward? |
Any update? |
Dear reviewers @ReScience/reviewers Could anybody review this submission? |
I'd be interested in reviewing this submission, but I should mention that I doubt I can rerun all the experiments due to computational constraints. |
Okay, I will review this work by the end of July. |
I apologize, but I have not been able to review this submission yet; I should be able to write the review within the next few weeks. |
Thanks. Any progress? |
@mo-arvan gentle reminder |
In this paper, Wagner et al. provide a reproduction report of Müller et al.'s work on label smoothing. They begin with a concise introduction to the original study and the motivations behind it. The authors then present essential details regarding the models and datasets used, noting specific variations driven by limited computational resources. The authors have done an excellent job of providing documentation and instructions for using their released code. Their repository includes multiple Jupyter notebooks detailing the conducted experiments, along with specified dependency requirements to facilitate the setup process. To further simplify future installations, I created a Docker container as part of the review process; the files and instructions are available in my forked repository.

In their initial results, the authors examine the effect of label smoothing on model accuracy. While Müller et al. claimed that label smoothing positively impacts the test accuracy of trained models, Wagner et al. suggest that it enhances accuracy by reducing overfitting, a claim not made by the original authors. However, their results indicate mixed effects: out of eight experiments, three showed higher accuracy without label smoothing. Upon reviewing their code (https://github.com/sdwagner/re-labelsmoothing/blob/fb6c3634d2049ef7f175e7a992f109c43680fae3/datasets/datasets.py), it appears that they do not load the test set, raising the possibility that the reported results are based on the validation set. Unlike the original study, this reproduction does not include confidence intervals, and the small differences in accuracy could be attributed to randomness in the training process. Adding uncertainty analysis would significantly strengthen this work.

In the next section, the authors reproduce a visualization experiment from the original study that demonstrates the effect of label smoothing on the activations of the penultimate layer and the network output. Figure 2 in their work aligns with the findings of the original study, although there is a minor discrepancy in the order of the columns in the visualization.

The authors then investigate the impact of label smoothing on Expected Calibration Error (ECE). With the exception of the results on the MNIST dataset using a fully connected network, their findings generally align with those of the original study. The reported results for training a transformer model for translation are mixed, with not all findings matching the original study. As with the accuracy results, the authors report findings based on the validation set, which may account for some discrepancies.

Finally, the results of the distillation experiments on fully connected networks for MNIST are consistent with the original study, though there is a slight increase in error. Ultimately, the authors confirm the observation made by Müller et al. regarding accuracy degradation in students when the teacher is trained with label smoothing. Figures 7 and 8 lack the confidence intervals present in the original study, which would have been beneficial for comparison.

Minor editing suggestions: |
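For readers unfamiliar with the two quantities at the center of this review, here is a minimal Python sketch of label-smoothed targets and Expected Calibration Error. It is illustrative only and not the authors' code; the function names, the `alpha=0.1` default, and the 10-bin ECE setup are assumptions made for this example.

```python
import numpy as np

def smooth_labels(labels, num_classes, alpha=0.1):
    """Label smoothing as in Müller et al.: mix the one-hot target with a
    uniform distribution. The true class ends up at (1 - alpha) + alpha/K,
    every other class at alpha/K."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - alpha) * one_hot + alpha / num_classes

def expected_calibration_error(confidences, predictions, targets, n_bins=10):
    """ECE: partition samples into confidence bins and take the
    sample-weighted average of |accuracy(bin) - mean confidence(bin)|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = np.mean(predictions[in_bin] == targets[in_bin])
            conf = np.mean(confidences[in_bin])
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# Toy usage: two samples, three classes.
targets = smooth_labels(np.array([0, 2]), num_classes=3)
print(targets)  # true class ~0.933, other classes ~0.033 each
```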
@mo-arvan Thanks for creating a Dockerfile! Feel free to open a PR to integrate it into our repository 😊 |
Glad you find it useful. Sure, I'll submit a pull request, and I'd be happy to engage in a discussion as well. One last minor comment: your use of vector graphics in your figures is a step up from the original publication; I'd suggest changing the color palette and the patterns to further improve the presentation of the figures, e.g., Figure 3 (b). |
@mo-arvan Thanks again for your detailed comments! In the following, we address each of the points you raised:
|
@mo-arvan Please let me know if you agree with the responses so I can publish the paper. Thank you. |
Yes, the response addresses my main concerns. I was wrong about the validation/test splits. |
@gdetor Can you kindly let us know what the next steps are? Is there anything required from our side? Thank you! |
Original article: Rafael Müller, Simon Kornblith, and Geoffrey E. Hinton. "When does label smoothing help?" Advances in Neural Information Processing Systems 32 (2019). (https://arxiv.org/pdf/1906.02629.pdf)
PDF URL: https://github.com/sdwagner/re-labelsmoothing/blob/main/report/article.pdf
Metadata URL: https://github.com/sdwagner/re-labelsmoothing/blob/main/report/metadata.yaml
Code URL: https://github.com/sdwagner/re-labelsmoothing
Scientific domain: Machine Learning
Programming language: Python
Suggested editor: Georgios Detorakis or Koustuv Sinha