[Re] A Simple Framework for Contrastive Learning of Visual Representations #76
Comments
Thanks for your submission and sorry for the delay. We'll assign an editor soon.
@gdetor @benoit-girard @koustuvsinha Can any of you edit this submission?
I can do it!
Good news: @charlypg has accepted to review this paper and its companion!
Hello everybody.
@schmidDan would you like to review this paper? And possibly (or alternatively) its companion paper #77?
@charlypg do you have an idea of when you might be able to deliver your review?
@pena-rodrigo would you like to review this paper? And possibly (or alternatively) its companion paper #77?
@bagustris would you like to review this paper? And possibly (or alternatively) its companion paper #77?
@birdortyedi would you like to review this paper? And possibly (or alternatively) its companion paper #77?
Hi @benoit-girard. Unfortunately, I am currently not available, and I am afraid I also would not have quite the compute needed to run the code of this paper ;-)
Hello everybody. I am really sorry for the delay.
Good:
Problems:
Dear Reviewer (@charlypg), Thank you very much for your insightful feedback. I will do my best to provide the minimal configuration required to run the code on a single (non-Jean Zay) GPU machine as soon as possible. However, I would like to highlight a challenge: I currently do not have access to a machine with these specifications. My resources are limited to Jean Zay and a CPU-only laptop, which may complicate developing and testing this configuration (hopefully, this will not be the case for long). Regarding the "Error tracker: world_size missing argument for tracker" issue, that was my mistake, and it is now fixed. The error was a typo on my part, introduced by recent code updates related to the warning mentioned right after in your review. As for the warning "A reduction issue may have occurred (abs(50016.0 - 1563.0*1) >= 1)", it stems from an unresolved issue within PyTorch's distributed operations that can make an all-reduce return erroneous results (for further details, see: https://discuss.pytorch.org/t/distributed-all-reduce-returns-strange-results/89248). Unfortunately, if this warning is triggered, the results of the current epoch (often the final one) are unreliable; the recommended approach is to restart the experiment from the previous checkpoint. Regarding the top-5 accuracy metric, it should be computed automatically and made available through TensorBoard. Could you please clarify whether you encountered any difficulties in accessing these results? Best regards,
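The consistency check behind that warning can be sketched in a few lines. The helper name and tolerance below are illustrative assumptions, not the repository's actual code:

```python
def reduction_is_consistent(reduced_count, local_count, world_size, tol=1.0):
    """Sanity check of a summed all-reduce: if each of the world_size
    ranks contributes the same local sample count, the reduced total must
    equal local_count * world_size. A mismatch beyond the tolerance (as in
    "abs(50016.0 - 1563.0*1) >= 1") means the epoch's aggregated metrics
    are unreliable and the run should restart from the last checkpoint.
    """
    return abs(reduced_count - local_count * world_size) < tol
```
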
Dear @ADevillers, Thank you for your response. Thank you in advance,
Dear @charlypg, To clarify this part of the checkpointing strategy: saves alternate between an "odd" and an "even" checkpoint at the end of each respective epoch. This trick ensures that if a run fails during an odd-numbered epoch, the state from the preceding epoch is still available in the "even" checkpoint, and vice versa. Please feel free to reach out if you have any further questions. Best regards,
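As a rough sketch of the alternation described above (the file-naming scheme here is hypothetical, for illustration only):

```python
def checkpoint_name(job_id, epoch):
    # Hypothetical naming scheme illustrating the odd/even alternation:
    # even epochs overwrite the "even" file, odd epochs the "odd" file,
    # so a crash while writing one checkpoint always leaves the other
    # (holding the preceding epoch's state) intact.
    parity = "even" if epoch % 2 == 0 else "odd"
    return f"expe_{job_id}_{parity}.pt"
```
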
@charlypg: thanks a lot for the review.
Thanks a lot for your answer.
@ReScience/reviewers I am looking for a reviewer with expertise in machine learning to review this submission and possibly (or alternatively) its companion paper #77.
Dear @ADevillers, Thank you for your answer. I have a question about the training. Once the job corresponding to "run_simclr_imagenet.slurm" has successfully ended, I only obtain one checkpoint of the form "expe_[job_id]_[epoch_number].pt". Best regards,
Dear @charlypg, Yes, the script itself remains unchanged; the only variation is the checkpoint used. No checkpoint is provided for the first execution; for each subsequent job, I use the last checkpoint from the preceding one. This checkpoint contains all the pertinent state, including the current epoch, scheduler, optimizer, and model, allowing training to resume from where it was interrupted. Note that you should not modify the other hyperparameters while doing so, as this may lead to unexpected behavior. Best regards,
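The resume logic described above can be sketched as follows. This is a simplified, framework-agnostic illustration (the function name and checkpoint keys are assumptions), not the repository's actual code:

```python
def resume_point(checkpoint=None):
    # With no checkpoint, training starts from scratch at epoch 0;
    # otherwise it picks up right after the checkpointed epoch, restoring
    # the model, optimizer, and scheduler states bundled in the file.
    if checkpoint is None:
        return 0, None
    start_epoch = checkpoint["epoch"] + 1
    state = {k: checkpoint[k] for k in ("model", "optimizer", "scheduler")}
    return start_epoch, state
```
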
Dear @ADevillers, I am sorry for my late response. I could reproduce the top-1 results on Jean Zay, so the reproduction seems convincing to me. However, I cannot find the top-5 results. I saw there is a "runs" folder, but most of my evaluation results do not seem to have been stored in it. Best regards,
Dear @charlypg, Your runs should normally be stored in the "runs" folder in a format readable by TensorBoard, and they contain all the curves (including Top-5 accuracy). Note that, when starting from a checkpoint, the data are appended to the file corresponding to the run of that checkpoint. Therefore, a run on ImageNet, even if it requires 6 to 7 restarts from checkpoints, will only produce one file (containing everything). To find out where the issue could be, can you please answer the following questions:
Best,
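For reference, the Top-5 accuracy curve logged to TensorBoard corresponds to the standard top-k metric. A plain-Python sketch (not the repository's implementation, which operates on tensors):

```python
def topk_accuracy(logits, targets, k=5):
    # A prediction counts as correct if the true class index is among
    # the k highest-scoring classes for that sample.
    correct = 0
    for scores, target in zip(logits, targets):
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        correct += target in topk
    return correct / len(targets)
```
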
@benoit-girard Gentle reminder
@benoit-girard Any update on the second review?
Original article: T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. "A simple framework for contrastive learning of visual representations." In: International Conference on Machine Learning. PMLR, 2020, pp. 1597–1607.
PDF URL: https://github.com/ADevillers/SimCLR/blob/main/report.pdf
Metadata URL: https://github.com/ADevillers/SimCLR/blob/main/report.metadata.tex
Code URL: https://github.com/ADevillers/SimCLR/tree/main
Scientific domain: Representation Learning
Programming language: Python
Suggested editor: @rougier