Fix a bug in optimizer's min_lr/max_lr when args.override_opt_param_scheduler==True #1284
The learning rate scheduler reads `max_lr` and `min_lr` from the optimizer instead of its own attributes to compute the learning rate at a given step. When `--override-opt_param_scheduler` is used, the user expects to override the learning rate schedule stored in the checkpoint. However, the optimizer still loads `max_lr`, `min_lr`, and their decoupled versions from the checkpoint. As a result, the learning rate during training is computed from the old `max_lr` and the new `init_lr`, a mixture of the old and new settings, which can be confusing for the user to debug.
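To illustrate how the mixed setting arises, here is a minimal sketch of a linear warmup computation (simplified; the actual scheduler logic in Megatron-LM differs in detail). The key point is that `max_lr` is taken from the optimizer's param group, which is restored from the checkpoint, while `init_lr` comes from the new command-line arguments:

```python
# Simplified sketch: max_lr is read from the optimizer's param group,
# while init_lr comes from the (new) training arguments.
def warmup_lr(param_group: dict, init_lr: float, num_steps: int, warmup_steps: int) -> float:
    max_lr = param_group["max_lr"]  # restored from the checkpoint
    return init_lr + (max_lr - init_lr) * num_steps / warmup_steps

# Old run: max_lr=1e-4 was saved in the checkpoint's optimizer state.
param_group = {"max_lr": 1e-4, "min_lr": 1e-6}

# New run with --override-opt_param_scheduler and new LR settings:
# the new init_lr is honored, but the old max_lr still shapes the warmup.
print(warmup_lr(param_group, init_lr=0.0, num_steps=50, warmup_steps=100))  # 5e-05, driven by the old max_lr
```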
Here I propose a fix: during `load_checkpoint`, when `--override-opt_param_scheduler` is set to `True`, the function `megatron.core.optimizer._update_min_and_max_lr_in_param_groups` is invoked to update the optimizer loaded from the checkpoint with the new learning rate boundaries. Training then proceeds with the updated learning rate settings in scenarios such as continued pre-training (CPT).
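A rough sketch of the proposed change is shown below. The wrapper function, the keyword arguments, and the exact call site inside `load_checkpoint` are assumptions based on the description above, not the actual patch; the real signature of `_update_min_and_max_lr_in_param_groups` should be checked against `megatron/core/optimizer`.

```python
# Hypothetical sketch of the fix applied after the optimizer state is
# restored from the checkpoint (not the exact patch in this PR).
from megatron.core.optimizer import _update_min_and_max_lr_in_param_groups  # path as named in the description

def maybe_override_lr_bounds(optimizer, args):
    """Overwrite the min/max learning rate boundaries with the values from
    the new args when --override-opt_param_scheduler is set."""
    if not args.override_opt_param_scheduler:
        return
    # Assumed signature: update each param group's max_lr/min_lr in place
    # so the scheduler later reads the new boundaries instead of the old ones.
    _update_min_and_max_lr_in_param_groups(
        optimizer.param_groups,
        lr=args.lr,
        min_lr=args.min_lr,
        decoupled_lr=args.decoupled_lr,
        decoupled_min_lr=args.decoupled_min_lr,
    )
```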