
Conversation

@Adamusen
Contributor

Set the initial optimizer.max_lr value to zero for each parameter group. Removed the "baked in" 0.8 optimizer momentum values; momentum is now initialized from the value provided in the train config file instead.

Fixes #122

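As a rough illustration of the change described above (the helper and class names here are hypothetical, not the actual YOLO codebase API):

```python
# Hypothetical sketch of the described change, not the actual YOLO code:
# the per-group max_lr starts at zero and momentum comes from the train
# config instead of a hard-coded 0.8.
class FakeOptimizer:
    """Minimal stand-in for torch.optim.Optimizer."""
    def __init__(self, param_groups):
        self.param_groups = param_groups

def init_schedule_state(optimizer, cfg_momentum):
    # max_lr is zero-initialized so the first warm-up epoch ramps up from 0
    optimizer.max_lr = [0.0 for _ in optimizer.param_groups]
    for group in optimizer.param_groups:
        group["momentum"] = cfg_momentum  # value from the train config
    return optimizer

opt = init_schedule_state(
    FakeOptimizer([{"lr": 0.01}, {"lr": 0.001}]), cfg_momentum=0.937
)
print(opt.max_lr)                       # [0.0, 0.0]
print(opt.param_groups[0]["momentum"])  # 0.937
```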
@henrytsui000
Member

Hold on, I'm also working on a coarse momentum schedule, but haven't pushed it yet; you may modify it from this code:

from typing import Union

def lerp(start: float, end: float, step: Union[int, float], total: int = 1):
    return start + (end - start) * step / total

def create_optimizer(model: YOLO, optim_cfg: OptimizerConfig) -> Optimizer:
    ...

    def next_epoch(self, batch_num, epoch_idx):
        self.min_lr = self.max_lr
        self.max_lr = [param["lr"] for param in self.param_groups]
        # TODO: load momentum from config instead of a fixed number
        #       0.937: start momentum
        #       0.8  : normal momentum
        #       3    : number of warm-up epochs
        # clamp with min() so momentum holds at 0.8 once warm-up is over
        self.min_mom = lerp(0.937, 0.8, min(epoch_idx, 3), 3)
        self.max_mom = lerp(0.937, 0.8, min(epoch_idx + 1, 3), 3)
        self.batch_num = batch_num
        self.batch_idx = 0

    def next_batch(self):
        self.batch_idx += 1
        lr_dict = dict()
        for lr_idx, param_group in enumerate(self.param_groups):
            min_lr, max_lr = self.min_lr[lr_idx], self.max_lr[lr_idx]
            param_group["lr"] = lerp(min_lr, max_lr, self.batch_idx, self.batch_num)
            param_group["momentum"] = lerp(self.min_mom, self.max_mom, self.batch_idx, self.batch_num)
            lr_dict[f"LR/{lr_idx}"] = param_group["lr"]
        return lr_dict
    ...
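For reference, the warm-up behavior of the lerp above can be checked in isolation. The epoch index has to be clamped with min() so momentum ramps from 0.937 down to 0.8 over the three warm-up epochs and then stays there (a standalone sketch, not the project code; `warmup_momentum` is a hypothetical helper):

```python
def lerp(start: float, end: float, step, total: int = 1):
    return start + (end - start) * step / total

def warmup_momentum(epoch_idx, start=0.937, end=0.8, warmup_epochs=3):
    # Clamp with min() so momentum holds at `end` once warm-up is over;
    # with max() the schedule would skip straight past the warm-up and
    # then extrapolate beyond `end` on later epochs.
    return lerp(start, end, min(epoch_idx, warmup_epochs), warmup_epochs)

print(warmup_momentum(0))   # 0.937 (start of training)
print(warmup_momentum(10))  # stays at roughly 0.8 after warm-up
```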

@Adamusen
Contributor Author

Alright :)

One additional note regarding the learning rate scheduling: with the current implementation, the Lightning module is unable to restore the learning rate when one tries to continue an interrupted training by providing the checkpoint path to trainer.fit(model, ckpt_path=ckpt_path) in lazy.py :) (otherwise everything else is loaded properly)
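One way to make such custom schedule state survive a checkpoint resume would be to serialize it alongside the optimizer state. A minimal sketch, reusing the attribute names from the snippet above; none of this is the actual YOLO or Lightning implementation:

```python
# Hypothetical sketch: keep the custom schedule fields in a dict so they
# can be stored in a checkpoint and restored when resuming training.
class ScheduleState:
    def __init__(self):
        self.min_lr = []
        self.max_lr = []
        self.batch_idx = 0

    def state_dict(self):
        return {"min_lr": self.min_lr, "max_lr": self.max_lr,
                "batch_idx": self.batch_idx}

    def load_state_dict(self, state):
        self.min_lr = list(state["min_lr"])
        self.max_lr = list(state["max_lr"])
        self.batch_idx = state["batch_idx"]

# Round-trip: state saved from one run is restored into a fresh object.
src = ScheduleState()
src.max_lr = [0.01, 0.001]
src.batch_idx = 42
dst = ScheduleState()
dst.load_state_dict(src.state_dict())
print(dst.max_lr, dst.batch_idx)  # [0.01, 0.001] 42
```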

@Adamusen
Contributor Author

Closing this pull request, as you are working on this part of the code yourself anyway :)

@Adamusen Adamusen closed this Nov 12, 2024

Development

Successfully merging this pull request may close these issues.

Issues within the learning rate schedule and optimizer initialization

2 participants