Refactor checkpoint loading for fine-tuning for new NeMo2 workflow #132

jstjohn · 2024-09-04T17:27:30Z

This PR: NVIDIA/NeMo-Run#28 will change how checkpoint IO is handled. Roughly, after the change we can no longer rely on self.__io__ to get at the metadata associated with a particular checkpoint. Instead, there will be an io.ckpt_context global variable that stores the captured hyperparameters for a particular config object, and that is what we will need to modify. The following rough pseudo-code shows how this might look:

# Pseudo-code: io, TransformerConfig, init_model and load_model_weights are placeholders.
@dataclass
class ModelConfig(TransformerConfig):
  other_ckpt: str = "path/to/checkpoint"

  def configure_model(self, tokenizer):
    other_settings = io.load(self.other_ckpt).model.config
    my_cfg = io.ckpt_context.model.config  # pointer to global config that will be saved
    for setting in dir(other_settings):
      if setting.startswith("_"):
        continue  # skip private and dunder attributes, which dir() also returns
      value = getattr(other_settings, setting)
      # 1. update self with settings from other
      setattr(self, setting, value)
      # 2. update what will be saved (post init) with settings from other
      setattr(my_cfg, setting, value)
    # build the model once, after all settings have been copied
    model = init_model(self)
    load_model_weights(model, self.other_ckpt)
    return model
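
The double update in the pseudo-code above (once on the live config, once on the copy that will be saved) can be tried out without NeMo. The sketch below is only an illustration under stated assumptions: BaseConfig, FineTuneConfig and copy_settings are hypothetical names, plain dataclasses stand in for the real config classes, and saved_cfg stands in for whatever io.ckpt_context.model.config ends up pointing at. It iterates over dataclasses.fields instead of dir(), which avoids picking up methods and dunder attributes.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseConfig:
    # Hypothetical stand-in for the config stored in the other checkpoint.
    hidden_size: int = 128
    num_layers: int = 2


@dataclass
class FineTuneConfig(BaseConfig):
    # Hypothetical fine-tuning config; other_ckpt mirrors the issue's field.
    other_ckpt: str = "path/to/checkpoint"


def copy_settings(source, targets, skip=()):
    """Copy each dataclass field of `source` onto every object in `targets`.

    Returns the names of the fields that were copied.
    """
    copied = []
    for f in fields(source):
        if f.name in skip:
            continue
        value = getattr(source, f.name)
        for target in targets:
            setattr(target, f.name, value)
        copied.append(f.name)
    return copied


pretrained = BaseConfig(hidden_size=512, num_layers=12)
live_cfg = FineTuneConfig()   # plays the role of `self`
saved_cfg = FineTuneConfig()  # plays the role of io.ckpt_context.model.config (assumption)

copied = copy_settings(pretrained, [live_cfg, saved_cfg], skip=("other_ckpt",))
```

Passing both targets to one helper keeps the live config and the saved config from drifting apart, and the skip argument leaves fields such as other_ckpt untouched if the source config happens to define them too.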
