Set use_cache back to True for HF checkpointer #1488

Open · wants to merge 3 commits into main

Conversation

eldarkurtic (Contributor)

Most HF models have use_cache set to True by default; llm-foundry manually changes it to False (most likely due to huggingface/transformers#28056). This PR sets use_cache back to True before saving the model with the HF checkpointer.

This makes it a bit more convenient to use models trained with llm-foundry, without having to manually edit config.json and generation_config.json to set use_cache.
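For illustration, this is roughly the manual workaround the PR removes, assuming a standard transformers load (the checkpoint path is a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('path/to/exported-checkpoint')
model.config.use_cache = True             # config.json currently ships with use_cache: False
model.generation_config.use_cache = True  # generation_config.json likewise
```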

eldarkurtic requested a review from a team as a code owner on August 27, 2024 08:58
@@ -500,7 +500,12 @@ def tensor_hook(

         if dist.get_global_rank() == 0:
             log.debug('Saving Hugging Face checkpoint in global rank 0')

+            if hasattr(original_model.config, 'use_cache'):
Collaborator

This should not be set on the original model/config because you might continue training after this function is run. I'd be fine setting it for the new model/config that are getting saved out though.
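A minimal sketch of this suggestion (names such as `new_config` are assumptions, not the exact llm-foundry identifiers): copy the config so the training run keeps its setting, and re-enable the cache only on the exported copy.

```python
import copy

# Leave the live training config untouched; mutate only the copy being saved.
new_config = copy.deepcopy(original_model.config)
if hasattr(new_config, 'use_cache'):
    new_config.use_cache = True  # restore the HF default in the exported config only
```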

eldarkurtic (Contributor, Author)

great point, thanks!

eldarkurtic (Contributor, Author)

@dakinggg it seems some tests are failing now. If I'm reading this correctly, it's because MPT models default to use_cache: False (unlike other models on the HF Hub), and before saving HF checkpoints we now set use_cache: True.

What is the reason behind this choice for MPT models? I haven't been able to find another model on the HF Hub that disables the cache in its config.json.

As for fixing the tests, I assume we could add a patch that checks whether the model is MPT and, if so, skips setting use_cache: True. What do you think?
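A hedged sketch of the proposed carve-out (`new_config` is an assumed name; MPT configs report model_type == 'mpt'):

```python
# Skip the override for MPT, whose config intentionally defaults to use_cache: False.
if hasattr(new_config, 'use_cache') and getattr(new_config, 'model_type', '') != 'mpt':
    new_config.use_cache = True
```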

dakinggg (Collaborator)

@eldarkurtic It's fine for it to be True, since we now explicitly set use_cache=False for training.
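For illustration, disabling the cache at train time looks roughly like this (a hypothetical call site, not the llm-foundry source; use_cache is a standard kwarg on HF causal-LM forward passes):

```python
def training_forward(model, batch):
    # The cache is disabled per forward pass, so the value persisted in the
    # exported config no longer has to carry a train-time setting.
    return model(**batch, use_cache=False)
```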

eldarkurtic (Contributor, Author)

@dakinggg okay, got it. So is it safe to leave these tests red?

dakinggg (Collaborator)

@eldarkurtic Well, no: we can't merge with failing tests, but you can update the test so that it passes.
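A hedged sketch of the kind of test update meant here (the actual llm-foundry test differs; `checkpoint_dir` is an assumed fixture):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(checkpoint_dir, trust_remote_code=True)
assert config.use_cache is True  # the old assertion expected use_cache to be False
```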
