Set use_cache back to True for HF checkpointer #1488

Open · wants to merge 3 commits into main

Conversation

eldarkurtic (Contributor)

Most HF models have use_cache set to True by default; llm-foundry manually changes it to False (most likely due to huggingface/transformers#28056). This PR sets use_cache back to True before saving the model with the HF checkpointer.

This makes it a bit more convenient to use models trained with llm-foundry, without having to manually edit config.json and generation_config.json to set use_cache.
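For illustration, this is roughly the manual workaround the PR removes, assuming a standard transformers load (the checkpoint path is a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('path/to/exported-checkpoint')
model.config.use_cache = True             # config.json currently ships with use_cache: False
model.generation_config.use_cache = True  # generation_config.json likewise
```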

eldarkurtic requested a review from a team as a code owner on August 27, 2024 08:58
@@ -500,7 +500,12 @@ def tensor_hook(

         if dist.get_global_rank() == 0:
             log.debug('Saving Hugging Face checkpoint in global rank 0')

+            if hasattr(original_model.config, 'use_cache'):
Collaborator

This should not be set on the original model/config because you might continue training after this function is run. I'd be fine setting it for the new model/config that are getting saved out though.
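A minimal sketch of this suggestion (names such as `new_config` are assumptions, not the exact llm-foundry identifiers): copy the config so the training run keeps its setting, and re-enable the cache only on the exported copy.

```python
import copy

# Leave the live training config untouched; mutate only the copy being saved.
new_config = copy.deepcopy(original_model.config)
if hasattr(new_config, 'use_cache'):
    new_config.use_cache = True  # restore the HF default in the exported config only
```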

eldarkurtic (Contributor, Author)

great point, thanks!

eldarkurtic (Contributor, Author)

@dakinggg it seems some tests are failing now. If I'm reading this correctly, it's because MPT models default to use_cache: False (unlike other models on the HF Hub), and before saving HF checkpoints we now set use_cache: True.

What is the reason behind this choice for MPT models? I haven't been able to find another model on the HF Hub that disables the cache in its config.json.

As for fixing the tests, I assume we could add a patch that checks whether the model is MPT and, if so, skips setting use_cache: True. What do you think?
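A hedged sketch of the proposed carve-out (`new_config` is an assumed name; MPT configs report model_type == 'mpt'):

```python
# Skip the override for MPT, whose config intentionally defaults to use_cache: False.
if hasattr(new_config, 'use_cache') and getattr(new_config, 'model_type', '') != 'mpt':
    new_config.use_cache = True
```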

dakinggg (Collaborator)

@eldarkurtic It's fine for it to be True, since we now explicitly set use_cache=False for training.
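For illustration, disabling the cache at train time looks roughly like this (a hypothetical call site, not the llm-foundry source; use_cache is a standard kwarg on HF causal-LM forward passes):

```python
def training_forward(model, batch):
    # The cache is disabled per forward pass, so the value persisted in the
    # exported config no longer has to carry a train-time setting.
    return model(**batch, use_cache=False)
```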

eldarkurtic (Contributor, Author)

@dakinggg okay, got it. So is it safe to leave these tests red?

dakinggg (Collaborator)

@eldarkurtic Well, no: we can't merge with failing tests, but you can update the test so that it passes.
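A hedged sketch of the kind of test update meant here (the actual llm-foundry test differs; `checkpoint_dir` is an assumed fixture):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(checkpoint_dir, trust_remote_code=True)
assert config.use_cache is True  # the old assertion expected use_cache to be False
```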
