Freezing some transformer layers during training? #6854
Replies: 3 comments 3 replies
-
This is a good idea for an example, I think. There are a few ways to do what you want, but probably the best way would be to register your own layer function for the transformer model, which you'll then use in the config.
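For reference, here's roughly what the default config block for the transformer model looks like (a sketch; the exact architecture version and the `roberta-base` name will depend on your setup):

```ini
[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
```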
Our first step will be to register our own function and insert it into the config. Make a Python file with the following code, and then pass the path to that file with the `--code` argument to `spacy train`:

```python
# Save this as a Python file, and pass its path to the --code argument.
from typing import Any, Dict, List

from thinc.api import Model
from spacy.tokens import Doc
from spacy.util import registry
from spacy_transformers.data_classes import FullTransformerBatch
from spacy_transformers.layers import TransformerModel


@registry.architectures("our_custom_TransformerModel.v0")
def our_custom_transformer(
    name: str,
    tokenizer_config: Dict[str, Any]
) -> Model[List[Doc], FullTransformerBatch]:
    print("We have control!")
    model = TransformerModel(name, tokenizer_config)
    print("Such model")
    return model
```

And then change your training config to refer to your function:
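For example, the model block would now point at the newly registered function (sketching in the same settings as above), and you'd run training with something like `python -m spacy train config.cfg --code ./custom_transformer.py`, where `custom_transformer.py` is whatever you named the file:

```ini
[components.transformer.model]
@architectures = "our_custom_TransformerModel.v0"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
```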
At this point the code should be doing exactly the same as before, but now we have a place to interject some extra logic. Probably the best option is to set the `set_transformer` callback in the model's `attrs`, so the layer freezing happens at the point where the PyTorch transformer is attached to the model:

```python
# Save this as a Python file, and pass its path to the --code argument.
from typing import Any, Dict, List

from thinc.api import Model
from spacy.tokens import Doc
from spacy.util import registry
from spacy_transformers.data_classes import FullTransformerBatch
from spacy_transformers.layers.transformer_model import TransformerModel
from spacy_transformers.layers.transformer_model import set_pytorch_transformer


@registry.architectures("LayerFreezingTransformerModel.v0")
def layer_freezing_transformer(
    name: str,
    tokenizer_config: Dict[str, Any],
    freeze_lowest: int  # Example of a setting you might want.
) -> Model[List[Doc], FullTransformerBatch]:
    model = TransformerModel(name, tokenizer_config)
    model.attrs["freeze_lowest"] = freeze_lowest
    model.attrs["set_transformer"] = freeze_layers_and_set_transformer
    return model


def freeze_layers_and_set_transformer(model, transformer):
    # Do the layer freezing here.
    somehow_freeze_layers(transformer, model.attrs["freeze_lowest"])
    set_pytorch_transformer(model, transformer)
```

If you want more fine-grained control than this, for example access to the transformer on every batch, the best alternative would be to make your own wrapper layer instead of the generic one that `set_pytorch_transformer` installs. For reference, that function looks like this:

```python
def set_pytorch_transformer(model, transformer):
    if model.attrs["has_transformer"]:
        raise ValueError("Cannot set second transformer.")
    model.layers.append(
        TransformerWrapper(
            transformer,
            convert_inputs=_convert_transformer_inputs,
            convert_outputs=_convert_transformer_outputs,
        )
    )
    model.attrs["has_transformer"] = True
    model.set_dim("nO", transformer.config.hidden_size)
```

where `_convert_transformer_inputs` and `_convert_transformer_outputs` convert between spaCy's `Doc` batches and the transformer's input and output tensors.
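The `somehow_freeze_layers` placeholder above is left to you. A minimal sketch, assuming a BERT/RoBERTa-style Hugging Face model where the embeddings live under `transformer.embeddings` and the encoder layers under `transformer.encoder.layer` (other architectures name these attributes differently):

```python
def somehow_freeze_layers(transformer, freeze_lowest: int) -> None:
    # Turn off gradients for the embeddings and the lowest N encoder layers,
    # so the optimizer leaves their weights untouched during training.
    for param in transformer.embeddings.parameters():
        param.requires_grad = False
    for layer in transformer.encoder.layer[:freeze_lowest]:
        for param in layer.parameters():
            param.requires_grad = False
```

Parameters with `requires_grad = False` simply stop receiving gradient updates, which is usually all that's needed to freeze the lower layers.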
-
I'm going to transfer this to the "new features & project ideas" discussion board, because it's a nice idea to discuss further :-)
-
Wait, so this solution is not for the huggingface transformers?
-
Hello. I'm using spacy-nightly to train a textcat with a transformer.
In previous experiments on our dataset using Hugging Face transformers, freezing a subset of the pretrained transformer's layers during training increased performance significantly. Is there a way to do this in spaCy 3? The documentation only describes how to freeze the whole transformer component, not how to freeze parts of it.