Freezing some transformer layers during training? #6854
Replies: 3 comments 3 replies
-
This is a good idea for an example, I think. There are a few ways to do what you want, but probably the best way would be to register your own layer function for the transformer model, which you'll then use in the config.
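For reference, here's roughly what the default config block for the transformer model looks like (a sketch; the exact architecture version and the `roberta-base` name will depend on your setup):

```ini
[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
```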
Our first step will be to register our own function and insert it into the config. Make a Python file with the following code, and then pass the path to that file with the `--code` argument to `spacy train`:

```python
# Save this as a Python file, and pass its path to the --code argument.
from typing import Any, Dict, List

from thinc.api import Model
from spacy.tokens import Doc
from spacy.util import registry
from spacy_transformers.data_classes import FullTransformerBatch
from spacy_transformers.layers import TransformerModel


@registry.architectures("our_custom_TransformerModel.v0")
def our_custom_transformer(
    name: str,
    tokenizer_config: Dict[str, Any]
) -> Model[List[Doc], FullTransformerBatch]:
    print("We have control!")
    model = TransformerModel(name, tokenizer_config)
    print("Such model")
    return model
```

And then change your training config to refer to your function:
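For example, the model block would now point at the newly registered function (sketching in the same settings as above), and you'd run training with something like `python -m spacy train config.cfg --code ./custom_transformer.py`, where `custom_transformer.py` is whatever you named the file:

```ini
[components.transformer.model]
@architectures = "our_custom_TransformerModel.v0"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
```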
At this point the code should be doing exactly the same as before, but now we have a place to interject some extra logic. Probably the best option is to set the `set_transformer` callback in the model's `attrs`, so the layer freezing happens at the point where the PyTorch transformer is attached to the model:

```python
# Save this as a Python file, and pass its path to the --code argument.
from typing import Any, Dict, List

from thinc.api import Model
from spacy.tokens import Doc
from spacy.util import registry
from spacy_transformers.data_classes import FullTransformerBatch
from spacy_transformers.layers.transformer_model import TransformerModel
from spacy_transformers.layers.transformer_model import set_pytorch_transformer


@registry.architectures("LayerFreezingTransformerModel.v0")
def layer_freezing_transformer(
    name: str,
    tokenizer_config: Dict[str, Any],
    freeze_lowest: int  # Example of a setting you might want.
) -> Model[List[Doc], FullTransformerBatch]:
    model = TransformerModel(name, tokenizer_config)
    model.attrs["freeze_lowest"] = freeze_lowest
    model.attrs["set_transformer"] = freeze_layers_and_set_transformer
    return model


def freeze_layers_and_set_transformer(model, transformer):
    # Do the layer freezing here.
    somehow_freeze_layers(transformer, model.attrs["freeze_lowest"])
    set_pytorch_transformer(model, transformer)
```

If you want more fine-grained control than this, for example access to the transformer on every batch, the best alternative would be to make your own wrapper layer instead of the generic one that `set_pytorch_transformer` installs. For reference, that function looks like this:

```python
def set_pytorch_transformer(model, transformer):
    if model.attrs["has_transformer"]:
        raise ValueError("Cannot set second transformer.")
    model.layers.append(
        TransformerWrapper(
            transformer,
            convert_inputs=_convert_transformer_inputs,
            convert_outputs=_convert_transformer_outputs,
        )
    )
    model.attrs["has_transformer"] = True
    model.set_dim("nO", transformer.config.hidden_size)
```

where `_convert_transformer_inputs` and `_convert_transformer_outputs` convert between spaCy's `Doc` batches and the transformer's input and output tensors.
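The `somehow_freeze_layers` placeholder above is left to you. A minimal sketch, assuming a BERT/RoBERTa-style Hugging Face model where the embeddings live under `transformer.embeddings` and the encoder layers under `transformer.encoder.layer` (other architectures name these attributes differently):

```python
def somehow_freeze_layers(transformer, freeze_lowest: int) -> None:
    # Turn off gradients for the embeddings and the lowest N encoder layers,
    # so the optimizer leaves their weights untouched during training.
    for param in transformer.embeddings.parameters():
        param.requires_grad = False
    for layer in transformer.encoder.layer[:freeze_lowest]:
        for param in layer.parameters():
            param.requires_grad = False
```

Parameters with `requires_grad = False` simply stop receiving gradient updates, which is usually all that's needed to freeze the lower layers.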
-
I'm going to transfer this to the "new features & project ideas" discussion board, because it's a nice idea to discuss further :-)
-
Wait, so this solution is not for the huggingface transformers?
-
Hello. I'm using spacy-nightly to train a textcat with a transformer.
In previous experiments on our dataset using Hugging Face transformers, freezing a subset of the pretrained transformer's layers during training increased performance significantly. Is there a way to do this in spaCy 3? The documentation only describes how to freeze the whole transformer component, not how to freeze parts of it.