
Support modernBERT for encoder-decoder models #35385

Open
Bachstelze opened this issue Dec 21, 2024 · 1 comment
Labels: Feature request (Request for a new feature)

Comments

@Bachstelze

Feature request

The docs state that the EncoderDecoderModel can be used to initialize a sequence-to-sequence model with any pretrained autoencoding model as the encoder. However, ModernBERT isn't supported:

File "/content/syntax_transformer/data/../models/encoderDecoder.py", line 40, in __init__
    self.model = EncoderDecoderModel.from_encoder_decoder_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 538, in from_encoder_decoder_pretrained
    decoder = AutoModelForCausalLM.from_pretrained(decoder_pretrained_model_name_or_path, **kwargs_decoder)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 567, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.modernbert.configuration_modernbert.ModernBertConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, ZambaConfig.
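For reference, a minimal sketch of the call that triggers this traceback. The checkpoint name answerdotai/ModernBERT-base is an assumption here; any ModernBERT checkpoint would hit the same code path, because the decoder side is loaded through AutoModelForCausalLM, which has no mapping for ModernBertConfig:

from transformers import EncoderDecoderModel

# Hypothetical repro: use a ModernBERT checkpoint as both encoder and decoder.
# The encoder loads fine; the decoder load goes through AutoModelForCausalLM
# and raises the ValueError shown above.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "answerdotai/ModernBERT-base",  # encoder
    "answerdotai/ModernBERT-base",  # decoder -> ValueError: Unrecognized configuration class
)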

Motivation

ModernBERT offers better performance and a longer context length.

Your contribution

How is it possible to support ModernBERT? It isn't that different from other BERT models.

@Bachstelze added the Feature request label on Dec 21, 2024
@NielsRogge
Contributor

The reason ModernBERT isn't yet supported as a decoder is that it does not include a cross-attention module.

When you use the EncoderDecoderModel class and want to initialize the decoder with the weights of a pre-trained encoder-only model (like ModernBERT), the modeling_xxx.py file needs to support cross-attention (and a causal attention mask). This is supported in modeling_bert.py as can be seen here. For ModernBERT, explicit support for a config.is_decoder argument (and the corresponding implementation) would need to be added.
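For comparison, a sketch of the same call with plain BERT, which works because from_encoder_decoder_pretrained sets is_decoder=True and add_cross_attention=True on the decoder config and modeling_bert.py implements both. The checkpoint name google-bert/bert-base-uncased is just an example:

from transformers import EncoderDecoderModel

# Works today: the decoder is loaded as BertLMHeadModel with
# is_decoder=True and add_cross_attention=True, so it gains a causal
# self-attention mask plus cross-attention over the encoder outputs.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased",  # encoder
    "google-bert/bert-base-uncased",  # decoder
)

This is the kind of behavior that ModernBERT's modeling file would need to implement before it can be used on the decoder side.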
