
[Proposal] Add SmolLM3 TransformerBridge architecture adapter #1351

@RecreationalMath


Proposal

Add a TransformerBridge architecture adapter for SmolLM3 (SmolLM3ForCausalLM, the HuggingFaceTB SmolLM3 family).

Motivation

I have been running some local experiments and wanted to use HuggingFaceTB/SmolLM3-3B. I could not find a bridge adapter for it. It also shows up as unsupported in architecture_gaps.json (relevancy 54.4, 8 models, ~1.1M downloads, tiny-random checkpoints available for CI). Happy to take it on if no one else is mid-flight.

Pitch

Structurally a Llama-family block shape (RMSNorm + GQA + SwiGLU + RoPE, no biases) with tied embeddings. Two architectural quirks worth flagging:

  • NoPE (no positional encoding, i.e. no RoPE) on every 4th layer, controlled per layer by HF's config.no_rope_layers.
  • Sliding window, configured but disabled on the released 3B (sliding_window: null).

Both can be handled with existing bridge infrastructure; no new generalized components are needed. Happy to share specifics in the PR.
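The following is a minimal sketch of how an adapter might read the two quirks from the HF config. It assumes HF's SmolLM3 convention: in `no_rope_layers`, a `1` means the layer applies RoPE and a `0` marks a NoPE layer. When the list is absent, it falls back to one NoPE layer every `no_rope_layer_interval` (4) layers. The helper names `rope_layer_flags` and `attention_mask` are hypothetical and are not part of TransformerLens. The sliding-window edge convention (`q - k < window`) is also an assumption and should be checked against the HF implementation.

```python
from typing import List, Optional


def rope_layer_flags(
    num_hidden_layers: int,
    no_rope_layers: Optional[List[int]] = None,
    no_rope_layer_interval: int = 4,
) -> List[bool]:
    """Per layer: True if RoPE is applied, False if the layer is NoPE.

    Assumed convention: 1 = RoPE layer, 0 = NoPE layer. If no list is given,
    every `no_rope_layer_interval`-th layer (4, 8, ... counting from 1) is NoPE.
    """
    if no_rope_layers is None:
        no_rope_layers = [
            int((i + 1) % no_rope_layer_interval != 0) for i in range(num_hidden_layers)
        ]
    if len(no_rope_layers) < num_hidden_layers:
        raise ValueError("no_rope_layers needs an entry for every layer")
    return [bool(flag) for flag in no_rope_layers[:num_hidden_layers]]


def attention_mask(seq_len: int, sliding_window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    sliding_window=None (as on the released 3B config) gives a plain causal mask;
    an integer window w limits each query to the w most recent positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q
            in_window = sliding_window is None or q - k < sliding_window
            row.append(causal and in_window)
        mask.append(row)
    return mask
```

With `sliding_window=None`, `attention_mask` reduces to an ordinary causal mask. The disabled window on the released 3B therefore needs no special handling beyond honoring the config value. For example, `rope_layer_flags(8)` marks layers 4 and 8 (1-indexed) as NoPE.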

Rough sketch:

  • Adapter modelled on qwen2.py (closest match for the no-bias, tied-embed shape).
  • Registrations in the three usual sites (architecture_adapter_factory.py, supported_architectures/__init__.py, tools/model_registry/__init__.py).
  • Adapter unit test.
  • Verification via transformer_lens.tools.model_registry.verify_models against HuggingFaceTB/SmolLM3-3B plus a tiny-random checkpoint.
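The registration step boils down to dispatching on the `architectures` string in the HF config. The sketch below is a hypothetical, self-contained stand-in. `ArchitectureAdapter`, `SmolLM3ArchitectureAdapter`, `SUPPORTED_ARCHITECTURES`, and `select_adapter` are illustrative names, not the actual contents of architecture_adapter_factory.py or supported_architectures/__init__.py.

```python
from typing import Dict, Type


class ArchitectureAdapter:
    """Hypothetical base class; the real bridge adapter API may differ."""

    def __init__(self, hf_config: dict) -> None:
        self.hf_config = hf_config


class SmolLM3ArchitectureAdapter(ArchitectureAdapter):
    """Hypothetical SmolLM3 adapter, modelled on a Qwen2-style adapter."""

    # Llama-family block with no biases and tied embeddings (per the proposal).
    uses_bias = False
    tie_word_embeddings = True


# Hypothetical registry keyed on the first entry of HF config.json `architectures`.
SUPPORTED_ARCHITECTURES: Dict[str, Type[ArchitectureAdapter]] = {
    "SmolLM3ForCausalLM": SmolLM3ArchitectureAdapter,
}


def select_adapter(hf_config: dict) -> ArchitectureAdapter:
    """Instantiate the adapter registered for this config's architecture."""
    arch = hf_config["architectures"][0]
    if arch not in SUPPORTED_ARCHITECTURES:
        raise ValueError(f"Unsupported architecture: {arch}")
    return SUPPORTED_ARCHITECTURES[arch](hf_config)
```

Keying on the HF architecture string means a checkpoint such as HuggingFaceTB/SmolLM3-3B, or a tiny-random variant, would resolve to the same adapter through its config alone.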

If the approach looks right, happy to open the PR shortly.

Checklist

  • I have checked that there is no similar issue in the repo (required)

Metadata

Labels

  • TransformerBridge: Bug specific to the new TransformerBridge system
  • complexity-moderate: Moderately complicated issues for people who have intermediate experience with the code
  • enhancement: New feature or request
  • high-priority: Maintainers are interested in these issues being solved before others
  • new-architecture: This card involves adding a new architecture.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
