[Proposal] Add BD3LM block-diffusion adapter (BD3LM) #1473

Description

@jlarson4

Proposal

Add a TransformerBridge adapter for BD3LM (Kuleshov Group, ICLR 2025), a block discrete-diffusion language model with a single block-size dial that interpolates between autoregressive and full diffusion.

Motivation

BD3LM is the diffusion-LM candidate with a unique property: a single model exposes one block_size knob. block_size=1 is autoregressive, block_size=L (the full sequence length) is full diffusion, and intermediate values blend the two. This lets researchers isolate the effect of diffusion itself within a single model. It complements the addition of LLaDA (a fixed masked-diffusion model with no AR knob) and fits the growing interest in diffusion-LM interpretability (SAEs + diffusion-time steering). Its checkpoints are small (~110–170M parameters), so it is the cheapest novel testbed in this batch of architectures.
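
To make the block_size dial concrete, here is a minimal sketch of a block-causal attention mask. It is a simplified, single-view illustration and not the dual-view mask used by the released model; the function names `block_causal_mask` and `show` are hypothetical and do not come from the BD3LM code.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    Simplified single-view sketch (an assumption, not the BD3LM dual-view mask):
    a position sees every position in its own block and in all earlier blocks.
    """
    if block_size < 1 or seq_len % block_size != 0:
        raise ValueError("block_size must be >= 1 and divide seq_len")
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask: list[list[bool]]) -> str:
    """Render the mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    for bs in (1, 2, 4):
        print(f"block_size={bs}")
        print(show(block_causal_mask(4, bs)))
```

With block_size=1 the mask is the usual lower-triangular causal mask, and with block_size equal to the sequence length every position attends to every other, which matches the two limits described above.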

Gap scan (2026-06-25): ~3 models, ~29K downloads.

Scope note (higher-effort adapter)

BD3LM is not a standard AR decoder. It is a DiT-style iterative denoiser with adaLN conditioning (adaln: true, cond_dim) and a dual-view block-diffusion attention mask (causal: false, cross_attn: true), and it generates text through an iterative denoising loop. The architecture class is BD3LM (architectures=["BD3LM"], model_type=bd3lm). It loads via AutoModelForMaskedLM per the config's auto_map, so it requires remote code; remote-code loading itself is supported (see openelm.py). The hook surface (timestep conditioning, block mask, denoising steps) differs structurally from AR models, so scope the per-step vs single-pass hook representation early and verify a single forward pass before the generation loop. Released OWT checkpoints use block_size 4 / 8 / 16 (plus a block_size1024-pretrain); block_size=1 is the conceptual AR limit, not a released checkpoint.
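
The per-step vs single-pass question can be illustrated with a toy block-by-block sampling loop. This is a sketch under stated assumptions: `toy_denoiser`, `sample_blockwise`, and the even unmasking schedule are hypothetical stand-ins, not the sampler shipped in the model's remote code. The point is only that each block is denoised over several forward passes, so a hook cache would plausibly need to be keyed by (block, step) rather than by a single pass.

```python
import random

MASK = None  # hypothetical stand-in for the mask token id


def toy_denoiser(tokens, start, end, rng):
    """Stand-in for one model forward: propose a token id per position in the block."""
    return {pos: rng.randrange(100) for pos in range(start, end)}


def sample_blockwise(seq_len, block_size, steps_per_block, seed=0):
    """Denoise left to right, one block at a time, and record every forward pass."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    trace = []  # one (block_idx, step, masked_before) entry per forward pass
    for block_idx, start in enumerate(range(0, seq_len, block_size)):
        end = min(start + block_size, seq_len)
        for step in range(steps_per_block):
            masked = [p for p in range(start, end) if tokens[p] is MASK]
            if not masked:
                break  # block fully denoised before the step budget ran out
            proposals = toy_denoiser(tokens, start, end, rng)
            trace.append((block_idx, step, len(masked)))
            # Assumed schedule: unmask an even share now, everything by the last step.
            remaining_steps = steps_per_block - step
            n_unmask = -(-len(masked) // remaining_steps)  # ceiling division
            for p in rng.sample(masked, n_unmask):
                tokens[p] = proposals[p]
    return tokens, trace


if __name__ == "__main__":
    _, trace = sample_blockwise(seq_len=8, block_size=4, steps_per_block=2)
    for block_idx, step, masked_before in trace:
        print(f"block={block_idx} step={step} masked_before={masked_before}")
```

In this toy, block_size=1 yields one forward pass per token, as in AR decoding, while larger blocks yield several passes over the same positions. An adapter would need to decide whether hooks expose every such pass or only one chosen pass.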

Pitch

Map the DiT-style denoiser blocks and expose residual/attention hooks; represent the block_size dial so researchers can sweep AR↔diffusion on one model.

  • Claude Code users can scaffold with /add-model-support kuleshov-group/bd3lm-owt-block_size4.
  • Register at the four sites listed in contributing.md.
  • Verify smallest-first: kuleshov-group/bd3lm-owt-block_size4 (~170M, CPU-runnable).
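
Before running a forward pass, it may help to confirm that a loaded config matches the fields named in the scope note. The sketch below checks a plain dict; `check_bd3lm_config` is a hypothetical helper, and the assumption that block_size appears as an integer config key is not confirmed by this issue.

```python
# Field values taken from the scope note in this issue.
EXPECTED = {
    "model_type": "bd3lm",
    "architectures": ["BD3LM"],
    "adaln": True,
    "causal": False,
    "cross_attn": True,
}

# cond_dim is named in the issue; treating block_size as a config key is an assumption.
POSITIVE_INT_KEYS = ("cond_dim", "block_size")


def check_bd3lm_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    for key, expected in EXPECTED.items():
        if key not in config:
            problems.append(f"missing {key}")
        elif config[key] != expected:
            problems.append(f"{key}={config[key]!r}, expected {expected!r}")
    for key in POSITIVE_INT_KEYS:
        value = config.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(f"{key} should be a positive integer, got {value!r}")
    return problems


if __name__ == "__main__":
    # Illustrative values only; cond_dim here is made up for the demo.
    demo = dict(EXPECTED, cond_dim=128, block_size=4)
    print(check_bd3lm_config(demo) or "config looks like BD3LM")
```

The dict could come from `AutoConfig.from_pretrained("kuleshov-group/bd3lm-owt-block_size4", trust_remote_code=True).to_dict()` in transformers, though that call was not run for this sketch.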

Additional context

Checklist

  • I have checked that there is no similar issue in the repo (required)

Metadata

Labels

  • TransformerBridge: Bug specific to the new TransformerBridge system
  • complexity-high: Very complicated changes for people to address who are quite familiar with the code
  • help wanted: Extra attention is needed
  • new-architecture: This card involves adding a new architecture.
