Proposal
Add a TransformerBridge adapter for BD3LM (Kuleshov Group, ICLR 2025), a block discrete-diffusion language model with a single block-size dial that interpolates between autoregressive and full diffusion.
Motivation
BD3LM is the diffusion-LM candidate with a unique property: one model with one block_size knob. block_size=1 is autoregressive, block_size=L (the sequence length) is full diffusion, and intermediate values blend the two. This lets researchers isolate the effect of diffusion itself within a single model. It complements the addition of LLaDA (a fixed masked-diffusion model with no AR knob) and fits the growing interest in diffusion-LM interpretability (SAEs + diffusion-time steering). Its checkpoints are small (~110–170M parameters), so it is the cheapest novel testbed in this batch of architectures.
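To make the dial concrete, here is a minimal sketch of a block-causal attention pattern, assuming position i may attend to position j whenever j's block is not later than i's block. This is a simplified single-view illustration, not BD3LM's actual dual-view mask, and the helper name block_causal_mask is hypothetical.

```python
from __future__ import annotations


def block_causal_mask(seq_len: int, block_size: int) -> list[list[int]]:
    """Return a seq_len x seq_len 0/1 mask; mask[i][j] == 1 means i may attend to j.

    Illustrative assumption: tokens attend bidirectionally inside their own
    block and causally to every earlier block.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [
        [1 if j // block_size <= i // block_size else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the pattern for the two extremes and one intermediate setting.
    for bs in (1, 2, 4):
        print(f"block_size={bs}")
        for row in block_causal_mask(4, bs):
            print(" ".join(map(str, row)))
```

With block_size=1 the pattern is the usual lower-triangular causal mask, and with block_size equal to the sequence length every position sees every other position.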
Gap scan (2026-06-25): ~3 models, ~29K downloads.
Scope note (higher-effort adapter)
BD3LM is not a standard AR decoder. It is a DiT-style iterative denoiser with adaLN conditioning (adaln: true, cond_dim) and a dual-view block-diffusion attention mask (causal: false, cross_attn: true), and it generates text through an iterative denoising loop. The architecture class is BD3LM (architectures=["BD3LM"], model_type=bd3lm). It loads via AutoModelForMaskedLM per the config's auto_map, so it requires remote code; remote-code loading itself is already supported (see openelm.py). The hook surface (timestep conditioning, block mask, denoising steps) differs structurally from AR models, so scope the per-step vs single-pass hook representation early and verify a single forward pass before tackling the generation loop. Released OWT checkpoints are block_size 4 / 8 / 16 (plus a block_size1024-pretrain checkpoint); block_size=1 is the conceptual AR limit, not a released checkpoint.
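One way to scope the per-step versus single-pass question is to decide how cached activations are keyed. The sketch below is a hypothetical data structure, not part of TransformerLens or the BD3LM code: StepCache, its methods, and the hook name string are illustrative assumptions. It stores values under (denoising step, hook name) and can collapse to a single-pass view by selecting one step.

```python
from __future__ import annotations

from typing import Any


class StepCache:
    """Hypothetical activation cache keyed by (denoising step, hook name)."""

    def __init__(self) -> None:
        self._store: dict[int, dict[str, Any]] = {}

    def record(self, step: int, name: str, value: Any) -> None:
        """Store one hook's value for one denoising step."""
        self._store.setdefault(step, {})[name] = value

    def steps(self) -> list[int]:
        """Recorded step indices in ascending order."""
        return sorted(self._store)

    def single_pass(self, step: int) -> dict[str, Any]:
        """Flat {hook name: value} view of one forward pass."""
        return dict(self._store.get(step, {}))

    def across_steps(self, name: str) -> list[Any]:
        """One hook's values in step order, skipping steps where it is absent."""
        return [self._store[s][name] for s in self.steps() if name in self._store[s]]


if __name__ == "__main__":
    cache = StepCache()
    # Strings stand in for tensors; the hook name is an assumed naming scheme.
    for step in range(3):
        cache.record(step, "blocks.0.hook_resid_post", f"resid@{step}")
    print(cache.single_pass(0))
    print(cache.across_steps("blocks.0.hook_resid_post"))
```

Keeping the step as an explicit key means a single forward pass is just the one-step special case, so the same hook names could serve both views.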
Pitch
Map the DiT-style denoiser blocks and expose residual/attention hooks; represent the block_size dial so researchers can sweep AR↔diffusion on one model.
- Claude Code users can scaffold with /add-model-support kuleshov-group/bd3lm-owt-block_size4.
- Register at the four sites listed in contributing.md.
- Verify smallest-first: kuleshov-group/bd3lm-owt-block_size4 (~170M, CPU-runnable).
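The smallest-first check could start with a load-and-inspect step like the sketch below. It assumes the transformers library is installed and the checkpoint is reachable on the Hub. The helper names describe and load_and_inspect are hypothetical, and the config field list is taken from this proposal, so any field may be missing on the real config. The forward signature is printed rather than guessed, since it comes from the repo's remote code.

```python
import inspect

MODEL_ID = "kuleshov-group/bd3lm-owt-block_size4"

# Config fields named in this proposal; any of them may be absent in practice.
CONFIG_FIELDS = (
    "model_type",
    "architectures",
    "block_size",
    "adaln",
    "cond_dim",
    "causal",
    "cross_attn",
)


def describe(config, fields=CONFIG_FIELDS):
    """Collect the listed config attributes, using None for missing ones."""
    return {name: getattr(config, name, None) for name in fields}


def load_and_inspect(model_id: str = MODEL_ID):
    """Load the checkpoint and print its config summary and forward signature."""
    # Imported lazily so describe() stays usable without transformers installed.
    from transformers import AutoModelForMaskedLM

    # trust_remote_code=True runs modeling code from the Hub repo; review it first.
    model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()
    print(describe(model.config))
    # Read the forward signature before constructing inputs for a single pass.
    print(inspect.signature(model.forward))
    return model
```

Calling load_and_inspect() downloads the checkpoint, so it is left uncalled here. Once the printed signature is known, a single forward pass can be checked before any work on the denoising loop.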
Additional context
hf_scraper architecture-gaps pass (2026-06-25).
Checklist