Add Photon model and pipeline support #12456

DavidBert · 2025-10-09T13:21:05Z

This commit adds support for the Photon image generation model:

PhotonTransformer2DModel: Core transformer architecture
PhotonPipeline: Text-to-image generation pipeline
Attention processor updates for Photon-specific attention mechanism
Conversion script for loading Photon checkpoints
Documentation and tests

Some examples below with the 512 model fine-tuned on the Alchemist dataset and distilled with PAG.

DavidBert · 2025-10-09T13:21:46Z

scripts/convert_photon_to_diffusers.py

+    print("✓ Created scheduler config")
+
+
+def download_and_save_vae(vae_type: str, output_path: str):


I'm not sure on this one: I'm saving the VAE weights while they are already available on the Hub (Flux VAE and DC-AE).
Is there a way to avoid storing them and instead look directly for the original ones?


DavidBert · 2025-10-09T13:22:22Z

scripts/convert_photon_to_diffusers.py

+    print(f"✓ Saved VAE to {vae_path}")
+
+
+def download_and_save_text_encoder(output_path: str):


Same here for the Text Encoder.

sayakpaul · 2025-10-09T13:40:52Z

scripts/convert_photon_to_diffusers.py

+    print("✓ Created scheduler config")
+
+
+def download_and_save_vae(vae_type: str, output_path: str):


For now, it's okay to keep this as is. This way, everything is under the same model repo.

sayakpaul

Thanks for the clean PR! I left some initial feedback for you. LMK if that makes sense.

Also, it would be great to see some samples of Photon!

sayakpaul

Thanks! Left a couple more comments. Let's also add the pipeline-level tests.

sayakpaul · 2025-10-13T10:59:38Z

docs/source/en/api/pipelines/photon.md

+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+Photon is a text-to-image diffusion model using simplified MMDIT architecture with flow matching for efficient high-quality image generation. The model uses T5Gemma as the text encoder and supports either Flux VAE (AutoencoderKL) or DC-AE (AutoencoderDC) for latent compression.


Cc: @stevhliu for a review on the docs.

sayakpaul · 2025-10-13T11:00:59Z

src/diffusers/models/transformers/transformer_photon.py

+    return xq_out.reshape(*xq.shape).type_as(xq)
+
+
+class PhotonAttnProcessor2_0:


Could we write it in a fashion similar to FluxAttnProcessor (diffusers/src/diffusers/models/transformers/transformer_flux.py, line 75 in 8abc7ae)?

I second this suggestion. In particular, I think it would be more in line with other diffusers model implementations to reuse the layers defined in Attention, such as to_q/to_k/to_v, instead of defining them in PhotonBlock (e.g. PhotonBlock.img_qkv_proj), and to keep the entire attention implementation in the PhotonAttnProcessor2_0 class.

Attention supports stuff like QK norms and fusing projections, so that could potentially be reused as well. If you need some custom logic not found in Attention, you could potentially add it in there or create a new Attention-style class like Flux does (diffusers/src/diffusers/models/transformers/transformer_flux.py, line 275 in 8abc7ae):

class FluxAttention(torch.nn.Module, AttentionModuleMixin):

I made the change and updated both the conversion script and the checkpoints on the hub.
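
For readers following the thread, the split suggested above (projection layers owned by an Attention-style module, the attention math owned by a processor) can be sketched roughly as follows. This is a minimal illustration assuming PyTorch 2.x, not the code merged in this PR: SketchAttention and SketchAttnProcessor are hypothetical names, and text tokens, rotary embeddings and QK norms are all left out.

```python
import torch
import torch.nn.functional as F
from torch import nn


class SketchAttnProcessor:
    """Hypothetical processor: owns no weights, only the attention computation."""

    def __call__(self, attn: "SketchAttention", hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = hidden_states.shape

        def split_heads(x: torch.Tensor) -> torch.Tensor:
            # (batch, seq, dim) -> (batch, heads, seq, head_dim)
            return x.view(batch, seq_len, attn.heads, attn.head_dim).transpose(1, 2)

        # The projections live on the module, so the processor just calls them.
        query = split_heads(attn.to_q(hidden_states))
        key = split_heads(attn.to_k(hidden_states))
        value = split_heads(attn.to_v(hidden_states))

        out = F.scaled_dot_product_attention(query, key, value)
        # (batch, heads, seq, head_dim) -> (batch, seq, dim)
        out = out.transpose(1, 2).reshape(batch, seq_len, attn.heads * attn.head_dim)
        return attn.to_out(out)


class SketchAttention(nn.Module):
    """Hypothetical Attention-style module: owns to_q/to_k/to_v/to_out."""

    def __init__(self, dim: int, heads: int, processor=None):
        super().__init__()
        if dim % heads != 0:
            raise ValueError("dim must be divisible by heads")
        self.heads = heads
        self.head_dim = dim // heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)
        self.processor = processor if processor is not None else SketchAttnProcessor()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # All attention logic is delegated to the (swappable) processor.
        return self.processor(self, hidden_states)
```

With this layout a transformer block only instantiates the attention module, and a different backend can be tried by passing another processor object without touching the weights or the checkpoint keys.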

sayakpaul · 2025-10-13T11:10:00Z

src/diffusers/pipelines/photon/pipeline_photon.py

+    def __call__(
+        self,
+        prompt: Union[str, List[str]] = None,
+        height: Optional[int] = None,


We support passing prompt embeddings too in case users want to supply them precomputed, as in diffusers/src/diffusers/pipelines/flux/pipeline_flux.py (line 669 in 8abc7ae):

prompt_embeds: Optional[torch.FloatTensor] = None,
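
Accepting precomputed embeddings usually means the pipeline has to check that exactly one of prompt and prompt_embeds was given. The sketch below shows that check in isolation; resolve_batch_size is a hypothetical helper written for this illustration (it is not a function from this PR or from diffusers), and prompt_embeds is only assumed to expose a tensor-like shape attribute.

```python
from typing import Any, Optional, Union


def resolve_batch_size(
    prompt: Union[str, list[str], None] = None,
    prompt_embeds: Optional[Any] = None,
) -> int:
    """Hypothetical helper: validate the two inputs and return the batch size."""
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass either `prompt` or `prompt_embeds`, not both.")
    if prompt is None and prompt_embeds is None:
        raise ValueError("Provide either `prompt` or `prompt_embeds`.")
    if prompt is not None:
        if isinstance(prompt, str):
            return 1
        if isinstance(prompt, list):
            return len(prompt)
        raise ValueError(f"`prompt` must be str or list, got {type(prompt)}.")
    # Assumption: embeddings are shaped (batch, sequence, dim).
    return prompt_embeds.shape[0]
```

A caller that already ran the text encoder would then pass prompt_embeds only and leave prompt as None, which skips the encoding step inside the pipeline.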

stevhliu

Thanks for the docs, remember to add it to the toctree as well!

Co-authored-by: Steven Liu <[email protected]>

Co-authored-by: dg845 <[email protected]>

DavidBert · 2025-10-21T15:05:24Z

@DavidBert I pushed some nit fixes here: 53a2a7a

Feel free to cherry-pick the commit if you'd like. I thought it would make it easier for us to get the PR merged faster. Don't worry about the rest of the styling stuff (it is being fixed in #12522).

Thanks @sayakpaul! Sorry for all this back-and-forth.
I'm usually running make style, make quality, make fix-copies and the test_models_transformer_photon.py and test_pipeline_photon.py tests. Is there something else I should run before pushing?

sayakpaul · 2025-10-21T15:09:17Z

No, all good. I will look into the typing thing further. We should get rid of Dict, List, etc. Thanks for your patience thus far :)

Hopefully the CI passes through 🤞
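
On the typing point, one reading of "get rid of Dict, List, etc." is to replace the typing aliases with the builtin generics, which are subscriptable from Python 3.9 onward. A small sketch under that assumption; normalize_prompts is a hypothetical helper used only to carry the annotations, not code from this PR.

```python
from typing import Optional, Union


# Before: prompt: Union[str, List[str]] = None  (needs `from typing import List`)
# After:  the builtin `list` is used directly, no typing.List import required.
def normalize_prompts(prompt: Union[str, list[str], None] = None) -> list[str]:
    """Hypothetical helper: always hand back a list of prompt strings."""
    if prompt is None:
        return []
    if isinstance(prompt, str):
        return [prompt]
    return list(prompt)


def count_by_length(prompts: list[str], max_len: Optional[int] = None) -> dict[int, int]:
    """Same idea for Dict: builtin `dict` replaces typing.Dict."""
    counts: dict[int, int] = {}
    for text in prompts:
        length = len(text) if max_len is None else min(len(text), max_len)
        counts[length] = counts.get(length, 0) + 1
    return counts
```

Runtime behaviour is unchanged by this kind of edit; only the annotations and the imports differ.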

DavidBert · 2025-10-21T15:13:47Z

Thanks @sayakpaul!
I will make the second PR to rename Photon to PRX right after this one is merged.
