
Conversation

DavidBert (Contributor) commented on Oct 9, 2025:

This commit adds support for the Photon image generation model:

  • PhotonTransformer2DModel: Core transformer architecture
  • PhotonPipeline: Text-to-image generation pipeline
  • Attention processor updates for Photon-specific attention mechanism
  • Conversion script for loading Photon checkpoints
  • Documentation and tests

Some examples below, generated with the 512 model fine-tuned on the Alchemist dataset and distilled with PAG:

[Sample images: image_10, image_4, image_0, image_1]
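A minimal, hedged usage sketch of the new pipeline (the checkpoint id below is a placeholder, not the actual Hub repo; generation parameters are illustrative):

```python
import torch
from diffusers import PhotonPipeline

# Placeholder repo id; substitute the actual Photon checkpoint on the Hub.
pipe = PhotonPipeline.from_pretrained("path/to/photon-512", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    "a macro photograph of a dew-covered leaf",
    height=512,
    width=512,
    num_inference_steps=28,
).images[0]
image.save("photon_sample.png")
```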


print("✓ Created scheduler config")


def download_and_save_vae(vae_type: str, output_path: str):
DavidBert (author) commented:

I'm not sure about this one: I'm saving the VAE weights even though they are already available on the Hub (Flux VAE and DC-AE).
Is there a way to avoid storing them and instead point directly to the originals?

Member replied:

For now, it's okay to keep this as is. This way, everything is under the same model repo.
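For reference, a hedged sketch of what this helper amounts to: fetch the already-published VAE from the Hub and re-save it under the Photon repo layout. The repo ids below are illustrative, not necessarily what the conversion script uses; the text-encoder helper discussed next follows the same pattern.

```python
from diffusers import AutoencoderDC, AutoencoderKL


def download_and_save_vae(vae_type: str, output_path: str):
    # Pull the already-published VAE from the Hub and re-save it into the
    # Photon model repo layout so everything ships in one repo.
    if vae_type == "flux":
        vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="vae")
    else:
        vae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers")
    vae.save_pretrained(f"{output_path}/vae")
```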

print(f"✓ Saved VAE to {vae_path}")


def download_and_save_text_encoder(output_path: str):
DavidBert (author) commented:

Same here for the Text Encoder.

print("✓ Created scheduler config")


def download_and_save_vae(vae_type: str, output_path: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, it's okay to keep this as is. This way, everything is under the same model repo.

sayakpaul (Member) left a comment:

Thanks for the clean PR! I left some initial feedback for you. LMK if that makes sense.

Also, it would be great to see some samples of Photon!

sayakpaul (Member) left a comment:

Thanks! Left a couple more comments. Let's also add the pipeline-level tests.

<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

Photon is a text-to-image diffusion model using simplified MMDIT architecture with flow matching for efficient high-quality image generation. The model uses T5Gemma as the text encoder and supports either Flux VAE (AutoencoderKL) or DC-AE (AutoencoderDC) for latent compression.
Member commented:
Cc: @stevhliu for a review on the docs.
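As a quick illustration of the components named in that docs paragraph, a hedged sketch (the repo id is a placeholder; the comments state expectations, not confirmed class names):

```python
from diffusers import PhotonPipeline

# Placeholder repo id; substitute the actual Photon checkpoint.
pipe = PhotonPipeline.from_pretrained("path/to/photon-512")
print(type(pipe.transformer).__name__)   # PhotonTransformer2DModel
print(type(pipe.text_encoder).__name__)  # the T5Gemma text encoder
print(type(pipe.vae).__name__)           # AutoencoderKL (Flux VAE) or AutoencoderDC (DC-AE), depending on the checkpoint
```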

return xq_out.reshape(*xq.shape).type_as(xq)


class PhotonAttnProcessor2_0:
Member commented:

Could we write it in a fashion similar to [linked reference]?

Collaborator commented:

I second this suggestion: in particular, I think it would be more in line with other diffusers model implementations to reuse the layers defined in Attention, such as to_q/to_k/to_v, etc., instead of defining them in PhotonBlock (e.g. PhotonBlock.img_qkv_proj), and to keep the entire attention implementation in the PhotonAttnProcessor2_0 class.

Attention supports stuff like QK norms and fusing projections, so that could potentially be reused as well. If you need some custom logic not found in Attention, you could potentially add it in there or create a new Attention-style class like Flux does:

class FluxAttention(torch.nn.Module, AttentionModuleMixin):
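A rough sketch of that structure, assuming the projections (to_q/to_k/to_v/to_out) live on the Attention-style module and the processor only orchestrates the computation; this is illustrative only, not the PR's final implementation, and it omits cross-attention and other Photon-specific logic:

```python
import torch.nn.functional as F

from diffusers.models.embeddings import apply_rotary_emb


class PhotonAttnProcessor2_0:
    """Illustrative processor: the projections are owned by the attention
    module, and the processor only runs the computation (as in other
    diffusers models)."""

    def __call__(self, attn, hidden_states, attention_mask=None, image_rotary_emb=None):
        batch_size, seq_len, _ = hidden_states.shape

        # to_q/to_k/to_v/to_out belong to the Attention-style module, not the block
        query = attn.to_q(hidden_states)
        key = attn.to_k(hidden_states)
        value = attn.to_v(hidden_states)

        head_dim = query.shape[-1] // attn.heads
        query = query.view(batch_size, seq_len, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch_size, seq_len, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch_size, seq_len, attn.heads, head_dim).transpose(1, 2)

        # optional QK norms, supported by the shared attention module
        if getattr(attn, "norm_q", None) is not None:
            query = attn.norm_q(query)
        if getattr(attn, "norm_k", None) is not None:
            key = attn.norm_k(key)

        # rotary embeddings, applied the same way as in other diffusers processors
        if image_rotary_emb is not None:
            query = apply_rotary_emb(query, image_rotary_emb)
            key = apply_rotary_emb(key, image_rotary_emb)

        hidden_states = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, seq_len, -1)
        return attn.to_out[0](hidden_states)
```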

DavidBert (author) replied:

I made the change and updated both the conversion script and the checkpoints on the hub.

def __call__(
self,
prompt: Union[str, List[str]] = None,
height: Optional[int] = None,
Member commented:

We support passing prompt embeddings too in case users want to supply them precomputed:

prompt_embeds: Optional[torch.FloatTensor] = None,
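A hedged example of supplying precomputed embeddings. The repo id and the embedding shape are placeholders; the real shape depends on the T5Gemma encoder's hidden size and the pipeline's expected sequence length.

```python
import torch
from diffusers import PhotonPipeline

# Placeholder repo id and embedding shape, for illustration only.
pipe = PhotonPipeline.from_pretrained("path/to/photon-512", torch_dtype=torch.bfloat16).to("cuda")

# Pass precomputed text-encoder outputs instead of a raw prompt.
prompt_embeds = torch.randn(1, 256, 2304, device="cuda", dtype=torch.bfloat16)
image = pipe(prompt_embeds=prompt_embeds, height=512, width=512).images[0]
```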

DavidBert force-pushed the photon branch 2 times, most recently from 4aeccfe to ff28f65, on October 15, 2025 13:39
DavidBert requested a review from sayakpaul on October 15, 2025 13:40
sayakpaul requested reviews from dg845 and stevhliu and removed the review request for sayakpaul on October 15, 2025 15:05
DavidBert requested a review from sayakpaul on October 15, 2025 15:18
stevhliu (Member) left a comment:

Thanks for the docs, remember to add it to the toctree as well!

DavidBert and others added 24 commits October 21, 2025 14:58
DavidBert (author) commented, quoting sayakpaul:

@DavidBert I pushed some nit fixes here: 53a2a7a

Feel free to cherry-pick the commit if you'd like; I thought it would help us merge the PR faster. Don't worry about the rest of the styling stuff (it's being fixed in #12522).

Thanks @sayakpaul! Sorry for all the back and forth.
I usually run make style, make quality, make fix-copies, and the test_models_transformer_photon.py and test_pipeline_photon.py tests. Is there anything else I should run before pushing?

sayakpaul (Member) commented:

No, all good. I will look into the typing thing further; we should get rid of Dict, List, etc. Thanks for your patience thus far :)

Hopefully the CI passes through 🤞

DavidBert (author) commented:

Thanks @sayakpaul!
I will make the second PR to rename Photon to PRX right after this one is merged.

sayakpaul merged commit cefc2cf into huggingface:main on Oct 21, 2025. 10 of 11 checks passed.