[REQ] add GigaGAN #111
Hi! One problem with these existing alternative implementations is that they may perform worse than the original. In fact, it seems that none of them provides pre-trained weights. The second problem is that the GigaGAN model uses text conditioning, which I think is hard to apply to video in general.
@WolframRhodium
Thanks for the information. The model is interesting. I will need to improve the existing vs-mlrt infrastructure to support it, which will take time.
@WolframRhodium If it's not a problem, could you please take a look at the model structure and the conversion code? For example, the fp16 AuraSR v2 model works perfectly fine in chaiNNer, exactly as it should. However, in Hybrid with vsmlrt, I get a green image when using DirectML, and the ONNX and TensorRT modes don't work at all. I'm sending you the ONNX model, the conversion code, and the error log from building the engine with trtexec. If we could somehow manage to adapt it for vs-mlrt, that would be excellent!

```
[02/15/2025-20:41:00] [V] [TRT] Searching for input: /upsampler/final_res_block/block2/act/Mul_output_0
[02/15/2025-20:41:00] [W] [TRT] Could not read timing cache from: C:/Users/admin/AppData/Local/Temp\cbc61e.engine.cache. A new timing cache will be generated and written.
```
Thanks for your information! In L112 of your script, change `dummy_lowres = torch.randn(1, 3, 240, 320, device=device, dtype=torch.float16)` so that the dummy input matches what the network expects (I guess the network requires mod-64 input, so the height and width should be multiples of 64).
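For illustration, here is a minimal sketch of what that export step might look like with a mod-64 dummy input. The aura-sr package's `AuraSR.from_pretrained` API is real, but exporting via the `.upsampler` attribute is an assumption based on the node paths in the trtexec log above, and the opset and shape choices are guesses:

```python
# Illustrative sketch only: the .upsampler attribute, opset, and mod-64
# dummy shape are assumptions, not the confirmed conversion script.
import torch
from aura_sr import AuraSR  # pip install aura-sr

device = "cuda"
aura = AuraSR.from_pretrained("fal/AuraSR-v2")  # v2 weights from Hugging Face
model = aura.upsampler.to(device).half().eval()

# Dummy input with mod-64 spatial dimensions (256x320 rather than 240x320).
dummy_lowres = torch.randn(1, 3, 256, 320, device=device, dtype=torch.float16)

torch.onnx.export(
    model,
    dummy_lowres,
    "aura_sr_v2_fp16.onnx",
    input_names=["input"],
    output_names=["output"],
    # Dynamic axes allow variable resolutions, as in the working conversion.
    dynamic_axes={
        "input": {0: "batch", 2: "height", 3: "width"},
        "output": {0: "batch", 2: "height", 3: "width"},
    },
    opset_version=17,
)
```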
I somehow managed to convert it with dynamic axes, so it supports different input sizes. Interestingly, it now works in chaiNNer with artifacts, while in Hybrid, vsmlrt works correctly in TensorRT mode. Building the engine takes a bit longer since the model is quite large, but the important thing is that it works correctly.

The performance isn't great: on a 640x480 input with 24 GB of VRAM (RTX 3090), it gets around 2 FPS. You can take a look, and if any further optimization is possible, feel free to modify it. Then, perhaps, you could add it to the vsmlrt project, although the model is primarily intended for photos without artifacts and noise.

I also wanted to point out that the original author's aura-sr project, when you run it, downloads and uses the v1 model, which is much worse. If you want to convert the model locally, replace the v1 weights in the conda cache directory with the v2 weights from their Hugging Face page.
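For reference, a minimal sketch of how the converted model could be loaded in a VapourSynth script through vsmlrt's generic `inference()` helper. The file names are placeholders, and the RGBS/matrix handling is an assumption:

```python
# Hedged sketch: feeding the converted ONNX model to vs-mlrt's TensorRT backend.
import vapoursynth as vs
from vsmlrt import Backend, inference

core = vs.core

clip = core.lsmas.LWLibavSource("input.mkv")  # placeholder source
# vsmlrt models generally expect RGB float input; matrix_in_s is an assumption.
clip = core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s="709")

sr = inference(
    clip,
    network_path="aura_sr_v2_fp16.onnx",  # the dynamic-axes model from above
    backend=Backend.TRT(fp16=True),       # TensorRT mode, reported to work correctly
)
sr.set_output()
```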
Thanks for the information. (It is actually faster than I expected.)
Hi there, thanks for your work!
It would be great to have GigaGan upscaling too: https://mingukkang.github.io/GigaGAN/
Some possibly useful implementations:
Hope that inspires!
Note: unfortunately, the VideoGigaGAN sources (by Adobe Research) are not (yet?) available...