about hyperparameter searching #46

Open · anguoyang opened this issue Feb 8, 2023 · 1 comment

@anguoyang

Hi @ConnorBaker, thanks for sharing your code.
Regarding the hyperparameter search: do you intend to search for a lightweight architecture? How does the resulting model fare in terms of size, parameter count, and FLOPs? Thanks.

@ConnorBaker (Owner)

Hi @anguoyang,

Credit for the code goes to @Algolzw!

I'm fairly new to machine learning and the techniques available for both training and tuning. Regarding hyperparameter searching, I'm using Syne Tune (https://github.com/awslabs/syne-tune) because it supports multi-objective hyperparameter optimization. In particular, I wanted to be able to optimize along the Pareto frontier for PSNR, MS-SSIM, and LPIPS (and not just one of them).
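A minimal sketch of what that multi-objective setup can look like with Syne Tune's MOASHA scheduler (the `train.py` entry point, hyperparameters, and budgets here are placeholders, not this repo's actual configuration; the script is assumed to report `psnr`, `msssim`, and `lpips` once per epoch via `syne_tune.Reporter`):

```python
# Sketch of multi-objective HPO with Syne Tune's MOASHA scheduler.
# Assumes a hypothetical train.py that reports psnr, msssim, and lpips
# each epoch via syne_tune.Reporter; the search space is illustrative.
from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import MOASHA

config_space = {
    "lr": loguniform(1e-5, 1e-2),   # placeholder search space
    "batch_size": randint(4, 32),
    "epochs": 50,                   # constants pass through to the script
}

scheduler = MOASHA(
    config_space,
    metrics=["psnr", "msssim", "lpips"],
    mode=["max", "max", "min"],     # maximize PSNR/MS-SSIM, minimize LPIPS
    time_attr="epoch",
    max_t=50,
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=4 * 3600),
    n_workers=2,
)
tuner.run()
```

MOASHA promotes trials that are non-dominated across all three metrics rather than ranking them on a single scalar, which is what lets you explore the Pareto frontier instead of picking one objective up front.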

From some of the reading I've done recently, I see that there's a distinction drawn between hyperparameter optimization and architecture optimization (called neural architecture search, I think?). I haven't done much with the latter, but it looks interesting. Toward that end, I've been looking at (but have not used):

In the immediate term, I'm looking into packaging the repo with Nix because I'm exhausted by the steps I have to take to get the build I want working locally. I don't think I've pushed it yet, but I've been using PyTorch with CUDA 12 and Triton from head, all patched to work with the 4090 I got recently. It's been a huge pain compiling everything from source over and over, so I'd very much like to be able to just use Nix (which I'm familiar with) instead of crying over Dockerfiles every day.

Beyond that, I'd like to swap out some of the components for more performant (or potentially more accurate) counterparts. For example, I think the following hold some promise:

I also factored out some code from this repo into separate repos to make it more maintainable:

I'd really like to do some more work on mfsr_utils in particular -- I feel like there's a lot of commonality in SR workflows (multi- and single-frame) that people end up re-implementing. I'd like to add some common augmentations (blur kernels, noise, etc.) and support taking patches from larger images (instead of relying on them being pre-cropped to small sizes).
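To illustrate the kind of helpers I mean (these names and signatures are hypothetical, not mfsr_utils' actual API):

```python
# Hypothetical helpers of the sort described above: random patch extraction
# from a full-size image, plus a simple Gaussian-noise augmentation.
import torch

def random_patch(img: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Crop a random patch_size x patch_size patch from a CHW image."""
    _, h, w = img.shape
    top = int(torch.randint(0, h - patch_size + 1, (1,)))
    left = int(torch.randint(0, w - patch_size + 1, (1,)))
    return img[:, top : top + patch_size, left : left + patch_size]

def add_gaussian_noise(img: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Add zero-mean Gaussian noise and clamp back to [0, 1]."""
    return (img + sigma * torch.randn_like(img)).clamp(0.0, 1.0)

# Usage: cut one patch from a large frame and make an 8-frame noisy "burst".
frame = torch.rand(3, 512, 512)
patch = random_patch(frame, 64)
burst = torch.stack([add_gaussian_noise(patch) for _ in range(8)])  # (8, 3, 64, 64)
```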

I've also been thinking about modifying the model so it can work with different input sizes. In other words, begin training on very small images (perhaps 16x16) and, over the course of training, increase to higher-resolution images. I'm curious whether that would help performance.
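As a rough sketch of what such a schedule could look like (the milestones and sizes below are made up for illustration):

```python
# Illustrative progressive-resizing schedule: start with tiny crops and
# step up the patch size at fixed epoch milestones (all values made up).
def patch_size_for_epoch(epoch: int) -> int:
    schedule = [(0, 16), (10, 32), (25, 64), (50, 128)]  # (start_epoch, size)
    size = schedule[0][1]
    for start, s in schedule:
        if epoch >= start:
            size = s
    return size

assert patch_size_for_epoch(0) == 16
assert patch_size_for_epoch(30) == 64
assert patch_size_for_epoch(75) == 128
```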

After that I'll probably revisit the training/optimization portion.

I hope that answers your question! If you have any references you think would be interesting, I'd love to see them.


I just realized that this doesn't exist anywhere outside of my head, so I'm going to keep this issue open so I don't lose track of it.

ConnorBaker self-assigned this Feb 10, 2023