
NOT an issue, but a question :) #79

Open
thusinh1969 opened this issue Aug 20, 2021 · 7 comments


@thusinh1969

thusinh1969 commented Aug 20, 2021

Hi,

I read the paper and it sounds very promising. I have been trying NVIDIA StyleGAN2-ADA for weeks without success. It simply did not converge, and the generated images were full of unwanted artifacts. Our datasets are furniture (living room, bedroom, etc.), each with 60k–150k images. Some have only 10k images, but we have not tried those yet.

Even with 100k images, the original StyleGAN2-ADA failed us for whatever reason. We tried the full range of R1 values, dropping/adding the bgcfc augmentations, etc. Nothing worked. We ran up to 30,000 kimg (30 million images through the network), and the images still had lots of weird streaks.

We are about to try this repo. Can you advise on a few things along the way:

  1. Is there anything special about our dataset that may need particular attention or parameters? Which parameters should we take care of? These images are NOT faces/dogs/cats, which are cropped and highly focused; as you may know, our images vary widely in scene and color, and for us convergence means minimal artifacts :)

  2. How long do you think it will take to converge, if at all? We currently use 4 RTX 2080 Ti GPUs in our lab, hence a batch size of 32 at 256x256 resolution.

Would love to keep in touch.

Thanks in advance.
Steve
([email protected])

@zsyzzsoft
Collaborator

Your dataset seems large enough, so the problem does not sound like a discriminator overfitting issue. Could you share some generated and real images?
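(For readers following along: discriminator overfitting is usually diagnosed with the r_t heuristic from the StyleGAN2-ADA paper, r_t = E[sign(D(real))], which drifts toward 1 as D becomes overconfident on training reals. A minimal numpy sketch; the logit values below are purely illustrative:)

```python
import numpy as np

def rt_overfitting_heuristic(d_logits_real: np.ndarray) -> float:
    """ADA's r_t statistic: mean sign of D's raw logits on training reals.

    r_t ~ 0  -> D is uncertain (healthy)
    r_t -> 1 -> D is confident on all reals (likely overfitting)
    """
    return float(np.mean(np.sign(d_logits_real)))

# Illustrative logits only: a healthy D vs. an overconfident one.
healthy = np.array([-0.3, 0.4, -0.1, 0.2, -0.5, 0.6])
overfit = np.array([2.1, 3.4, 1.8, 2.9, 4.0, 2.2])

print(rt_overfitting_heuristic(healthy))  # 0.0 (signs cancel out)
print(rt_overfitting_heuristic(overfit))  # 1.0
```

ADA uses this statistic (target around 0.6) to adapt augmentation strength; a large dataset usually keeps r_t well below that.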

@thusinh1969
Author

thusinh1969 commented Aug 22, 2021

Here we go at 24,000 kimg (24 million images went through the network).

  • Total images: 47,000 (256x256) with 19 conditional classes
  • 4x V100, batch 64
  • RAM/CPU: much more than enough
  • Ubuntu 18 / PyTorch 1.8 / CUDA 11.1
  • Config paper256, R1 gamma = 10, all 3 mit-han-lab augmentations

REAL:

[image: reals_crop]

FAKE:

[image: fake_crop]

----------- Run command line ---------------
python train.py --outdir="../../results/" --gpus=4 --batch=64 --data="../../images_256_StyleGANV2/" --cond=True --mirror=True --cfg=paper256 --gamma=10 --kimg=50000 --DiffAugment="color,translation,cutout" --resume="../../results/CHECKPOINT/network-snapshot-023587.pkl"
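(The three policies named in `--DiffAugment` are differentiable augmentations applied to both real and generated batches before they reach the discriminator. A rough numpy sketch of just the cutout policy; the repo's actual implementation operates on batched torch tensors and keeps everything differentiable, so sizes and the RNG here are illustrative only:)

```python
import numpy as np

def cutout(img: np.ndarray, ratio: float = 0.5, rng=None) -> np.ndarray:
    """Zero out one random square patch, as in DiffAugment's cutout policy.

    img: (H, W, C) float array. ratio: patch side relative to image side.
    """
    rng = rng or np.random.default_rng(0)
    h, w = img.shape[:2]
    ch, cw = int(h * ratio), int(w * ratio)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    out = img.copy()
    out[y:y + ch, x:x + cw] = 0.0
    return out

img = np.ones((8, 8, 3), dtype=np.float32)
aug = cutout(img)
print(img.sum(), aug.sum())  # the augmented copy loses one 4x4 patch per channel
```

Because the same augmentation is applied to reals and fakes inside the loss, the generator never learns to produce the cutout holes themselves.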

---------- train options json file ------------
{
  "num_gpus": 4,
  "image_snapshot_ticks": 50,
  "network_snapshot_ticks": 50,
  "metrics": ["fid50k_full"],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "../../images_256_StyleGANV2/",
    "use_labels": true,
    "max_size": 47049,
    "xflip": true,
    "resolution": 256
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": { "num_layers": 8 },
    "synthesis_kwargs": {
      "channel_base": 16384,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": { "mbstd_group_size": 8 },
    "channel_base": 16384,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [0, 0.99],
    "eps": 1e-08
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [0, 0.99],
    "eps": 1e-08
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 10.0,
    "diffaugment": "color,translation,cutout"
  },
  "total_kimg": 50000,
  "batch_size": 64,
  "batch_gpu": 16,
  "ema_kimg": 20,
  "ema_rampup": null,
  "resume_pkl": "../../results/CHECKPOINT/network-snapshot-023587.pkl",
  "ada_kimg": 100,
  "run_dir": "../../results/00000--cond-mirror-paper256-gamma10-kimg50000-batch64-color-translation-cutout-resumecustom"
}
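(One detail worth double-checking in the options above: in the StyleGAN2-ADA trainer, `batch_gpu` is the per-GPU micro-batch, and the number of gradient-accumulation rounds per step is `batch_size / (batch_gpu * num_gpus)`. A quick sanity check, pure arithmetic with no repo code:)

```python
def accumulation_rounds(batch_size: int, batch_gpu: int, num_gpus: int) -> int:
    """Gradient-accumulation rounds implied by a StyleGAN2-ADA style config."""
    assert batch_size % (batch_gpu * num_gpus) == 0, "batch must divide evenly"
    return batch_size // (batch_gpu * num_gpus)

# Values from the train options above: batch 64 split as 16 per GPU on 4 GPUs.
rounds = accumulation_rounds(batch_size=64, batch_gpu=16, num_gpus=4)
print(rounds)  # 1 -> no gradient accumulation; each step sees the full batch
```

So this run processes the full batch of 64 in a single round; lowering `batch_gpu` (e.g. to fit smaller GPUs) would trade memory for extra accumulation rounds without changing the effective batch size.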

@zsyzzsoft
Collaborator

zsyzzsoft commented Aug 23, 2021

I think the generated images do not look very bad :) So maybe it's just that the dataset is quite challenging, and state-of-the-art GAN models are still limited in various aspects, such as model capacity and training methodology, when it comes to modeling a very complex distribution, even when the dataset is large enough.

@thusinh1969
Author

> I think the generated images do not look very bad :) So maybe it's just that the dataset is quite challenging, and state-of-the-art GAN models are still limited in various aspects, such as model capacity and training methodology, when it comes to modeling a very complex distribution, even when the dataset is large enough.

It is BAD, man. It is NOT usable.
Steve

@thusinh1969
Author

thusinh1969 commented Aug 23, 2021

Getting worse results...

[image: fakes000000]

I will try https://github.com/l4rz/scaling-up-stylegan2

Reducing the learning rate, gamma=6, eliminating style mixing... Let's see.

Steve

@zsyzzsoft
Collaborator

zsyzzsoft commented Aug 27, 2021

So I think this is not an issue of discriminator overfitting that can be resolved by DiffAugment; rather, it is limited by network capacity or current methodology.

@Kitty-sunray

A clear example of how people overestimate AI capabilities. They trust advertisement/teaser videos and cherry-picked examples in papers too much :-D
Man, these results are incredible for a dataset of just 0.03M images with super complex relations and perspective. Go build your own GAN if you want more :-D
