Our code base is forked from ZenNAS. We modified the code to make it easier to use and, following the paper Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition, added some features that the original ZenNAS lacks.
We mainly made the following changes:
- support for Horovod and PyTorch distributed training
- refactored the evolutionary search code to speed up searching
- fixed the "THCudaTensor sizes too large" error when searching with a latency budget
In addition, since some functions do not work correctly under distributed training, we provide distributed versions of them.
We tested the modified code and verified its correctness. The results are as follows:
Using apex with mixed precision, we completed training within 5 days on 8 Tesla V100 GPUs, and the results are consistent with the paper.
| model | paper accuracy | distributed training accuracy |
| --- | --- | --- |
| ZenNet-0.1ms | 77.8% | 77.922% |
| ZenNet-0.8ms | 83.0% | 83.214% |
Note that the same results can be obtained with Horovod, but training takes longer, so we recommend apex for distributed training.
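For reference, the apex mixed-precision training mentioned above follows the standard amp pattern; a minimal sketch with a placeholder model, optimizer, and loss (not the actual training configuration):

```python
import torch
import torch.nn as nn
from apex import amp

model = nn.Linear(16, 8).cuda()  # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# opt_level 'O1' enables mixed precision with automatic casting
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

x = torch.randn(4, 16).cuda()
loss = model(x).sum()  # placeholder loss
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # backward pass on the scaled loss
optimizer.step()
```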
Before releasing the code, we reproduced the paper's algorithm and searched for models under the paper's conditions. The following table compares the searched models with the paper's models.
| model | paper accuracy | searched model accuracy |
| --- | --- | --- |
| latency01ms | 77.8% | 78.622% |
| latency05ms | 82.7% | 82.752% |
| latency12ms | 83.6% | 83.466% |
To demonstrate the effectiveness of the algorithm, we ran several model searches and obtained the results below. Each search evolves the population for 50,000 iterations on a single Tesla V100 GPU.
| method | model | search time (hours) | model score |
| --- | --- | --- | --- |
| ZenNAS | latency01ms | 98.4274 | 126.038 |
| ZenNAS | latency05ms | 22.0189 | 243.101 |
| ZenNAS | latency08ms | 28.5952 | 304.323 |
| ZenNAS | latency12ms | 44.6237 | 375.027 |
| modified ZenNAS | latency01ms | 64.988 | 134.896 |
| modified ZenNAS | latency05ms | 20.9895 | 245.712 |
| modified ZenNAS | latency08ms | 25.0358 | 310.629 |
| modified ZenNAS | latency12ms | 43.239 | 386.669 |
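At a high level, the search loop timed above is the paper's mutate-and-select evolution. A minimal sketch is shown below; mutate, zen_score, and latency are passed in as placeholders for the corresponding routines in this repository, not their actual names:

```python
import random

def evolutionary_search(initial_net, mutate, zen_score, latency,
                        population_size, num_iterations, latency_budget):
    """Simplified sketch of Zen-NAS-style evolutionary search."""
    population = [initial_net]
    for _ in range(num_iterations):
        parent = random.choice(population)   # uniformly sample a parent
        child = mutate(parent)               # randomly mutate one block
        if latency(child) > latency_budget:  # discard over-budget candidates
            continue
        population.append(child)
        if len(population) > population_size:
            # evict the network with the lowest zero-shot proxy score
            population.remove(min(population, key=zen_score))
    return max(population, key=zen_score)    # highest-scoring network
```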
- PyTorch >= 1.6, Python >= 3.7
- By default, the ImageNet dataset is stored under ~/data/imagenet; CIFAR-10 and CIFAR-100 are stored under ~/data/pytorch_cifar10 and ~/data/pytorch_cifar100, respectively
- Pre-trained parameters are cached under ~/.cache/pytorch/checkpoints/zennet_pretrained
- ptflops
- tensorboard >= 1.15 (optional)
- apex
If you want to evaluate the pre-trained models, please download them from the ZenNAS repository.
To evaluate the pre-trained model on ImageNet using GPU 0:
python val.py --fp16 --gpu 0 --arch ${zennet_model_name}
where ${zennet_model_name} should be replaced by a valid ZenNet model name. The complete list of model names can be found in the 'Pre-trained Models' section.
To evaluate the pre-trained model on CIFAR-10 or CIFAR-100 using GPU 0:
python val_cifar.py --dataset cifar10 --gpu 0 --arch ${zennet_model_name}
To create a ZenNet in your Python code:
```python
import torch
import ZenNet  # this repository's ZenNet module

gpu = 0
# opt.arch is a valid ZenNet model name; see the 'Pre-trained Models' section
model = ZenNet.get_ZenNet(opt.arch, pretrained=True)
torch.cuda.set_device(gpu)
torch.backends.cudnn.benchmark = True
model = model.cuda(gpu)
model = model.half()  # FP16 inference
model.eval()
```
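After loading, the model can be used for inference. A minimal sketch, assuming a 224x224 input resolution (the actual resolution depends on the chosen model):

```python
with torch.no_grad():
    x = torch.randn(1, 3, 224, 224).cuda(gpu).half()  # dummy FP16 input
    logits = model(x)
    predicted_class = logits.argmax(dim=1)
```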
We provide apex and Horovod distributed training scripts; you can adapt the other original scripts based on them. apex script:
scripts/Zen_NAS_ImageNet_latency0.1ms_train_apex.sh
Horovod script:
scripts/Zen_NAS_ImageNet_latency0.1ms_train.sh
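For reference, the Horovod script follows the standard Horovod-PyTorch pattern; a minimal sketch with a placeholder network and optimizer (see the script itself for the actual training setup):

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())  # pin one GPU per process

model = nn.Linear(16, 8).cuda()  # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1 * hvd.size())

# Average gradients across workers and sync the initial state from rank 0.
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```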
If you want to search for models, please note the flags "--fix_initialize" and "--origin".
"--fix_initialize" decides how the population is initialized; by default, the algorithm uses random initialization.
"--origin" determines how mutated models are generated: when "--origin" is specified, mutated models are produced using the original method.
Searching for CIFAR-10/100 models with a parameter budget < 1M, using different zero-shot proxies:
scripts/Flops_NAS_cifar_params1M.sh
scripts/GradNorm_NAS_cifar_params1M.sh
scripts/NASWOT_NAS_cifar_params1M.sh
scripts/Params_NAS_cifar_params1M.sh
scripts/Random_NAS_cifar_params1M.sh
scripts/Syncflow_NAS_cifar_params1M.sh
scripts/TE_NAS_cifar_params1M.sh
scripts/Zen_NAS_cifar_params1M.sh
Searching for ImageNet models with a latency budget on NVIDIA V100 from 0.1 ms/image to 1.2 ms/image, at batch size 64 in FP16:
scripts/Zen_NAS_ImageNet_latency0.1ms.sh
scripts/Zen_NAS_ImageNet_latency0.2ms.sh
scripts/Zen_NAS_ImageNet_latency0.3ms.sh
scripts/Zen_NAS_ImageNet_latency0.5ms.sh
scripts/Zen_NAS_ImageNet_latency0.8ms.sh
scripts/Zen_NAS_ImageNet_latency1.2ms.sh
Searching for ImageNet models, with FLOPs budget from 400M to 800M:
scripts/Zen_NAS_ImageNet_flops400M.sh
scripts/Zen_NAS_ImageNet_flops600M.sh
scripts/Zen_NAS_ImageNet_flops800M.sh
The masternet definition is stored in "Masternet.py". The masternet takes a structure string and parses it into a PyTorch nn.Module object. The structure string defines the layer structure, which is implemented in the "PlainNet/*.py" files. For example, "PlainNet/SuperResK1KXK1.py" defines the SuperResK1K3K1 block, which consists of multiple ResNet blocks. To define your own block, e.g. ABC_Block, first implement "PlainNet/ABC_Block.py". Then, in "PlainNet/__init__.py", append the following lines after the last line to register the new block definition:
```python
from PlainNet import ABC_Block
_all_netblocks_dict_ = ABC_Block.register_netblocks_dict(_all_netblocks_dict_)
```
After the above registration call, the PlainNet module can parse your customized block from the structure string.
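For illustration, a minimal hypothetical "PlainNet/ABC_Block.py" might look like the sketch below. It assumes the same convention as the existing "PlainNet/*.py" files, namely an nn.Module subclass plus a register_netblocks_dict() hook; real blocks implement a larger interface (string parsing, FLOPs/size accounting), so consult e.g. "PlainNet/SuperResK1KXK1.py" before writing your own.

```python
# PlainNet/ABC_Block.py -- hypothetical minimal block definition.
import torch.nn as nn

class ABC_Block(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

def register_netblocks_dict(netblocks_dict: dict):
    # Register this file's blocks so PlainNet can parse them
    # from a structure string.
    netblocks_dict.update({'ABC_Block': ABC_Block})
    return netblocks_dict
```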
The search space definitions are stored in SearchSpace/*.py. The important function is
gen_search_space(block_list, block_id)
block_list is a list of super-blocks parsed by the masternet. block_id is the index of the block in block_list that will later be replaced by a mutated block. This function must return a list of mutated blocks.
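A hypothetical sketch of such a function is shown below; the out_channels mutation is purely illustrative, as the real search spaces mutate block type, width, and depth:

```python
import copy

def gen_search_space(block_list, block_id):
    the_block = block_list[block_id]
    candidates = []
    # Illustrative mutation: vary the output width of the selected block.
    for out_channels in (32, 64, 128, 256):
        new_block = copy.deepcopy(the_block)
        new_block.out_channels = out_channels
        candidates.append(new_block)
    return candidates
```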
"PlainNet/AABC_Block.py" has defined the candidate blocks, you can directly specify candidate blocks in the search spaces by passing parameters "--search_space_list". So you have two methods to specify search spaces. Taking ResNet-like search space as an example, you can use "--search_space SearchSpace/search_space_XXBL.py" or "--search_space_list PlainNet/SuperResK1KXK1.py PlainNet/SuperResKXKX.py" to specify search space. Both of them are equivalent.
In scripts, when you choose to use the first method to specify search space, you should also add other two parameters "--fix_initialize" and"--origin", so the algorithm will initialize with a fixed model.
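For example, a hypothetical invocation combining these flags (other required arguments omitted):
python evolution_search.py --fix_initialize --origin --search_space SearchSpace/search_space_XXBL.py ...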
The zero-shot proxies are implemented in "ZeroShotProxy/*.py". The evolutionary algorithm is implemented in "evolution_search.py". "analyze_model.py" prints the FLOPs and model size of a given network. "benchmark_network_latency.py" measures network inference latency. "train_image_classification.py" implements SGD training, and "ts_train_image_classification.py" implements teacher-student distillation.
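For intuition, per-image latency measurement typically follows a warm-up-then-time pattern; the sketch below is illustrative and not the actual implementation of "benchmark_network_latency.py":

```python
import time
import torch

def measure_latency(model, batch_size=64, resolution=224, repeat=30):
    """Return per-image FP16 inference latency in milliseconds."""
    model = model.cuda().half().eval()
    x = torch.randn(batch_size, 3, resolution, resolution).cuda().half()
    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(repeat):
            model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / repeat / batch_size * 1000.0
```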
A few files in this repository are modified from the following open-source implementations:
https://github.com/DeepVoltaire/AutoAugment/blob/master/autoaugment.py
https://github.com/VITA-Group/TENAS
https://github.com/SamsungLabs/zero-cost-nas
https://github.com/BayesWatch/nas-without-training
https://github.com/rwightman/gen-efficientnet-pytorch
https://pytorch.org/vision/0.8/_modules/torchvision/models/resnet.html
Copyright 2021 ZTE corporation. All Rights Reserved.