This example provides a script and tools for running reproducible experiments on training neural networks on the ImageNet dataset.
Features:
- Distributed training with native automatic mixed precision
- Experiments tracking with ClearML
Model | Training Top-1 Accuracy | Training Top-5 Accuracy | Test Top-1 Accuracy | Test Top-5 Accuracy |
---|---|---|---|---|
ResNet-50 | 78% | 92% | 77% | 94% |
Experiment | Model | Training Top-1 Accuracy | Training Top-5 Accuracy | Test Top-1 Accuracy | Test Top-5 Accuracy | ClearML Link |
---|---|---|---|---|---|---|
configs/???.py |
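
For readers unfamiliar with the metrics in the tables above, the following is a minimal pure-Python sketch of how top-k accuracy is usually defined: a sample counts as correct when its true label is among the k highest-scoring classes. The function name `topk_accuracy` and the toy scores are illustrative assumptions, not code or data from this example.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)


# Toy data (made up for illustration): 3 samples, 4 classes
scores = [
    [0.1, 0.7, 0.1, 0.1],  # highest score: class 1
    [0.4, 0.3, 0.2, 0.1],  # highest score: class 0
    [0.1, 0.2, 0.3, 0.4],  # highest score: class 3
]
labels = [1, 1, 0]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Top-k accuracy can never be lower than top-1 accuracy on the same data, since the top-k set always contains the top-1 prediction.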
pip install -r requirements.txt
Docker users can run the example with the following image:
docker pull pytorchignite/vision:latest
Then install the remaining requirements as described above.
Export the DATASET_PATH environment variable so that it points to the ImageNet dataset:
export DATASET_PATH=/path/to/imagenet
# e.g. export DATASET_PATH=/data/ where "train", "val", "meta.bin" are located
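
Before launching a long run, it can help to confirm that the dataset root contains the entries named above. The following is a minimal sketch using only the standard library; the helper `missing_dataset_entries` is hypothetical and not part of this example's scripts, and it checks only that "train", "val" and "meta.bin" exist, not their contents.

```python
import os
from pathlib import Path

# Entries the comment above says should sit directly under DATASET_PATH
EXPECTED_ENTRIES = ("train", "val", "meta.bin")


def missing_dataset_entries(root):
    """Return the expected entries that are absent from the dataset root."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    dataset_path = os.environ.get("DATASET_PATH")
    if dataset_path is None:
        print("DATASET_PATH is not set")
    else:
        missing = missing_dataset_entries(dataset_path)
        print("missing entries:", missing if missing else "none")
```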
- Adjust the batch size for your GPU type in the configuration file configs/baseline_resnet50.py
Run the following command:
CUDA_VISIBLE_DEVICES=0 python -u main.py training configs/baseline_resnet50.py
- Adjust the total batch size for your GPUs in the configuration file configs/baseline_resnet50.py
OMP_NUM_THREADS=1 torchrun --nproc_per_node=2 main.py training configs/baseline_resnet50.py
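
As a rough guide when choosing the total batch size, the sketch below shows the arithmetic under the assumption that the configured value is split evenly across the processes started by torchrun (one per GPU, as set by --nproc_per_node). Whether the configuration is interpreted exactly this way is an assumption based on the wording "total batch size"; the helper `per_process_batch_size` is hypothetical.

```python
def per_process_batch_size(total_batch_size, world_size):
    """Split a total batch size evenly across `world_size` processes."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if total_batch_size % world_size != 0:
        raise ValueError(
            f"total batch size {total_batch_size} is not divisible by {world_size} processes"
        )
    return total_batch_size // world_size


# Illustrative values: a total batch of 256 on 2 GPUs (matching --nproc_per_node=2)
per_gpu = per_process_batch_size(256, 2)
print(per_gpu)
```

If a single GPU runs out of memory at the resulting per-process size, lowering the total batch size in the configuration file is the first thing to try.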
Training runs were done using credits provided by the trainml.ai platform.