update code
gasvn committed Apr 1, 2021
0 parents commit c8635b2
Showing 7 changed files with 1,152 additions and 0 deletions.
42 changes: 42 additions & 0 deletions ImageNet_training/README.md
@@ -0,0 +1,42 @@
# Image classification reference training scripts

This folder contains reference training scripts for image classification.
They serve as a log of how to train specific models and provide baseline
training and evaluation scripts to quickly bootstrap research.

Unless otherwise noted, all models have been trained on 8x V100 GPUs with
the following parameters:

| Parameter | value |
| ------------------------ | ------ |
| `--batch_size` | `32` |
| `--epochs` | `90` |
| `--lr` | `0.1` |
| `--momentum` | `0.9` |
| `--wd`, `--weight-decay` | `1e-4` |
| `--lr-step-size` | `30` |
| `--lr-gamma` | `0.1` |

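The three learning-rate parameters in the table can be read together as a step decay. The sketch below assumes they drive a schedule equivalent to PyTorch's `StepLR` (the `train.py` that consumes them is not shown here, so this is an assumption), and the helper name `step_lr` is hypothetical:

```python
def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Learning rate at a given epoch under a step decay (assumed schedule)."""
    # The rate is multiplied by gamma once every step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


# Print the rate at the first and last epoch of each 30-epoch stage.
for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.4f}")
```

Under these defaults a 90-epoch run would spend 30 epochs at each of three learning rates, each ten times smaller than the previous one.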


### ResNet50
```
python -m torch.distributed.launch --nproc_per_node=4 --use_env train.py \
    --model resnet50 --epochs 100
```
### ResNet101
```
python -m torch.distributed.launch --nproc_per_node=4 --use_env train.py \
    --model resnet101 --epochs 100
```


## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with the [NVIDIA Apex extension](https://github.com/NVIDIA/apex).

Mixed precision training makes use of both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor Cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput, generally without loss in model accuracy. Because FP16 tensors take half the memory of FP32 tensors, mixed precision training also often allows larger batch sizes. GPU automatic mixed precision training for PyTorch Vision can be enabled by passing the `--apex` flag.

```
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py \
    --model resnet50 --epochs 100 --apex
```
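
One reason FP32 is kept "where appropriate" is that FP16 has a much narrower range, so small gradient values can round to zero. AMP implementations typically counter this with loss scaling. The standard-library sketch below only illustrates that idea and is not how Apex is implemented; the gradient value, the static `loss_scale` and the helper `to_fp16` are all chosen for illustration:

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]


grad = 1e-8          # illustrative small gradient value
loss_scale = 1024.0  # hypothetical static loss scale

naive = to_fp16(grad)                 # too small for FP16, rounds to 0.0
scaled = to_fp16(grad * loss_scale)   # representable once scaled up
recovered = scaled / loss_scale       # unscale in higher precision

print("stored directly in FP16:", naive)
print("scaled, stored, unscaled:", recovered)
```

Stored directly, the value is lost; scaled before storage and unscaled afterwards in higher precision, it survives with a small rounding error.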
74 changes: 74 additions & 0 deletions ImageNet_training/rbn.py
@@ -0,0 +1,74 @@
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepresentativeBatchNorm2d(nn.BatchNorm2d):
    def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True,
                 track_running_stats=True):
        super(RepresentativeBatchNorm2d, self).__init__(
            num_features, eps, momentum, affine, track_running_stats)
        self.num_features = num_features
        ### weights for affine transformation in BatchNorm ###
        # Stored with shape (1, C, 1, 1) so they broadcast over NCHW tensors.
        if self.affine:
            self.weight = nn.Parameter(torch.Tensor(1, num_features, 1, 1))
            self.bias = nn.Parameter(torch.Tensor(1, num_features, 1, 1))
            self.weight.data.fill_(1)
            self.bias.data.fill_(0)
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)

        ### weights for centering calibration ###
        self.center_weight = nn.Parameter(torch.Tensor(1, num_features, 1, 1))
        self.center_weight.data.fill_(0)
        ### weights for scaling calibration ###
        self.scale_weight = nn.Parameter(torch.Tensor(1, num_features, 1, 1))
        self.scale_bias = nn.Parameter(torch.Tensor(1, num_features, 1, 1))
        self.scale_weight.data.fill_(0)
        self.scale_bias.data.fill_(1)
        ### calculate statistics ###
        # Global average pooling: one statistic per sample and channel.
        self.stas = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, input):
        self._check_input_dim(input)

        ####### centering calibration begin #######
        # Note: this is an in-place update, so the caller's tensor is modified.
        input += self.center_weight.view(1, self.num_features, 1, 1) * self.stas(input)
        ####### centering calibration end #######

        ####### BatchNorm begin #######
        if self.momentum is None:
            exponential_average_factor = 0.0
        else:
            exponential_average_factor = self.momentum

        if self.training and self.track_running_stats:
            if self.num_batches_tracked is not None:
                self.num_batches_tracked = self.num_batches_tracked + 1
                if self.momentum is None:  # use cumulative moving average
                    exponential_average_factor = 1.0 / float(self.num_batches_tracked)
                else:
                    exponential_average_factor = self.momentum
        # weight and bias are passed as None here; the affine transform is
        # applied after the scaling calibration below.
        output = F.batch_norm(
            input, self.running_mean, self.running_var, None, None,
            self.training or not self.track_running_stats,
            exponential_average_factor, self.eps)
        ####### BatchNorm end #######

        ####### scaling calibration begin #######
        scale_factor = torch.sigmoid(self.scale_weight * self.stas(output) + self.scale_bias)
        ####### scaling calibration end #######
        if self.affine:
            return self.weight * scale_factor * output + self.bias
        else:
            return scale_factor * output


if __name__ == '__main__':
    # Smoke test; requires a CUDA device.
    images = torch.rand(1, 256, 224, 224).cuda(0)
    rbn_layer = RepresentativeBatchNorm2d(256).cuda(0)
    total = sum(param.nelement() for param in rbn_layer.parameters())
    print(' + Number of params: %.4fM' % (total / 1e6))
    print(rbn_layer(images).size())
    print('Memory usage: %.4fM' % (torch.cuda.max_memory_allocated() / 1024.0 / 1024.0))
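
Read as arithmetic, the forward pass in `rbn.py` has three stages: centering calibration, batch normalization, and scaling calibration. The sketch below restates them in plain Python for a single channel. It is a simplified illustration rather than the layer itself: it assumes training-mode batch statistics, leaves out the affine weight and bias, skips the running-statistics update, and uses a hypothetical function name `rbn_single_channel`.

```python
import math


def rbn_single_channel(batch, center_w=0.0, scale_w=0.0, scale_b=1.0, eps=1e-5):
    """batch: list of samples, each a flat list of one channel's H*W values."""
    # Centering calibration: add center_w times the per-sample spatial mean.
    centered = []
    for sample in batch:
        k = sum(sample) / len(sample)
        centered.append([x + center_w * k for x in sample])

    # Batch normalization using the mean and biased variance of the batch.
    flat = [x for sample in centered for x in sample]
    mean = sum(flat) / len(flat)
    var = sum((x - mean) ** 2 for x in flat) / len(flat)
    normed = [[(x - mean) / math.sqrt(var + eps) for x in sample]
              for sample in centered]

    # Scaling calibration: a per-sample sigmoid gate computed from the
    # spatial mean of the normalized features.
    out = []
    for sample in normed:
        k = sum(sample) / len(sample)
        gate = 1.0 / (1.0 + math.exp(-(scale_w * k + scale_b)))
        out.append([gate * x for x in sample])
    return out


batch = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
out = rbn_single_channel(batch)  # defaults match the initial values in __init__
print(out[0])
```

With the initial values set in `__init__` (center weight 0, scale weight 0, scale bias 1), the centering term vanishes and the gate is the constant `sigmoid(1)`, so in this sketch a freshly constructed layer behaves like ordinary batch normalization multiplied by that constant.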
