- Implement the original architecture of the Basic Block & Bottleneck Block with both Identity and Projection short-cut connections.
- Use my implementation to build a simple ResNet-12 and train the model using the CIFAR-10 dataset.
Neural networks are universal function approximators, and in principle accuracy should improve as we increase the number of layers.
In practice, however, increasing the depth introduces problems such as vanishing and exploding gradients and the curse of dimensionality; accuracy saturates at some point and eventually degrades.
A sufficiently deep plain network may even fail to learn a simple function such as the identity mapping F(x) = x.
The idea behind the residual block is that, instead of hoping a few stacked layers directly fit a desired underlying mapping, say H(x), we explicitly let these layers fit a residual mapping F(x) = H(x) - x. The original mapping H(x) then becomes F(x) + x.
Shortcut connections
These are connections that skip one or more layers. F(x) + x can be understood as a feedforward neural network with “shortcut connections”.
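As a rough sketch of this idea (written in Python; the `stacked_layers` callable is illustrative, not from the original code), the shortcut simply adds the input back onto whatever the stacked layers compute:

```python
# Conceptual residual forward pass: the stacked layers learn F(x),
# and the shortcut connection adds the input x back on top.
def residual_forward(x, stacked_layers):
    fx = stacked_layers(x)   # F(x) = H(x) - x
    return fx + x            # H(x) = F(x) + x
```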
Why deep residual framework?
The idea is motivated by the degradation problem (training error increases as depth increases). If the added layers could be constructed as identity mappings, a deeper model should have training error no greater than its shallower counterpart.
If identity mappings are optimal, it is easier to push the residual F(x) towards zero than to fit H(x) as an identity mapping (as the degradation problem suggests).
The Basic Block stacks:
- A 3x3 convolution with padding, followed by BatchNorm and ReLU.
- A 3x3 convolution with padding, followed by BatchNorm.

The Bottleneck Block stacks:
- A 1x1 convolution, followed by BatchNorm and ReLU.
- A 3x3 convolution (with stride when downsampling), followed by BatchNorm and ReLU.
- A 1x1 convolution, followed by BatchNorm.
The whole stack is referred to as one block (layer), even though it consists of multiple layers (Conv, BN, ReLU).
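A minimal sketch of the Basic Block in PyTorch (an assumption, since the framework isn't named in this section; the `shortcut` argument is an illustrative way to pass in a projection when shapes change). The Bottleneck variant would swap the two 3x3 convolutions for the 1x1–3x3–1x1 stack listed above.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Sketch of the basic residual block described above."""

    def __init__(self, in_channels, out_channels, stride=1, shortcut=None):
        super().__init__()
        # 3x3 convolution with padding, followed by BatchNorm and ReLU
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 convolution with padding, followed by BatchNorm (ReLU comes after the addition)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Identity by default; a projection shortcut can be passed in when shapes change
        self.shortcut = shortcut if shortcut is not None else nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)   # F(x) + x
        return self.relu(out)
```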
Residual blocks: the building blocks of ResNet

The shortcut connections of a residual neural network can be:
- An identity block, employed when the input and output have the same dimensions.
- A projection block, a 1x1 convolution block used when the dimensions differ; it offers channel-wise pooling, often called feature-map pooling or a projection layer.
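A small sketch of how the shortcut could be chosen (again assuming PyTorch; `make_shortcut` is an illustrative helper, not from the original code): identity when shapes match, a 1x1 convolution plus BatchNorm projection when they don't.

```python
import torch
import torch.nn as nn

def make_shortcut(in_channels, out_channels, stride=1):
    # Identity shortcut: input and output shapes match, so x passes through unchanged.
    if stride == 1 and in_channels == out_channels:
        return nn.Identity()
    # Projection shortcut: a 1x1 convolution (+ BatchNorm) matches the channel count
    # and spatial size, acting as the channel-wise "feature map pooling" described above.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )

# Example: the first block of a stage typically doubles the channels and halves the
# resolution, so it needs a projection; later blocks in the stage can use the identity.
proj = make_shortcut(64, 128, stride=2)
x = torch.randn(1, 64, 32, 32)
print(proj(x).shape)   # torch.Size([1, 128, 16, 16])
```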