ResNets - Deep Residual Learning for Image Recognition

ResNets Paper: [1512.03385] Deep Residual Learning for Image Recognition
Forum: ResNets - Deep Residual Learning for Image Recognition - Paper Reading Group - W&B Community
Rewatch here: W&B Paper Reading Group: ResNets - YouTube


A general question: how does the same network, such as ResNet, do image classification and also object detection and segmentation?

I presume you’re talking about a ResNet pretrained on ImageNet being reused for the tasks you mentioned.

The backbone (body) layers should have learned many generic features from the 1000 ImageNet classes, e.g. a dog’s eyes or a cat’s fur, and these are really good features for many tasks such as object detection.

So we can reuse this backbone of conv layers and change the head to regress box coordinates instead of doing classification.

Can the ResNet idea of skip connections be applied to a feed-forward (fully connected) network? Or does it not make much sense to use them there? Or have we stopped building deep feed-forward models altogether?
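For what it’s worth, the same F(x) + x pattern can be wrapped around fully connected layers as long as the input and output widths match. Below is a minimal pure-Python sketch, assuming a two-layer residual branch with a ReLU in between; the names `linear`, `relu` and `residual_mlp_block` are hypothetical and only there for illustration.

```python
def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_mlp_block(x, w1, b1, w2, b2):
    """Return F(x) + x, where F is linear -> ReLU -> linear.

    The output width of the second layer must equal len(x),
    otherwise the element-wise addition is not defined.
    """
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    return [fi + xi for fi, xi in zip(f, x)]


# Toy weights (assumed values): width 3 -> hidden width 2 -> width 3.
x = [1.0, -2.0, 3.0]
w1 = [[0.5, 0.0, 0.0], [0.0, 0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b2 = [0.0, 0.0, 0.0]

y = residual_mlp_block(x, w1, b1, w2, b2)
print(y)
```

Here the residual branch produces [0.5, 3.0, 0.0] and the skip connection adds the input back on top of it.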

Can we also talk about the training method used? For example, the paper mentions a warm-up LR of 0.01 that later moves to an LR of 0.1. This seems like a much higher LR than what we mostly use. Can we review that section?
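As a reference point for that discussion, the warm-up can be written as a small schedule function. This is a sketch under assumptions: the paper applies the 0.01 warm-up to its 110-layer CIFAR-10 network until the training error falls below 80%, which it reports as about 400 iterations; here that condition is simplified to a fixed iteration count, and the function name and its parameters are hypothetical.

```python
def warmup_lr(iteration, warmup_iters=400, warmup_value=0.01, base_lr=0.1):
    """Return the learning rate for a given iteration.

    Assumption: warm-up ends after a fixed number of iterations,
    rather than when training error drops below a threshold.
    """
    if iteration < warmup_iters:
        return warmup_value
    return base_lr


for it in (0, 399, 400, 1000):
    print(it, warmup_lr(it))
```

The later step decays of the learning rate are left out to keep the sketch short.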

Are the identity layers not learnable?

If not, then how can we claim that a deeper network shouldn’t be worse than its shallower counterpart?

If yes, then I understand.

As per my understanding, identity layers have no parameters, so there’s nothing to learn. But the residual path can learn to output zeros, which makes the layer irrelevant (it is effectively skipped).
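A tiny check of the “residual path learned to be zeros” case, in plain Python. It assumes a simplified block whose residual branch is a single linear layer and which has no activation after the addition; `residual_block` is a hypothetical name.

```python
def residual_block(x, weights, bias):
    """y = F(x) + x, with F a single linear layer (simplifying assumption)."""
    f = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(weights, bias)]
    return [fi + xi for fi, xi in zip(f, x)]


x = [0.3, -1.2, 4.0]
zero_w = [[0.0] * 3 for _ in range(3)]  # residual weights driven to zero
zero_b = [0.0] * 3

out = residual_block(x, zero_w, zero_b)
print(out == x)  # the block passes its input through unchanged
```

With the branch at zero, the extra block adds nothing on top of the shallower network, which is the argument for why the deeper model should not be worse.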

The identity mapping primarily addresses the vanishing gradient problem through the identity connection: early layers still receive gradients even when the network is really deep, which tackles the primary problem introduced by deep networks.
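The gradient argument can be made explicit with the chain rule. This is a sketch for a single block, ignoring any activation applied after the addition; the symbols L (the loss) and y (the block output) are introduced here for illustration.

```latex
y = x + F(x)
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y}
    \left( I + \frac{\partial F}{\partial x} \right)
```

The identity term I means the gradient arriving at y is passed to x directly, even when the contribution through F is very small.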

I have tried to summarise most of the points shared in the “How to read research papers” video by Professor Andrew Ng. Anyone interested can have a look at it here.


Can you please explain, with an example input of, say, 224 by 224 images, how the identity x is added back?
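One way to see it is to trace the shapes. The sketch below follows a 224 by 224 input through the ResNet stem and the first residual block, using the standard convolution output-size formula. The padding values (3 for the 7x7 conv, 1 for the 3x3 layers) are assumptions taken from common implementations and are not stated in this thread; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
stem = conv_out(size, kernel=7, stride=2, padding=3)    # 7x7 conv, 64 filters
pooled = conv_out(stem, kernel=3, stride=2, padding=1)  # 3x3 max pool

# x is the input to the first residual block: (channels, height, width).
x_shape = (64, pooled, pooled)

# Residual branch F: two 3x3 convs, stride 1, padding 1, 64 filters each.
f_size = conv_out(conv_out(pooled, 3, 1, 1), 3, 1, 1)
f_shape = (64, f_size, f_size)

print("stem:", stem, "after pool:", pooled)
print("x:", x_shape, "F(x):", f_shape)
print("shapes match, so F(x) + x is element-wise:", x_shape == f_shape)
```

Because the 3x3 convs keep both the spatial size and the channel count, F(x) and x have identical shapes and are simply summed element by element.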

Aren’t the identity layer and the residual path the same? Could you kindly elaborate on the difference between the two?

As per my understanding, the skip connection carries the identity mapping, and the main stem of the network learns everything else. The output of a block is F(x) + x: x is what comes through the identity mapping, and the F(x) that is learnt is called the residual.

The identity is the major thing passed along from the input, and F(x) is anything residual, i.e. anything additional over the identity, that the network learns. So in the worst case (at a large depth where the gradients are 0 and learning from the main stem is almost 0) we at least get an identity mapping. In addition to the identity, we are also learning something more.
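In the paper’s notation this reads as follows, where H(x) denotes the mapping the block would ideally represent:

```latex
F(x) := H(x) - x
\qquad\Longrightarrow\qquad
y = F(x) + x ,
\qquad
F(x) = 0 \;\Rightarrow\; y = x
```

So the stem only has to fit the difference between the desired mapping and its input, and a zero residual recovers the identity.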

How do we decide whether to use skip connections? What is the intuition behind the different blocks?

Can we talk about this?

In fastbook, we saw that as we add conv layers, we increase the channels when we use higher strides.

Along those lines, in ResNet-34 I can see that there are several conv layers with the same number of channels in input and output, and then they do pooling and increase the feature maps.


Is it a rule that if we have stride = 1, we should keep the same number of feature maps across conv layers, and when we do pooling we increase the feature maps?
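For context, the paper’s stated design rule is that when the feature map size is halved, the number of filters is doubled, so that the cost per layer stays roughly constant; the halving itself is done by stride-2 convolutions rather than pooling. A quick plain-Python check of that rule for the four ResNet-34 stages, counting multiply-adds of a single 3x3 conv with equal input and output channels (`conv3x3_macs` is a hypothetical helper):

```python
# (spatial size, channels) for the four residual stages of ResNet-34
stages = [(56, 64), (28, 128), (14, 256), (7, 512)]


def conv3x3_macs(size, channels):
    """Multiply-adds of one 3x3 conv with channels in == channels out."""
    return size * size * channels * channels * 3 * 3


costs = [conv3x3_macs(size, channels) for size, channels in stages]
for (size, channels), cost in zip(stages, costs):
    print(f"{size}x{size}, {channels} channels: {cost:,} multiply-adds")
```

Halving the height and width cuts the cost by 4, and doubling both input and output channels multiplies it by 4, so every stage comes out the same.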

Just a side note: Kaiming He is one of my favourite researchers. Most of the papers he is involved in are really great and are used at a massive scale. A few of his contributions:

  • The He initialisation
  • ResNet
  • Faster R-CNN and Mask R-CNN
  • Focal Loss and Feature Pyramid Networks

Could you please explain again what the dashed connections mean?

So essentially, the skip connection is learnable, and it tries to learn the identity mapping so that most of the input information can flow into subsequent layers and also circumvent the vanishing gradient.

The residual path will try to learn what any plain model is trying to learn, e.g. features like eyes, fur etc. in ImageNet (for cats, dogs and so on).

Is my understanding correct?

Yes, there are papers that add one conv layer in the skip connection and keep it minimal, but most skip connections are a simple identity mapping with no parameters, and hence there is no learning involved. It’s just a simple path that passes the input through as it is. Learning only takes place in the residual branch.

Hope this helps.

The skip connection is not learnable. This will be clearer when we code things up.
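Until then, a parameter count already makes the point. The sketch below compares an identity shortcut with the residual branch of a basic block (two 3x3 convs, 64 channels), and with the 1x1 projection that the paper offers as one option for the shortcuts that change dimensions (the dashed ones in its figure). Bias terms and batch norm parameters are left out as a simplifying assumption, and `conv_params` is a hypothetical helper.

```python
def conv_params(c_in, c_out, kernel):
    """Weight count of a conv layer (bias omitted by assumption)."""
    return c_in * c_out * kernel * kernel


# Identity shortcut: it just forwards x, so there is nothing to train.
identity_shortcut_params = 0

# Residual branch of a basic block: two 3x3 convs, 64 -> 64 channels.
residual_branch_params = 2 * conv_params(64, 64, 3)

# Projection shortcut for a dimension change, e.g. 64 -> 128 channels.
projection_shortcut_params = conv_params(64, 128, 1)

print("identity shortcut:  ", identity_shortcut_params)
print("residual branch:    ", residual_branch_params)
print("projection shortcut:", projection_shortcut_params)
```

All of the learning in a standard block happens in the residual branch; only the shortcuts that have to change dimensions may carry weights, and only if the projection option is used.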