This post contains all comments and discussions during our Fastbook Week 14 session on ResNets
Resources
Blog posts from last week
- @ravimashru’s post on Super-Convergence
- @VinayakNayak’s post on understanding activations with color dimension
Links from this week
Is it recommended to dedicate some time to learning a framework, or to find repositories on the Papers with Code website and use Stack Overflow?
(Asking this because I am not good with the coding part of ML.)
Also, should we follow the PyTorch reading group?
A lot of implementations of old papers are in TF; how do we get around that?
Wow, great blog posts, both of you. I saw Ravi’s earlier today, but missed Vinayak’s. Thanks Aman for highlighting them, and thanks to both of you for the high-quality content. It is so helpful to get this information distilled down in such an easily digestible way!
Should one try to understand the fundamental papers?
Example scenario: I am trying to replicate a model called C3D, which was developed in 2015 but refers back to LeCun’s work on ConvNets. Should I stick with someone else’s understanding of ConvNets, or learn about them myself and work with the MNIST dataset?
Oh, you mean that there is some analogy between TF and PyTorch, so learning PyTorch properly would help in breaking down a TF implementation.
That’s correct, the concepts in TensorFlow and PyTorch are the same even though the names might be different.
It’s definitely easier for me to read a PyTorch implementation of a paper because that’s what I’ve mostly learned on, but if only a TF implementation is available, it is still readable; it just takes a little more focus. (I would recommend rewriting it in PyTorch if you can’t find one.)
Using global pooling allows you to use any image size as input. However, during training, is the network going to perform better at a particular image size?
If you are using pretrained weights, I have found that using the image size that was used during pretraining works best.
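A minimal sketch (not from the session) of why global pooling makes the head size-agnostic: `nn.AdaptiveAvgPool2d` squeezes whatever spatial size the final feature map has down to a fixed shape, so the linear layer after it always sees the same number of features. The layer sizes below are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy body: conv stack -> global average pool -> linear head
body = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # (batch, 16, H, W) -> (batch, 16, 1, 1) for any H, W
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Different input resolutions, same output shape
for size in (128, 224, 320):
    x = torch.randn(2, 3, size, size)
    print(size, body(x).shape)  # torch.Size([2, 10]) every time
```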
Excited to try this, that sounds so cool! The paper Aman referenced:
Is conv-bn-relu a “standard”? I remember reading in ch. 13 that some people experimented with bn after the activation layer but don’t remember reading anything about which is better.
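For reference, a rough sketch of the two orderings being compared, written in plain PyTorch with made-up layer sizes (not taken from the chapter). If I remember correctly, fastai’s ConvLayer has a `bn_1st` flag that toggles between these two arrangements.

```python
import torch.nn as nn

def conv_bn_relu(ni, nf, ks=3, stride=1):
    # the "standard" ordering used by most ResNet implementations
    return nn.Sequential(
        nn.Conv2d(ni, nf, ks, stride=stride, padding=ks//2, bias=False),
        nn.BatchNorm2d(nf),
        nn.ReLU(inplace=True))

def conv_relu_bn(ni, nf, ks=3, stride=1):
    # the alternative some people experiment with: norm after the activation
    return nn.Sequential(
        nn.Conv2d(ni, nf, ks, stride=stride, padding=ks//2, bias=False),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(nf))
```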
The reason you need the input and output shapes to match is for the identity addition of the input?
You would have to use interpolate to re-align them (just a guess, not confident about this answer).
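Following up on that guess, a rough sketch of how `F.interpolate` could re-align the spatial sizes before the addition. The shapes are made up for illustration, and note this only fixes a spatial mismatch, not a channel mismatch.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 28, 28)    # identity / skip path
f_x = torch.randn(1, 64, 14, 14)  # output of a stride-2 conv block

# Resize the skip path to match F(x) spatially before adding
# (the book's ResBlock uses AvgPool2d plus a 1x1 conv instead)
skip = F.interpolate(x, size=f_x.shape[-2:], mode='nearest')
out = f_x + skip
print(out.shape)  # torch.Size([1, 64, 14, 14])
```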
If the number of channels is different, could we also have concatenated x and F_x instead of using a 1x1 conv and adding them?
from fastai.vision.all import *  # provides Module, ConvLayer, noop, nn, F

class ResBlock(Module):
    def __init__(self, ni, nf, stride=1):
        # _conv_block is the two-ConvLayer block defined in the chapter 14 notebook
        self.convs = _conv_block(ni, nf, stride)
        # 1x1 conv on the skip path only when the channel counts differ
        self.skip_conv = noop if ni==nf else ConvLayer(ni, nf, 1, act_cls=None)
        # average-pool the skip path when the main path downsamples
        self.pool = noop if stride==1 else nn.AvgPool2d(stride, ceil_mode=True)

    def forward(self, x):
        F_x = F.relu(self.convs(x))
        skip = self.skip_conv(self.pool(x))
        print(f"F(x) shape: {F_x.shape}, skip shape: {skip.shape}")
        return F_x + skip
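For example, assuming `_conv_block` from the chapter 14 notebook is defined, a hypothetical block going from 64 to 128 channels at stride 2 (sizes chosen just to see the prints) ends up with matching shapes on both paths:

```python
block = ResBlock(64, 128, stride=2)
x = torch.randn(1, 64, 32, 32)
out = block(x)
# prints: F(x) shape: torch.Size([1, 128, 16, 16]), skip shape: torch.Size([1, 128, 16, 16])
print(out.shape)  # torch.Size([1, 128, 16, 16])
```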
Are we really adding x, though, if we pass it through this ConvLayer? That seems like we are kind of cheating by passing it through a conv layer.
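On the concatenation question above, a rough sketch (my own, not what the book does) contrasting the two options. Concatenating along the channel dimension avoids the extra 1x1 conv but grows the channel count, which is essentially the DenseNet approach; the 1x1 conv keeps an additive residual even though, as noted, it is no longer a pure identity. Shapes below are made up for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)     # input to the block
f_x = torch.randn(1, 128, 16, 16)  # output of the conv block (more channels)

# Option 1: project x with a 1x1 conv so it can be added (what the book's ResBlock does)
proj = nn.Conv2d(64, 128, kernel_size=1)
print((f_x + proj(x)).shape)             # torch.Size([1, 128, 16, 16])

# Option 2: concatenate along the channel dimension instead of adding
# (DenseNet-style; the next layer now has to expect 64 + 128 = 192 channels)
print(torch.cat([x, f_x], dim=1).shape)  # torch.Size([1, 192, 16, 16])
```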
Sorry for asking this now (due to some coursework I wasn’t able to follow the fastbook reading group for the last few lectures). I wanted to know how to tap into the community. For any doubts, should I refer to the Slack group or some other forum?
Welcome! We’re slowly migrating from the Slack group to this Discourse forum, so you should post any questions in the Fastbook Reading Group section.
This topic contains all the discussion posts from previous sessions, and a link to a playlist with all the recordings.
I found this to be a nice bit of revision before diving deeper: working through “What is torch.nn really?” - YouTube
I think the best position for the norm is still an active debate.