Week 13 Discussion Thread

This post contains all comments and discussions during our FastBook Week 13 session on convolutions and CNNs.



Blog posts from last week

Links from this week


Hello Everyone, Good Morning,
This may be a noob question, but why do we need to train ResNet50 from scratch? Why not import the weights from ImageNet, like this: model = ResNet50(weights="imagenet")?

Does conv2 use the receptive field of both conv1 and the layer before it for its final calculations?

I mean, what's the necessity of training an architecture/model from scratch for a particular task? Why not initialize the model directly?


@durgaamma2005 Convolutions in general have a receptive field. The idea is to shrink the useful information from the entire image into a smaller tensor. This is usually done by using a stride higher than 1.
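A quick way to see the stride effect (a minimal sketch with made-up sizes):

```python
import torch
import torch.nn as nn

# a 3x3 conv with stride 2 and padding 1 halves the spatial dimensions
conv = nn.Conv2d(1, 4, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 1, 28, 28)
out = conv(x)
print(out.shape)  # torch.Size([1, 4, 14, 14])
```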

Usually you would want to use some sort of pretraining if you can, but if you are trying a new architecture, it may not have pre-trained weights available.
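To illustrate the weight-transfer idea with a toy model (the architecture and names here are made up, just a sketch): if a checkpoint exists you copy the weights in instead of retraining, but a brand-new architecture has no checkpoint to copy from.

```python
import torch
import torch.nn as nn

def make_model():
    # tiny stand-in architecture, purely for illustration
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

pretrained = make_model()  # pretend this one was already trained (e.g. on ImageNet)
scratch = make_model()     # fresh random init: "training from scratch" starts here
scratch.load_state_dict(pretrained.state_dict())  # transfer the learned weights
```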

Aman’s QLD AI HUB talk What's new in computer vision | July Queensland AI - YouTube


So will it not be sufficient to just do the computations from the previous conv layer, rather than for all previous layers?

We could get the input for the previous convolution only if we do the previous convolutions. It's sequential.

Also, Aman, I request you to host a session about Data Science/Machine Learning careers and resume guidance, to help people get jobs in the DS/ML field.
Because it's becoming harder to get employed in DS/ML.
Even an entry-level job requires 2-3 years of experience.

Think of the receptive field as a property of the pixels of a layer. All the pixels from prior layers which influence the computation of a pixel's value are in the receptive field of that pixel. So to compute the receptive field of a pixel in the output of the 2nd convolution, you first figure out all the pixels from the previous layer which would have impacted its calculation. Those pixels in the output of the first convolution will in turn have been calculated from a set of pixels in the first layer's input. So all of those input pixels now belong to the receptive field of a pixel in the output of the 2nd convolution. That is what the book tries to show. Hope this makes sense.
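The layer-by-layer growth can be computed with the standard recurrence (a small sketch; `layers` is a list of (kernel, stride) pairs I made up for illustration):

```python
# receptive field across stacked convs:
#   r_out = r_in + (k - 1) * jump, and jump *= stride after each layer
def receptive_field(layers):
    r, jump = 1, 1
    for k, stride in layers:
        r += (k - 1) * jump
        jump *= stride
    return r

print(receptive_field([(3, 1)]))          # one 3x3 conv -> 3
print(receptive_field([(3, 1), (3, 1)]))  # two 3x3 stride-1 convs -> 5
print(receptive_field([(3, 2), (3, 2)]))  # two 3x3 stride-2 convs -> 7
```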

How are nn.Conv2d() weights initialised? Are they Gaussian-distributed with mean 0, or random weights between 0 and 1?
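As far as I can tell, recent PyTorch versions initialise `nn.Conv2d` with Kaiming-uniform (with a = sqrt(5)), which works out to a uniform distribution on [-1/sqrt(fan_in), 1/sqrt(fan_in)] — so neither Gaussian nor 0-to-1. A quick empirical check (worth re-verifying on your PyTorch version):

```python
import math
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
fan_in = 3 * 3 * 3  # in_channels * kernel_h * kernel_w
bound = 1 / math.sqrt(fan_in)  # kaiming_uniform_(a=sqrt(5)) reduces to this bound

print(conv.weight.min().item() < 0)             # negatives exist -> not U(0, 1)
print(conv.weight.abs().max().item() <= bound)  # stays within the uniform bound
```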

Why is the output [1,12,14,14]?

import torch, torch.nn as nn
def conv(ni, nf): return nn.Conv2d(ni, nf, 3, stride=2, padding=1)  # as in the chapter (activation omitted)
first_cnn = nn.Sequential(*[
    conv(ni, nf) for ni, nf in zip([1,3,7,9], [3,7,9,12])])
x = torch.randn(1, 1, 224, 224)
first_cnn(x).shape  # [1, 12, 14, 14]: four stride-2 convs halve 224 four times (224 / 2**4 = 14), and 12 is the last nf

Are you going forward to the next model layer or the next minibatch?

So, is it OK to remove those zero values from the architecture for inference, so that we get a leaner model which predicts faster?
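One way to experiment with this is PyTorch's pruning utilities — a sketch, assuming `torch.nn.utils.prune` is available in your version:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Conv2d(3, 16, kernel_size=3)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero the smallest 50% by |w|
sparsity = (layer.weight == 0).float().mean().item()
print(sparsity)  # ~0.5
# caveat: zeroed weights alone don't make dense conv kernels faster;
# real speedups need structured pruning or a sparse-aware runtime
```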

I think I just need to play around with it myself. I didn't know that visual was possible, but it seems very helpful 🙂

Here is another excellent blog/resource on BatchNorm: https://towardsdatascience.com/batch-normalization-in-3-levels-of-understanding-14c2da90a338
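A minimal demo of what BatchNorm does in training mode (toy numbers):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)                 # one (mean, var) pair tracked per channel
x = torch.randn(8, 3, 4, 4) * 5 + 10   # batch with mean ~10, std ~5
y = bn(x)                              # training mode: normalize with batch statistics
# y now has per-channel mean ~0 and std ~1, regardless of the input's scale/shift
```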

Could you share some resources on how to do data pre-processing?

centre cropping and selecting frames for video data
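Not a full resource, but here's a minimal sketch of both steps on a dummy clip (the helper name and sizes are my own):

```python
import torch

def center_crop(frames, size):
    # frames: (T, C, H, W) -> keep the central size x size window of each frame
    h, w = frames.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return frames[..., top:top + size, left:left + size]

video = torch.randn(100, 3, 240, 320)  # dummy clip: 100 frames of 240x320
idx = torch.linspace(0, video.shape[0] - 1, steps=16).long()  # 16 evenly spaced frames
clip = center_crop(video[idx], 224)
print(clip.shape)  # torch.Size([16, 3, 224, 224])
```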