#4 PyTorch Book Thread: Sunday, 19th Sept 8AM PT

Note: This is a wiki, please edit it & add resources!

:tv: YouTube Link: PyTorch Book Reading - 4. Train your first CNN using Torch

Hi all!
This thread is for discussion, Q&A, and everything else related to the #4 meetup of the book reading group.

Last week we trained our first NN using Torch; this week we’ll continue by looking at convolutions and training our first CNN using Torch.

:round_pushpin: Link to sign up


<<<< Previous Session Thread


I was looking into writing a blog post. The GitHub blog setup was somewhat confusing, but I’ll try to do it this week if I can figure out how to write one.

  • Why do we choose activation functions with linearity around 0,1?

  • Why do we provide dimension in Softmax?

  • What does view(3,-1) do?

  • Steps in “Traditional” ML Pipeline to detect birds?

  1. view(3, -1) basically flattens the data so it can be forwarded from the conv layers into a linear layer; the -1 tells PyTorch to infer that dimension
  • Why do we choose activation functions with linearity around 0, 1? >> Batch normalization

  • Why do we provide a dimension in Softmax? >>

  • What does view(3, -1) do? >> -1 will infer the other dimension itself

  • Steps in a “traditional” ML pipeline to detect birds? >> Apply hand-crafted filters to the image to extract features, then use an SVM-like classifier.

  1. Pipeline: gather training and validation data of bird images with labels > train a CNN on the training set and evaluate on the validation set > use the trained model in test/production
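The batch-normalization answer above can be checked in a few lines. This is a minimal sketch, assuming `torch.nn.BatchNorm1d` in training mode; the feature count and input scale are arbitrary:

```python
import torch
import torch.nn as nn

# BatchNorm normalizes each feature across the batch, so activations
# stay near mean 0 / variance 1 — the sensitive, roughly linear
# region of activations like sigmoid and tanh.
bn = nn.BatchNorm1d(4, affine=False)
bn.train()

x = torch.randn(64, 4) * 10 + 5       # badly scaled input
y = bn(x)

print(y.mean(dim=0))                  # each feature mean ≈ 0
print(y.std(dim=0, unbiased=False))   # each feature std ≈ 1
```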
  1. So that the variance remains close to 1 and the mean close to 0, in order to avoid the problem of vanishing and exploding gradients.
  2. So that every slice along dim sums to 1.
  3. It changes the shape to 3 rows while automatically calculating the other dimension. E.g. say you have a tensor of shape (3, 4); then view(2, -1) will change the shape to (2, 6).
  4. Gather data → apply transforms/preprocessing → apply various filters to extract edges and other features → use a model like SVM → evaluate.
  1. Activation functions are roughly linear near the centre, so they are most sensitive to values close to it: small changes in the input change the result quite a bit there. For images we are sure of (at the extremes), some changes in the pixels don’t affect the output much, but for images we are not so sure of, small changes lead to significant changes in our prediction.

This is my understanding, but I am not sure.
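The view and softmax answers above can be verified directly. A minimal sketch; the tensor shapes are arbitrary examples:

```python
import torch

# view(2, -1): -1 tells PyTorch to infer the remaining dimension,
# so a tensor with 12 elements becomes (2, 6).
t = torch.arange(12.0).view(3, 4)
print(t.view(2, -1).shape)    # torch.Size([2, 6])

# A conv output of shape (batch, channels, H, W) is typically
# flattened to (batch, -1) before a linear layer.
conv_out = torch.randn(8, 16, 5, 5)
print(conv_out.view(8, -1).shape)   # torch.Size([8, 400])

# softmax(dim=d): every slice along dim d sums to 1.
p = torch.softmax(torch.randn(2, 3), dim=1)
print(p.sum(dim=1))           # tensor([1., 1.]) up to float error
```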

Do we have control over what shape identifier filters are present in a CNN layer?

Yes, we can. There is a parameter to specify the kernel shape in Conv2d and other classes as well.
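For example, `kernel_size` in `torch.nn.Conv2d` controls the spatial shape of each filter; the channel counts below are just illustrative:

```python
import torch.nn as nn

# kernel_size can be a single int (square kernel)...
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(conv.weight.shape)   # torch.Size([16, 3, 3, 3]) — (out, in, kH, kW)

# ...or a (height, width) tuple for a rectangular kernel.
rect = nn.Conv2d(3, 16, kernel_size=(5, 3))
print(rect.weight.shape)   # torch.Size([16, 3, 5, 3])
```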


Luis Serrano’s Youtube channel that Sanyam refers to - https://www.youtube.com/c/LuisSerrano


A color picture would be a combination of RGB values to signify the value of a pixel. I’m having a hard time visualizing how on and off for a pixel in a certain location maps to a RGB value for color. Sorry. I’m a noob at this, but love the classes so far and going to go back and start from chapter 1.


The shape identifier, as you call it, is not hand-designed or hard-coded by us. We simply throw filters at our input (training) data points, and these filters learn the inherent signals corresponding to a label, i.e. what makes a dog a dog.

We don’t choose or decide which filter will do what. That is decided by the network during training. Some filters become edge detectors, some become corner detectors, some become associated with colors, and so on. But remember that we cannot control which does which.

It was done that way in pre-deep-learning computer vision, where feature extractors such as edge detectors were hard-coded by humans. The Sobel kernel, for example, is a good edge detector.
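The hand-coded-filter idea can be sketched with a Sobel kernel applied via `torch.nn.functional.conv2d`; the toy image (dark left half, bright right half) is made up for illustration:

```python
import torch
import torch.nn.functional as F

# A hand-crafted Sobel kernel that responds to vertical edges —
# the pre-deep-learning style of feature extraction.
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

# Toy 1-channel image: columns 0–2 are dark, columns 3–5 are bright.
img = torch.zeros(1, 1, 6, 6)
img[..., 3:] = 1.0

edges = F.conv2d(img, sobel_x, padding=1)
# The response is strongest along the vertical boundary in the middle
# and zero in the flat regions.
print(edges[0, 0])
```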

You can certainly change and control the size of a kernel, but you cannot control which filter will do what.

The earlier layers usually become low-level feature extractors (such as corner detectors), while the layers toward the end become high-level feature extractors (such as recognizing a face).

During inference, the filters related to the detected features get activated, and the others get turned off.

Because you can turn things off when needed, and apply negative weights when needed (as Leaky ReLU does).


Thank you, this helps! Great course so far.


Got it. I always wondered about this but couldn’t find an explanation online. I feel like the cloud in front of my eyes has finally cleared :slight_smile:


No, but we can inspect what each filter does. What a filter ends up doing depends entirely on its weights, which start out random and are shaped during training.


No worries. Being a noob is never a bad thing.

On and off pixels as such only make sense for a greyscale bitmap image.

But we usually deal with JPEGs or PNGs, which have three channels. The R channel holds only the information about red values; no green or blue values are of concern in that channel. The same goes for G and B.

These three channels, superimposed together, form an image.
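The three-channel idea maps directly onto tensor shapes in PyTorch. A minimal sketch with a random stand-in image (real images would come from a loader like torchvision's):

```python
import torch

# PyTorch convention for a color image: (channels, height, width).
img = torch.rand(3, 4, 4)

# Each channel is a 2-D grid of intensities for one color.
r, g, b = img[0], img[1], img[2]
print(r.shape)                        # torch.Size([4, 4])

# Superimposing (stacking) the channels recovers the original image.
restacked = torch.stack([r, g, b])
print(torch.equal(restacked, img))    # True
```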



Thanks, @tauseef , it means a lot. I am very glad that I was of any help.

You can see the Sobel kernel in the book itself. They explain how it specifically detects edges.