This post contains all comments and discussions from our FastBook Week 12 session on convolutions.
Resources
Blog posts from last week
- @ravimashru’s posts on building a movie recommender (part 1 | part 2)
- @vrc0503’s notes on collaborative filtering (Week 11)
- @VinayakNayak’s post on building an anime recommendation system
- @kurianbenoy-aot’s post on recommender systems
Links from this week
Kurian’s post on recommender systems
Grant Sanderson (3Blue1Brown) has a great tutorial on convolution, with code in Julia, in this YouTube video, for people who are interested.
Does a positive number have any significance, such as how dark the pixel is? Is a bigger number like 500 darker than 100?
Yes, the higher value would activate more in the case you mentioned; it would output a higher value. Usually pixel values only go from 0 to 255 for ints or from 0 to 1 for floats, but besides that, your concept is correct.
For those who want to reinforce their understanding of how convolution works, here is a GitHub repo with animations that beautifully visualize convolution and transposed convolution operations in CNNs.
There is a cap on RGB values: they range from 0 to 255 or from 0 to 1, depending on the scale, and values beyond that are capped. So the bigger the value, the stronger the intensity, up to the maximum cap; beyond it, the value has no further effect. The same applies to the minimum value.
thanks for the clarification
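A minimal sketch of the capping described above, assuming raw values are clamped into the 0 to 255 (or 0 to 1) range when stored or displayed as pixels. `clip_pixel` is a hypothetical helper, not a fastai or PyTorch function; here the clamping is applied explicitly, since a convolution by itself does not clamp its outputs.

```python
def clip_pixel(value, lo=0, hi=255):
    """Clamp a raw value into the displayable pixel range [lo, hi]."""
    return max(lo, min(hi, value))


raw = [-20, 0, 100, 255, 500]
# 500 is capped to 255, so it is no brighter than 255; -20 is capped to 0.
print([clip_pixel(v) for v in raw])  # [0, 0, 100, 255, 255]

# For floats scaled to 0-1, the same idea applies with hi=1.0.
print(clip_pixel(1.7, hi=1.0), clip_pixel(0.4, hi=1.0))  # 1.0 0.4
```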
It should be a single tensor of torch.Size([64, 1, 28, 28]); I’m wondering what it looks like.
I think he will talk about why we lost the 2x2
If we have an RGB image, i.e., the input has 3 channels, and we apply these 4 kernels, will we get 12 channels in the output?
i.e., input → [64, 3, 28, 28]
and output → [64, 12, 26, 26]?
On this tensor of torch.Size([64, 1, 28, 28]), if we apply 4 filters, each filter will produce a torch.Size([64, 1, 26, 26]) output; these are stacked to make a torch.Size([64, 4, 26, 26]) output. The size is reduced from 28 to 26 because of applying the 3x3 filter.
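A minimal pure-Python sketch of applying several 3x3 kernels to a batch of single-channel 28x28 images and stacking the results along the channel axis. It uses nested lists instead of PyTorch tensors so it runs without extra dependencies. Assumptions: a batch of 2 instead of 64 (to keep pure Python fast; the batch dimension passes through unchanged), and four hypothetical edge kernels chosen only for illustration. Like PyTorch’s `F.conv2d`, it computes a cross-correlation with stride 1 and no padding.

```python
def conv2d_single(image, kernel):
    """Valid 2D cross-correlation (stride 1, no padding) of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Multiply the kernel with the kh x kw patch at (i, j) and sum.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


def shape(t):
    """Shape of a nested list, e.g. [N, C, H, W]."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


# Four hypothetical 3x3 edge kernels (values chosen for illustration).
top_edge = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
bottom_edge = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]
left_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
right_edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
kernels = [top_edge, bottom_edge, left_edge, right_edge]

# Single-channel 28x28 image: top half 0 (dark), bottom half 1 (bright).
image = [[1 if i >= 14 else 0 for j in range(28)] for i in range(28)]
batch = [[image], [image]]  # shape [2, 1, 28, 28]

# Apply every kernel to every image; each kernel becomes one output channel.
out = [[conv2d_single(sample[0], k) for k in kernels] for sample in batch]
print(shape(out))  # [2, 4, 26, 26]
```

The top-edge map responds with 3 just above the dark-to-bright boundary, while the left-edge map stays 0 because the image does not change across columns.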
I believe the reason is that our kernels are 3x3, so when we compute each location, we lose 2 pixels in each dimension (1 on each side), because the kernel can’t slide past the edges of the 28x28 image.
Showing it on a row 9 pixels wide:
[1,2,3,4,5,6,7,8,9]
[1,2,3] - 1
[2,3,4] - 2
[3,4,5] - 3
[4,5,6] - 4
[5,6,7] - 5
[6,7,8] - 6
[7,8,9] - 7
The edge pixels can’t be the center of a window, and since our kernel size is 3x3, we have to leave a buffer of 1 row or column per side (top, bottom, left and right). Hence 28x28 reduces to 26x26.
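The 9-wide listing can be reproduced with a short Python sketch. `sliding_windows` is a hypothetical helper; it enumerates every 3-wide window that fits entirely inside a row, which shows why a row of n pixels yields n - k + 1 outputs for a kernel of width k (9 → 7, and 28 → 26).

```python
def sliding_windows(row, k):
    """All length-k windows of `row` that fit entirely inside it (stride 1, no padding)."""
    return [row[i:i + k] for i in range(len(row) - k + 1)]


row = [1, 2, 3, 4, 5, 6, 7, 8, 9]
windows = sliding_windows(row, 3)
for pos, window in enumerate(windows, start=1):
    print(window, "-", pos)  # [1, 2, 3] - 1 ... [7, 8, 9] - 7

# Valid window count is n - k + 1 per dimension: 9 -> 7 and 28 -> 26.
print(len(windows), len(sliding_windows(list(range(28)), 3)))  # 7 26
```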
Can there be more than 4 kernels or filters?
Yes. The number of kernels/filters can and does vary. We will see this shortly, and also learn how and why we change it.
I misunderstood for the longest time that the number of filters had something to do with the output size, but the size is actually determined by the kernel size (and a few other arguments, such as stride and padding), not by the number of filters you use.
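A sketch of the standard output-size arithmetic for a 2D convolution (square kernel, dilation 1), matching the shape formula in PyTorch’s `nn.Conv2d` documentation. `conv_output_shape` is a hypothetical helper. It shows that the number of filters only sets the channel dimension, while kernel size, stride and padding set the spatial size.

```python
def conv_output_shape(batch, in_h, in_w, n_filters, kernel_size=3, stride=1, padding=0):
    """Output shape [N, C_out, H_out, W_out] of a 2D convolution (square kernel, dilation 1)."""
    h_out = (in_h + 2 * padding - kernel_size) // stride + 1
    w_out = (in_w + 2 * padding - kernel_size) // stride + 1
    return [batch, n_filters, h_out, w_out]


# Changing the number of filters only changes the channel dimension.
for n_filters in (4, 8, 64):
    print(conv_output_shape(64, 28, 28, n_filters))
# [64, 4, 26, 26]
# [64, 8, 26, 26]
# [64, 64, 26, 26]

# Padding and stride are what change the spatial size.
print(conv_output_shape(64, 28, 28, 4, padding=1))            # [64, 4, 28, 28]
print(conv_output_shape(64, 28, 28, 4, stride=2, padding=1))  # [64, 4, 14, 14]
```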
Do we convert the RGB image to black and white before doing the convolution? Otherwise I don’t understand how 3 channels become 64 channels. If we apply the kernels to every input channel separately, it should be 3 channels → 3 * 64 = 192 channels.
Yes, you can choose any number of kernels.
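For the RGB questions above, a minimal pure-Python sketch of how a standard convolution layer (e.g. PyTorch’s `nn.Conv2d` with the default `groups=1`) handles multiple input channels: each kernel has one 3x3 slice per input channel, the per-channel results are summed, and so each kernel produces a single output channel. The weight shape is therefore [out_channels, in_channels, 3, 3], and a 3-channel input with 4 kernels gives 4 output channels, not 12. The batch size of 2, the constant-colour images and the helper `make_kernel` are hypothetical, chosen so the summation is easy to check by hand.

```python
def conv2d_multi(image, kernel):
    """Valid 2D convolution (stride 1, no padding) of a [C_in][H][W] image with a
    [C_in][kh][kw] kernel. Each channel is cross-correlated with its own kernel
    slice and the per-channel results are summed into ONE output map."""
    kh, kw = len(kernel[0]), len(kernel[0][0])
    h, w = len(image[0]), len(image[0][0])
    out = [[0] * (w - kw + 1) for _ in range(h - kh + 1)]
    for c in range(len(image)):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                out[i][j] += sum(image[c][i + di][j + dj] * kernel[c][di][dj]
                                 for di in range(kh) for dj in range(kw))
    return out


def make_kernel(c_in, k, center):
    """Hypothetical helper: c_in slices of a k x k kernel, `center` in the middle, zeros elsewhere."""
    return [[[center if (i == k // 2 and j == k // 2) else 0 for j in range(k)]
             for i in range(k)] for _ in range(c_in)]


def shape(t):
    """Shape of a nested list, e.g. [N, C, H, W]."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


# 4 kernels, each spanning all 3 input channels: weight shape [4, 3, 3, 3].
kernels = [make_kernel(3, 3, center=n + 1) for n in range(4)]
# A batch of 2 RGB 28x28 images whose channel c is the constant value c + 1.
rgb = [[[c + 1] * 28 for _ in range(28)] for c in range(3)]
batch = [rgb, rgb]  # shape [2, 3, 28, 28]

out = [[conv2d_multi(sample, k) for k in kernels] for sample in batch]
print(shape(kernels), shape(out))  # [4, 3, 3, 3] [2, 4, 26, 26]
# Kernel n sums (n + 1) * (1 + 2 + 3) over the three channels.
print([out[0][n][0][0] for n in range(4)])  # [6, 12, 18, 24]
```

No conversion to black and white is needed: the summation over input channels happens inside each kernel, so the number of output channels equals the number of kernels.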