ResNet - Live Coding in PyTorch - Wed Sep 15, 9pm IST

Yes, it does! To begin with it's the identity, but it very much has learnable parameters in it.

The skip connection has learnable params when we're changing the number of channels. Also, there's pooling if the stride is anything other than 1.
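
Roughly like this - a minimal sketch, not the exact code from the session:

import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)
        if stride != 1 or in_channels != out_channels:
            # learnable shortcut: the 1x1 conv matches the channel count,
            # and its stride matches the spatial size of the main path
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))
        else:
            self.shortcut = nn.Identity()   # plain identity when shapes already match

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + self.shortcut(x))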

This is where Aman explains convolutions really nicely!

4 Likes

Repeating blocks - nn.Sequential, maybe combined with nn.ModuleList?
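
Something like this, maybe (make_block here is just a hypothetical factory for whatever block we're repeating):

import torch.nn as nn

def make_block():
    return nn.Conv2d(64, 64, kernel_size=3, padding=1)   # placeholder block

stage = nn.Sequential(*[make_block() for _ in range(3)])   # repeat 3 blocks in sequence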

1 Like

The VS Code text size is still a little small for me, might be my resolution though :sweat_smile:

Edit: Looks perfect now, Thanks! :smiley:

1 Like

channels typo in for loop in cell 7

1 Like

typo in your code *channels

As we build the model, is there a way to display what the connections/network looks like?

channels[stage_idx - 1]?

1 Like

I wanted to know: would it be wise to try using NumPy and Pandas instead of PyTorch, to get to know the intricacies involved?

if prev num_chan != num_chan … then use prev as the from size

1 Like

Would a 1x1 conv layer be useful here as a bridge between 64 and 128?
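
Something like this, I mean (just a sketch):

import torch.nn as nn

bridge = nn.Conv2d(64, 128, kernel_size=1, bias=False)   # 1x1 conv mapping 64 -> 128 channels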

cool, you got it working.

1 Like

@ramesh says:

Let us complete this as HW and post our Colab notebooks here for ResNet18, ResNet34, and a stretch goal of ResNet50?

I loved it when you went whoa! I really felt the happiness :stuck_out_tongue:

2 Likes

I too felt the eureka moment!!

2 Likes

Bye everyone! Good night!

Hi guys,

I was trying to code a ResNet-50 implementation in PyTorch after our reading group.

I got the model definition correct, as far as I can tell. However, when I do a forward pass with my custom ResNet, a batch size of 2 works fine, but a batch size of 16 gives a CUDA out-of-memory error.

When I do the same using torchvision's inbuilt resnet50() model, I don't get this error and the forward pass completes without a problem.

I checked the model parameters for both my custom model and the default resnet50 model, and they seem to be almost the same.

Any ideas why this discrepancy? That is, the forward pass succeeds with the inbuilt ResNet but fails with my custom ResNet, although both have the same number of parameters and the same batch size…

Thanks,
Vinayak.

PS: Code for the blocks, the model, and the printed model is attached below.

Block Code:

import torch
import torch.nn as nn
from collections import OrderedDict

# `noop` and `Flatten` come from fastai; minimal stand-ins so the snippet runs on its own:
def noop(x): return x

class Flatten(nn.Module):
    "Flatten each sample to a vector, like fastai's Flatten(full=False)."
    def forward(self, x): return x.view(x.size(0), -1)

class BottleneckBlock(nn.Module):
    
    def __init__(self, in_channels, out_channels):
        super().__init__()
        
        # Channel bookkeeping: the first stage expands 64 -> 256; in later stages
        # the first block of a stage halves its incoming channels for the bottleneck,
        # and identity blocks squeeze them by 4x.
        if in_channels == 64: out_channels = 256
        if in_channels != out_channels:
            interim_channels = in_channels if in_channels == 64 else in_channels // 2
        else:
            interim_channels = in_channels // 4
        
        self.conv1 = nn.Conv2d(in_channels, interim_channels, kernel_size = 1, bias = False)
        self.bn1   = nn.BatchNorm2d(interim_channels)
        self.act1  = nn.ReLU(inplace = True)
        self.conv2 = nn.Conv2d(interim_channels, interim_channels, kernel_size = 3, padding = 1, bias = False)
        self.bn2   = nn.BatchNorm2d(interim_channels)
        self.act2  = nn.ReLU(inplace = True)
        self.conv3 = nn.Conv2d(interim_channels, out_channels, kernel_size = 1, bias = False)
        self.bn3   = nn.BatchNorm2d(out_channels)
        self.downsample = noop
        self.act3  = nn.ReLU(inplace = True)
        
        if in_channels != out_channels:
            self.downsample = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size = 1, bias = False),
                                            nn.BatchNorm2d(out_channels))
        
            
    def forward(self, x):
        
        original_input = x
        
        # Pass through first conv layer
        x = self.act1(self.bn1(self.conv1(x)))
        # Pass through the second conv layer
        x = self.act2(self.bn2(self.conv2(x)))
        # Pass through the third conv layer
        x = self.bn3(self.conv3(x))
        
        # Skip Connection
        downsample_output = self.downsample(original_input)
        
        # Final output
        return self.act3(x + downsample_output)
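
A quick shape check of a single block, for reference:

x = torch.randn(2, 64, 56, 56)
print(BottleneckBlock(64, 256)(x).shape)   # torch.Size([2, 256, 56, 56])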

This is the architecture code:

class custom_resnet_bottleneck(nn.Module):
    
    def __init__(self, block_sizes = [3, 4, 6, 3], layers = [64, 256, 512, 1024], classes = 10):
        super().__init__()
        
        named_blocks = []
        # Build the four stages: the first block of each stage changes the channel
        # count (so it gets a conv shortcut), the remaining blocks preserve it.
        for idx, (bs, ls) in enumerate(zip(block_sizes, layers)):
            items = []
            for i in range(bs):
                if idx == 0:
                    if i == 0:
                        bk = BottleneckBlock(ls, layers[idx + 1])
                    else:
                        bk = BottleneckBlock(layers[idx + 1], layers[idx + 1])
                else:
                    if i == 0:
                        bk = BottleneckBlock(ls, 2 * ls)
                    else:
                        bk = BottleneckBlock(2 * ls, 2 * ls)
                items.append(bk)
            named_blocks.append([f"layer_{idx + 1}", nn.Sequential(*items)])
                        
        # Define the backbone of the architecture
        
        self.conv1 = nn.Conv2d(3, 64, kernel_size = 7, stride = 2, bias = False, padding = 3)
        self.bn1 = nn.BatchNorm2d(64)
        self.act1 = nn.ReLU(inplace = True)
        self.maxpool = nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
        for block in named_blocks:
            self.add_module(*block)

                
        # Define the head/classification layer of the architecture
        self.head = nn.Sequential(OrderedDict([
                                            ('avgpool',nn.AdaptiveAvgPool2d(output_size = (1, 1))),
                                            ('flatten', Flatten()),
                                            ('fc',nn.Linear(in_features = layers[-1] * 2, out_features = classes)),
                                            ]))
            
    def forward(self, x):
        
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.maxpool(x)
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        x = self.layer_4(x)
        op = self.head(x)
        return op

And this is what the final model looks like:

custom_resnet_bottleneck(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act1): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer_1): Sequential(
    (0): BottleneckBlock(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BottleneckBlock(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (2): BottleneckBlock(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
  )
  (layer_2): Sequential(
    (0): BottleneckBlock(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BottleneckBlock(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (2): BottleneckBlock(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (3): BottleneckBlock(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
  )
  (layer_3): Sequential(
    (0): BottleneckBlock(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BottleneckBlock(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (2): BottleneckBlock(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (3): BottleneckBlock(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (4): BottleneckBlock(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (5): BottleneckBlock(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
  )
  (layer_4): Sequential(
    (0): BottleneckBlock(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BottleneckBlock(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
    (2): BottleneckBlock(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act3): ReLU(inplace=True)
    )
  )
  (head): Sequential(
    (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
    (flatten): Flatten(full=False)
    (fc): Linear(in_features=2048, out_features=10, bias=True)
  )
)

Hey @vinayak_nayak - easiest to just share a Colab notebook that replicates this error. I'll take a look - thanks! :slight_smile:

1 Like

Our own implementation is using more memory than it should - it could be a for loop somewhere, an incorrect if statement, or something else entirely!
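
One thing worth checking first, just a guess from the printed model: every conv in the custom blocks is stride=(1, 1), while torchvision's resnet50 downsamples with stride 2 at the start of layer2, layer3, and layer4. If the feature maps never shrink after the stem, the parameter counts stay identical but the activations are many times larger, which would explain an OOM at batch size 16 only. A quick demo of that asymmetry:

import torch
import torch.nn as nn

# stride changes activation sizes, not parameter counts
strided   = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1, bias=False)
unstrided = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1, bias=False)

x = torch.randn(1, 256, 56, 56)
print(strided(x).shape)      # torch.Size([1, 512, 28, 28])
print(unstrided(x).shape)    # torch.Size([1, 512, 56, 56]) - 4x the activation memory
print(sum(p.numel() for p in strided.parameters()) ==
      sum(p.numel() for p in unstrided.parameters()))   # True - exactly the same parameters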

1 Like

I will, Aman - thanks a lot!

1 Like