DenseNet Paper: [1608.06993] Densely Connected Convolutional Networks
Blog post: DenseNet Architecture Explained with PyTorch Implementation from TorchVision | Committed towards better future
Livestream on YouTube here: W&B Paper Reading Group: DenseNet - YouTube


Yes can hear you very well Aman!


one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections.

I couldn’t comprehend the above line from the abstract! Can we discuss it in detail?


This is because the first layer has 1 connection, the second layer has 2 connections, and so on… so if we add all these connections, 1 + 2 + 3 + … + L = L(L+1)/2.
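
A quick way to check the sum above, as a minimal sketch (the helper name `count_direct_connections` is made up for illustration and is not from the paper or TorchVision):

```python
def count_direct_connections(num_layers: int) -> int:
    """Add up the connections when layer i has i incoming connections (i = 1..L)."""
    return sum(range(1, num_layers + 1))


# The running sum matches the closed form L(L+1)/2 from the abstract.
for L in (1, 4, 5, 12):
    assert count_direct_connections(L) == L * (L + 1) // 2

print(count_direct_connections(5))  # 15
```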


yeah got it, like arithmetic series :slight_smile:


Do we need to keep the size intact with padding while doing convolution?
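
For reference, feature maps can only be concatenated along the channel axis if their height and width match, so the 3x3 convs inside a dense block have to preserve the spatial size. A minimal sketch of the usual output-size formula, assuming a square input and no dilation (`conv_output_size` is a hypothetical helper, not a library function):

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


print(conv_output_size(56, kernel=3, padding=1))  # 56: 3x3 conv with padding 1 keeps the size
print(conv_output_size(56, kernel=3, padding=0))  # 54: without padding the map shrinks
print(conv_output_size(56, kernel=1))             # 56: 1x1 conv needs no padding
```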


Strided (with a really large stride) 1 x 1 convolution?

This means we’ll never be adding input from the first block to the last block. Or, more generally, there are no skip connections across blocks, only within the blocks, right?


How are 32 features getting added in DB1 (Dense Block 1)?
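
A rough sketch of the channel bookkeeping, assuming the DenseNet-121 defaults (growth rate 32, 64 initial feature maps, block config (6, 12, 24, 16), and transitions that halve the channels). Each layer in a dense block contributes `growth_rate` new feature maps, which get concatenated onto everything before it. `densenet_channels` is a hypothetical helper written for this thread, not a TorchVision function:

```python
def densenet_channels(block_config=(6, 12, 24, 16), growth_rate=32, num_init_features=64):
    """Trace (name, channels_in, channels_out) through dense blocks and transitions."""
    channels = num_init_features
    trace = []
    for i, num_layers in enumerate(block_config):
        block_in = channels
        # Every layer appends growth_rate feature maps via concatenation.
        channels += num_layers * growth_rate
        trace.append((f"denseblock{i + 1}", block_in, channels))
        if i != len(block_config) - 1:
            # Assumed: the transition's 1x1 conv halves the channel count.
            trans_in = channels
            channels //= 2
            trace.append((f"transition{i + 1}", trans_in, channels))
    return trace


for name, c_in, c_out in densenet_channels():
    print(f"{name}: {c_in} -> {c_out}")
```

Under these assumptions DB1 goes from 64 to 64 + 6 * 32 = 256 channels, i.e. 32 extra feature maps per layer.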

What is the reason/intuition behind using more layers in the deeper blocks (Dense Block 3 and 4)?

Won’t this architecture overfit, given that it has so many densely connected conv layers, plus one FC layer at the end?


I am confused about the 1x1 conv with 128 filters as the bottleneck. Is that the same across all the blocks and all the layers in each block?

In the transition block, could we have only the pooling layer and avoid the 1x1 layer? What is the advantage of the 1x1 layers, where we go from 64x56x56 to 128x56x56?

Fundamental doubt: how do you differentiate between filters? Each kernel is unique within the stack of kernels that makes up a filter. Since the number of output channels depends on the number of filters we apply, how exactly do we differentiate between filters? The number of kernels is the same across all filters, so are the kernel values arbitrary or random, so that each filter ends up with different weights?

Fundamental question: when we go deeper with layers, are the features that get extracted called low-level features or high-level features?


Thanks @amanarora. That makes sense. Trying it out :slight_smile:


That makes sense. But the first layer inside the DenseBlock has only 32 outputs. So the 1x1 that follows it is not a Bottleneck anymore if it increases the feature maps to 128. But the term Bottleneck makes sense in the later layers in the DenseBlock.
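
To make the expand-vs-reduce point concrete, here is a small sketch assuming the DenseNet-121 defaults (growth rate 32, bottleneck width 4 * 32 = 128 for every layer, and 64 channels entering Dense Block 1). `layer_input_channels` is a hypothetical helper for illustration only:

```python
GROWTH_RATE = 32
BN_SIZE = 4
BOTTLENECK_CHANNELS = BN_SIZE * GROWTH_RATE  # 128, assumed the same for every layer


def layer_input_channels(block_in: int, num_layers: int, growth_rate: int = GROWTH_RATE):
    """Channels seen by each layer's 1x1 conv: block input plus all earlier outputs."""
    return [block_in + i * growth_rate for i in range(num_layers)]


# Dense Block 1: 6 layers, 64 channels coming in (assumed defaults).
for idx, c_in in enumerate(layer_input_channels(64, 6), start=1):
    if c_in < BOTTLENECK_CHANNELS:
        effect = "expands"
    elif c_in == BOTTLENECK_CHANNELS:
        effect = "keeps"
    else:
        effect = "reduces"
    print(f"layer {idx}: 1x1 conv {c_in} -> {BOTTLENECK_CHANNELS} ({effect})")
```

Under these assumptions the 1x1 conv only acts as a true bottleneck from the fourth layer of DB1 onward. In the deeper blocks, where the input is already at least 128 channels, it never expands.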


The ResNet architectures also add skip connections. DenseNet also adds connections from earlier layers to later layers inside the dense blocks. How do you form your intuition about how this helps the overall performance? How does this kind of skip connection improve on the performance of vanilla ResNets?

Can you please quickly explain how a 1x1 conv changes the number of channels? I thought it would be the same number… what don’t I get?
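
One way to see it: a 1x1 conv with C_out filters computes, at every pixel, C_out different weighted sums over the C_in input channels, so the spatial size stays the same and only the channel count changes. A minimal pure-Python sketch with made-up sizes and random weights (`conv1x1` is a hypothetical helper with no bias term; in PyTorch the corresponding layer would be `nn.Conv2d(c_in, c_out, kernel_size=1)`):

```python
import random


def conv1x1(x, weight):
    """x: nested lists [C_in][H][W]; weight: [C_out][C_in]. Returns [C_out][H][W]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [
            # Each output value mixes all input channels at one pixel.
            [sum(weight[o][c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
            for i in range(h)
        ]
        for o in range(len(weight))
    ]


random.seed(0)
c_in, c_out, h, w = 3, 5, 2, 2  # illustrative sizes only
x = [[[random.random() for _ in range(w)] for _ in range(h)] for _ in range(c_in)]
weight = [[random.random() for _ in range(c_in)] for _ in range(c_out)]

y = conv1x1(x, weight)
print(len(x), "->", len(y))          # 3 -> 5 channels
print(len(y[0]), "x", len(y[0][0]))  # 2 x 2, spatial size unchanged
```

Each of the C_out filters has its own row of weights, which is why the number of filters sets the number of output channels.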

Are there any ablation studies for DenseNet? For example, do we know how much each of the connections inside the dense block contributes to the performance? Are skip connections more valuable than direct connections?
