Training Very Deep Neural Networks Without Skip-Connections

Overview

DiracNets

v2 update (January 2018):

The code was updated for DiracNets-v2 in which we removed NCReLU by adding per-channel a and b multipliers without weight decay. This allowed us to significantly simplify the network, which is now folds into a simple chain of convolution-ReLU layers, like VGG. On ImageNet DiracNet-18 and DiracNet-34 closely match corresponding ResNet with the same number of parameters.

See v1 branch for DiracNet-v1.


PyTorch code and models for DiracNets: Training Very Deep Neural Networks Without Skip-Connections

https://arxiv.org/abs/1706.00388

Networks with skip-connections like ResNet show excellent performance in image recognition benchmarks, but do not benefit from increased depth, we are thus still interested in learning actually deep representations, and the benefits they could bring. We propose a simple weight parameterization, which improves training of deep plain (without skip-connections) networks, and allows training plain networks with hundreds of layers. Accuracy of our proposed DiracNets is close to Wide ResNet (although DiracNets need more parameters to achieve it), and we are able to match ResNet-1000 accuracy with plain DiracNet with only 28 layers. Also, the proposed Dirac weight parameterization can be folded into one filter for inference, leading to easily interpretable VGG-like network.

DiracNets on ImageNet:

TL;DR

In a nutshell, Dirac parameterization is a sum of filters and scaled Dirac delta function:

conv2d(x, alpha * delta + W)

Here is simplified PyTorch-like pseudocode for the function we use to train plain DiracNets (with weight normalization):

def dirac_conv2d(input, W, alpha, beta)
    return F.conv2d(input, alpha * dirac(W) + beta * normalize(W))

where alpha and beta are per-channel scaling multipliers, and normalize does l_2 normalization over each feature plane.

Code

Code structure:

├── README.md # this file
├── diracconv.py # modular DiracConv definitions
├── test.py # unit tests
├── diracnet-export.ipynb # ImageNet pretrained models
├── diracnet.py # functional model definitions
└── train.py # CIFAR and ImageNet training code

Requirements

First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/[email protected]

Install other Python packages:

pip install -r requirements.txt

To train DiracNet-34-2 on CIFAR do:

python train.py --save ./logs/diracnets_$RANDOM$RANDOM --depth 34 --width 2

To train DiracNet-18 on ImageNet do:

python train.py --dataroot ~/ILSVRC2012/ --dataset ImageNet --depth 18 --save ./logs/diracnet_$RANDOM$RANDOM \
                --batchSize 256 --epoch_step [30,60,90] --epochs 100 --weightDecay 0.0001 --lr_decay_ratio 0.1

nn.Module code

We provide DiracConv1d, DiracConv2d, DiracConv3d, which work like nn.Conv1d, nn.Conv2d, nn.Conv3d, but have Dirac-parametrization inside (our training code doesn't use these modules though).

Pretrained models

We fold batch normalization and Dirac parameterization into F.conv2d weight and bias tensors for simplicity. Resulting models are as simple as VGG or AlexNet, having only nonlinearity+conv2d as a basic block.

See diracnets.ipynb for functional and modular model definitions.

There is also folded DiracNet definition in diracnet.py, which uses code from PyTorch model_zoo and downloads pretrained model from Amazon S3:

from diracnet import diracnet18
model = diracnet18(pretrained=True)

Printout of the model above:

DiracNet(
  (features): Sequential(
    (conv): Conv2d (3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
    (max_pool0): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), ceil_mode=False)
    (group0.block0.relu): ReLU()
    (group0.block0.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block1.relu): ReLU()
    (group0.block1.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block2.relu): ReLU()
    (group0.block2.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group0.block3.relu): ReLU()
    (group0.block3.conv): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group1.block0.relu): ReLU()
    (group1.block0.conv): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block1.relu): ReLU()
    (group1.block1.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block2.relu): ReLU()
    (group1.block2.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group1.block3.relu): ReLU()
    (group1.block3.conv): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group2.block0.relu): ReLU()
    (group2.block0.conv): Conv2d (128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block1.relu): ReLU()
    (group2.block1.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block2.relu): ReLU()
    (group2.block2.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group2.block3.relu): ReLU()
    (group2.block3.conv): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (max_pool3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
    (group3.block0.relu): ReLU()
    (group3.block0.conv): Conv2d (256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block1.relu): ReLU()
    (group3.block1.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block2.relu): ReLU()
    (group3.block2.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (group3.block3.relu): ReLU()
    (group3.block3.conv): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (last_relu): ReLU()
    (avg_pool): AvgPool2d(kernel_size=7, stride=7, padding=0, ceil_mode=False, count_include_pad=True)
  )
  (fc): Linear(in_features=512, out_features=1000)
)

The models were trained with OpenCV, so you need to use it too to reproduce stated accuracy.

Pretrained weights for DiracNet-18 and DiracNet-34:
https://s3.amazonaws.com/modelzoo-networks/diracnet18v2folded-a2174e15.pth
https://s3.amazonaws.com/modelzoo-networks/diracnet34v2folded-dfb15d34.pth

Pretrained weights for the original (not folded) model, functional definition only:
https://s3.amazonaws.com/modelzoo-networks/diracnet18-v2_checkpoint.pth
https://s3.amazonaws.com/modelzoo-networks/diracnet34-v2_checkpoint.pth

We plan to add more pretrained models later.

Bibtex

@inproceedings{Zagoruyko2017diracnets,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {DiracNets: Training Very Deep Neural Networks Without Skip-Connections},
    url = {https://arxiv.org/abs/1706.00388},
    year = {2017}}
retweet 4 satoshi ⚡️

rt4sat retweet 4 satoshi This bot is the codebase for https://twitter.com/rt4sat please feel free to create an issue if you saw any bugs basically thi

6 Sep 30, 2022
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Introduction This repository is for X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here. Please cite wi

JDAI-CV 240 Dec 17, 2022
A repo for Causal Imitation Learning under Temporally Correlated Noise

CausIL A repo for Causal Imitation Learning under Temporally Correlated Noise. Running Experiments To re-train an expert, run: python experts/train_ex

Gokul Swamy 5 Nov 01, 2022
SafePicking: Learning Safe Object Extraction via Object-Level Mapping, ICRA 2022

SafePicking Learning Safe Object Extraction via Object-Level Mapping Kentaro Wad

Kentaro Wada 49 Oct 24, 2022
"Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices This repository contains the official PyTorch implemen

Yandex Research 21 Oct 18, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation

UniFuse (RAL+ICRA2021) Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation, arXiv, Demo Preparation I

Alibaba 47 Dec 26, 2022
Real life contra a deep learning project built using mediapipe and openc

real-life-contra Description A python script that translates the body movement into in game control. Welcome to all new real life contra a deep learni

Programminghut 7 Jan 26, 2022
Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks

MGANs Training & Testing code (torch), pre-trained models and supplementary materials for "Precomputed Real-Time Texture Synthesis with Markovian Gene

290 Nov 15, 2022
Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.

Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.

Matthias Wright 169 Dec 26, 2022
Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals"

The Temporal Robustness of Stochastic Signals Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals" Case stud

0 Oct 28, 2021
This is an official implementation for "Self-Supervised Learning with Swin Transformers".

Self-Supervised Learning with Vision Transformers By Zhenda Xie*, Yutong Lin*, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao and Han Hu This repo is the

Swin Transformer 529 Jan 02, 2023
My implementation of Image Inpainting - A deep learning Inpainting model

Image Inpainting What is Image Inpainting Image inpainting is a restorative process that allows for the fixing or removal of unwanted parts within ima

Joshua V Evans 1 Dec 12, 2021
SimpleDepthEstimation - An unified codebase for NN-based monocular depth estimation methods

SimpleDepthEstimation Introduction This is an unified codebase for NN-based monocular depth estimation methods, the framework is based on detectron2 (

8 Dec 13, 2022
Image restoration with neural networks but without learning.

Warning! The optimization may not converge on some GPUs. We've personally experienced issues on Tesla V100 and P40 GPUs. When running the code, make s

Dmitry Ulyanov 7.4k Jan 01, 2023
Implementation of Pix2Seq in PyTorch

pix2seq-pytorch Implementation of Pix2Seq paper Different from the paper image input size 1280 bin size 1280 LambdaLR scheduler used instead of Linear

Tony Shin 9 Dec 15, 2022
Implementation for our ICCV 2021 paper: Dual-Camera Super-Resolution with Aligned Attention Modules

DCSR: Dual Camera Super-Resolution Implementation for our ICCV 2021 oral paper: Dual-Camera Super-Resolution with Aligned Attention Modules paper | pr

Tengfei Wang 110 Dec 20, 2022
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

What's New Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here. [Oct 2021]: V

Meta Research 2.9k Jan 07, 2023
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Auto-ViML Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare) N

AutoViz and Auto_ViML 397 Dec 30, 2022
Cooperative Driving Dataset: a dataset for multi-agent driving scenarios

Cooperative Driving Dataset (CODD) The Cooperative Driving dataset is a synthetic dataset generated using CARLA that contains lidar data from multiple

Eduardo Henrique Arnold 124 Dec 28, 2022