Learning Features with Parameter-Free Layers (ICLR 2022)

Last update: Dec 07, 2022

Related tags

Deep Learning PfLayer

Overview

Dongyoon Han, YoungJoon Yoo, Beomyoung Kim, Byeongho Heo | Paper

NAVER AI Lab, NAVER CLOVA

Updates

02.11.2022 Code has been uploaded
02.06.2022 Initial update

Abstract

Trainable layers such as convolutional building blocks are the standard network design choice: they learn parameters that capture the global context through successive spatial operations. When designing an efficient network, trainable layers such as the depthwise convolution are the source of efficiency in the number of parameters and FLOPs, but they have brought little improvement to model speed in practice. This paper argues that simple built-in parameter-free operations can be a favorable alternative to the efficient trainable layers, replacing spatial operations in a network architecture. We aim to break the stereotype of organizing the spatial operations of building blocks into trainable layers. Extensive experimental analyses based on layer-level studies with fully trained models and neural architecture searches are provided to investigate whether parameter-free operations such as the max-pool are functional. The studies eventually give us a simple yet effective idea for redesigning network architectures, where the parameter-free operations are heavily used as the main building block without sacrificing much model accuracy. Experimental results on the ImageNet dataset demonstrate that network architectures with parameter-free operations enjoy further efficiency in terms of model speed, number of parameters, and FLOPs.

Some Analyses in The Paper

1. Depthwise convolution is replaceable with a parameter-free operation:

2. NAS frequently selects parameter-free operations for normal building blocks:

3. R50-hybrid (with the eff-bottlenecks) yields localizable features (see the Grad-CAM visualizations):

Our Proposed Models

1. Schematic illustration of our models

Here, we provide example models in which parameter-free operations (i.e., the eff-layer) are the main building block.
Parameter-free operations such as max-pool2d and avg-pool2d can replace the spatial operations (convolution and self-attention, SA).
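To make the idea concrete, here is a minimal sketch of a ResNet-style bottleneck whose spatial operation is a 3x3 max-pool instead of a 3x3 convolution. The class name `MaxPoolBottleneck` and its channel sizes are illustrative assumptions; this is not the code in resnet_pf.py, and the actual efficient bottleneck may differ in details such as normalization placement.

```python
import torch
import torch.nn as nn


class MaxPoolBottleneck(nn.Module):
    """Hypothetical bottleneck whose spatial operation is a 3x3 max-pool."""

    def __init__(self, channels: int, mid_channels: int):
        super().__init__()
        # 1x1 convolution mixes channels and reduces width (trainable).
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
        )
        # Parameter-free spatial mixing: stride 1 and padding 1 keep H x W.
        self.spatial = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        # 1x1 convolution restores the original width (trainable).
        self.expand = nn.Sequential(
            nn.Conv2d(mid_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.expand(self.spatial(self.reduce(x)))
        return self.act(out + x)  # residual connection


block = MaxPoolBottleneck(channels=256, mid_channels=64)
y = block(torch.randn(2, 256, 14, 14))
spatial_params = sum(p.numel() for p in block.spatial.parameters())
total_params = sum(p.numel() for p in block.parameters())
print(tuple(y.shape), spatial_params, total_params)
```

All trainable weights in this sketch live in the 1x1 convolutions and batch-norm layers; the spatial step itself contributes none, which is where the parameter and FLOP savings come from.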

2. Brief model descriptions

resnet_pf.py: resnet50_max(), resnet50_hybrid() - R50-max and R50-hybrid, models with the efficient bottlenecks

vit_pf.py: vit_s_max() - ViT with the efficient transformers

pit_pf.py: pit_s_max() - PiT with the efficient transformers

Usage

Requirements

pytorch >= 1.6.0
torchvision >= 0.7.0
timm >= 0.3.4
apex == 0.1.0

Pretrained models

Network	Img size	Params. (M)	FLOPs (G)	GPU (ms)	Top-1 (%)	Top-5 (%)
`R50`	224x224	25.6	4.1	8.7	76.2	93.8
`R50-max`	224x224	14.2	2.2	6.8	74.3	92.0
`R50-hybrid`	224x224	17.3	2.6	7.3	77.1	93.1
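The relative savings over the R50 baseline can be read off the table with a few lines of arithmetic. The sketch below only reuses the numbers listed above; the helper name `relative_saving` is an illustrative choice.

```python
# (params in M, FLOPs in G, GPU latency in ms) copied from the table above.
results = {
    "R50": (25.6, 4.1, 8.7),
    "R50-max": (14.2, 2.2, 6.8),
    "R50-hybrid": (17.3, 2.6, 7.3),
}


def relative_saving(baseline, value):
    """Percentage reduction of `value` with respect to `baseline`."""
    return 100.0 * (baseline - value) / baseline


base = results["R50"]
savings = {
    name: tuple(round(relative_saving(b, v), 1) for b, v in zip(base, vals))
    for name, vals in results.items()
    if name != "R50"
}
for name, (params, flops, latency) in savings.items():
    print(f"{name}: params -{params}%, FLOPs -{flops}%, latency -{latency}%")
```

By these numbers, R50-max removes a little under half of the parameters and FLOPs of R50, while R50-hybrid saves about a third and also reports a higher Top-1 accuracy than the baseline.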

Network	Img size	Throughputs	Vanilla	+CutMix	+DeiT
`R50`	224x224	962 / 112	76.2	77.6	78.8
`ViT-S-max`	224x224	763 / 96	74.2	77.3	79.8
`PiT-S-max`	224x224	1000 / 92	75.7	78.1	80.1

Model load & evaluation

Example code of loading resnet50_hybrid without timm:

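import torch
from resnet_pf import resnet50_hybrid

model = resnet50_hybrid()
# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('./weight/checkpoint.pth', map_location='cpu'))
model.eval()  # inference behavior for BatchNorm
with torch.no_grad():
    print(model(torch.randn(1, 3, 224, 224)))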

Example code of loading pit_s_max with timm:

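import torch
import timm
import pit_pf  # imported for its side effect: makes 'pit_s_max' available to timm.create_model

model = timm.create_model('pit_s_max', pretrained=False)
# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('./weight/checkpoint.pth', map_location='cpu'))
model.eval()
with torch.no_grad():
    print(model(torch.randn(1, 3, 224, 224)))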
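The loading snippets above assume that the checkpoint file holds a bare state dict. Training scripts often wrap the weights under a key or save them from a DataParallel/DistributedDataParallel model, which prefixes every name with `module.`. The helper below is a hedged sketch for that case: the function name `extract_state_dict` and the wrapper keys `state_dict` and `model` are assumptions, since the layout of the released checkpoints is not documented here.

```python
from collections import OrderedDict


def extract_state_dict(checkpoint, wrapper_keys=("state_dict", "model")):
    """Return a plain parameter mapping from a loaded checkpoint object."""
    state = checkpoint
    # Unwrap one level if the weights sit under a known key (assumed names).
    for key in wrapper_keys:
        if isinstance(state, dict) and isinstance(state.get(key), dict):
            state = state[key]
            break
    # Drop the "module." prefix added by DataParallel-style wrappers.
    return OrderedDict(
        (name[len("module."):] if name.startswith("module.") else name, value)
        for name, value in state.items()
    )


# Toy checkpoint: plain numbers stand in for tensors.
wrapped = {"epoch": 90, "state_dict": {"module.fc.weight": 1.0, "module.fc.bias": 2.0}}
cleaned = extract_state_dict(wrapped)
print(list(cleaned))  # ['fc.weight', 'fc.bias']
```

With a real file this would be used as `model.load_state_dict(extract_state_dict(torch.load(path, map_location='cpu')))`. A bare state dict passes through unchanged apart from the prefix handling.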

Running each model file directly verifies a single forward and backward iteration of the model.

Training

Our ResNet-based models can be trained with any PyTorch training code; we recommend timm. We provide a sample script for training R50-hybrid with the standard 90-epoch training setup:

  python3 -m torch.distributed.launch --nproc_per_node=4 train.py ./ImageNet_dataset/ --model resnet50_hybrid --opt sgd --amp \
  --lr 0.2 --weight-decay 1e-4 --batch-size 256 --sched step --epochs 90 --decay-epochs 30 --warmup-epochs 3 --smoothing 0
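As a rough illustration of what the schedule flags in this command imply, the sketch below computes a per-epoch learning rate for a step schedule with linear warm-up. The base rate, decay interval, and warm-up length come from the command; the decay factor of 0.1 and the warm-up starting rate of 1e-4 are assumptions, and the function is a simplified stand-in rather than timm's scheduler.

```python
def step_lr(epoch, base_lr=0.2, decay_epochs=30, decay_rate=0.1,
            warmup_epochs=3, warmup_lr=1e-4):
    """Learning rate at the start of `epoch` (0-indexed) under a step schedule."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr towards base_lr (assumed warm-up shape).
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # After warm-up, multiply by decay_rate once every decay_epochs epochs.
    return base_lr * decay_rate ** (epoch // decay_epochs)


for epoch in (0, 3, 29, 30, 60, 89):
    print(epoch, round(step_lr(epoch), 6))
```

Under these assumptions the rate is 0.2 for epochs 3 to 29, drops to 0.02 at epoch 30, and to 0.002 at epoch 60, which is the usual three-stage shape of a 90-epoch ImageNet run.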

Vision transformer models (ViT and PiT) can also be trained with timm, but we recommend training them with the DeiT code. We provide a sample training script with the default training setup in the package:

  python3 -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model vit_s_max --batch-size 256 --data-path ./ImageNet_dataset/

License

Copyright 2022-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

How to cite

@inproceedings{han2022learning,
    title={Learning Features with Parameter-Free Layers},
    author={Dongyoon Han and YoungJoon Yoo and Beomyoung Kim and Byeongho Heo},
    year={2022},
    booktitle={International Conference on Learning Representations (ICLR)},
}

