S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021)

Last update: Dec 24, 2022

Overview

S²-BNN (Self-supervised Binary Neural Networks Using Distillation Loss)

This is the official pytorch implementation of our paper:

"S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration" (CVPR 2021)

by Zhiqiang Shen, Zechun Liu, Jie Qin, Lei Huang, Kwang-Ting Cheng and Marios Savvides.

In this paper, we introduce a simple yet effective self-supervised approach using distillation loss for learning efficient binary neural networks. Our proposed method can outperform the simple contrastive learning baseline (MoCo V2) by an absolute gain of 5.5∼15% on ImageNet.

The student models are not restricted to the binary neural networks, you can replace with any efficient/compact models.

Citation

If you find our code is helpful for your research, please cite:

@InProceedings{Shen_2021_CVPR,
	author    = {Shen, Zhiqiang and Liu, Zechun and Qin, Jie and Huang, Lei and Cheng, Kwang-Ting and Savvides, Marios},
	title     = {S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year      = {2021}

}

Preparation

1. Requirements:

Python
PyTorch
Torchvision

2. Data:

Download ImageNet dataset following https://github.com/pytorch/examples/tree/master/imagenet#requirements.

Training & Testing

To train a model, run the following scripts. All our models are trained with 8 GPUs.

1. Standard Two-Step Training:

Our enhanced MoCo V2:

Step 1:

cd Contrastive_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]  --mlp --moco-t 0.2 --aug-plus --cos -j 48

Step 2:

cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]  --mlp --moco-t 0.2 --aug-plus --cos -j 48  --model-path ../step1/checkpoint_0199.pth.tar

Our MoCo V2 + Distillation Loss:

Download real-valued teacher network here. We use MoCo V2 800-epoch pretrained model, while you can choose other stronger self-supervised models as the teachers.

Step 1:

cd Contrastive+Distillation/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0  --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

Step 2:

cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0  --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar

Our Distillation Loss Only:

Step 1:

cd Distillation_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

Step 2:

cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar

2. Simple One-Step Training (Conventional):

Our enhanced MoCo V2:

cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48

Our MoCo V2 + Distillation Loss:

cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

Our Distillation Loss Only:

cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar

You can replace binary neural networks with any kinds of efficient/compact models on one-step training.

3. Testing:

To linearly evaluate a model, run the following script:

python main_lincls.py  --lr 0.1  -j 24  --batch-size 256  --pretrained  /home/szq/projects/s2bnn/checkpoint_0199.pth.tar --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]

Results & Models

We provide pre-trained models with different training strategies, we report in the table #epochs, OPs, Top-1 accuracy on ImageNet validation set:

Models	#Epoch	FLOPs (x10⁸)	OPs (x10⁸)	Top-1 (%)	Trained models
MoCo V2 baseline	200	0.12	0.87	46.9	Download
Our enhanced MoCo V2	200	0.12	0.87	52.5	Download
Our MoCo V2 + Distillation Loss	200	0.12	0.87	56.0	Download
Our Distillation Loss Only	200	0.12	0.87	61.5	Download

Training Logs

Our linear evaluation logs are availabe at here.

Acknowledgement

MoCo V2 (Improved Baselines with Momentum Contrastive Learning)

ReActNet (ReActNet: Towards Precise Binary NeuralNetwork with Generalized Activation Functions)

MEAL V2 (MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks)

Contact

Zhiqiang Shen, CMU (zhiqiangshen0214 at gmail.com)

S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021)

Related tags

Overview

S2-BNN (Self-supervised Binary Neural Networks Using Distillation Loss)

Citation

Preparation

1. Requirements:

2. Data:

Training & Testing

1. Standard Two-Step Training:

Our enhanced MoCo V2:

Step 1:

Step 2:

Our MoCo V2 + Distillation Loss:

Step 1:

Step 2:

Our Distillation Loss Only:

Step 1:

Step 2:

2. Simple One-Step Training (Conventional):

Our enhanced MoCo V2:

Our MoCo V2 + Distillation Loss:

Our Distillation Loss Only:

3. Testing:

Results & Models

Training Logs

Acknowledgement

Contact

Owner

Zhiqiang Shen

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

Codebase for Inducing Causal Structure for Interpretable Neural Networks

DeRF: Decomposed Radiance Fields

Visualizing lattice vibration information from phonon dispersion to atoms (For GPUMD)

Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ)

AdaFocus (ICCV 2021) Adaptive Focus for Efficient Video Recognition

YOLOX-CondInst - Implement CondInst which is a instances segmentation method on YOLOX

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

Using pytorch to implement unet network for liver image segmentation.

Data from "HateCheck: Functional Tests for Hate Speech Detection Models" (Röttger et al., ACL 2021)

Hand Gesture Volume Control is AIML based project which uses image processing to control the volume of your Computer.

GEP (GDB Enhanced Prompt) - a GDB plug-in for GDB command prompt with fzf history search, fish-like autosuggestions, auto-completion with floating window, partial string matching in history, and more!

Tensorflow implementation of Fully Convolutional Networks for Semantic Segmentation

Simple ray intersection library similar to coldet - succedeed by libacc

Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

Physics-informed convolutional-recurrent neural networks for solving spatiotemporal PDEs

Official code repository for "Exploring Neural Models for Query-Focused Summarization"

An executor that performs image segmentation on fashion items

Face Recognition Attendance Project

S²-BNN (Self-supervised Binary Neural Networks Using Distillation Loss)