Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices

Last update: Dec 30, 2022

Abstract

For practical deep neural network design on mobile devices, it is essential to consider the constraints imposed by computational resources and inference latency in various applications. Among network acceleration approaches, pruning is widely adopted to balance computational cost against accuracy: unimportant connections are removed either channel-wise or randomly, with minimal impact on model accuracy. Channel pruning immediately yields a significant latency reduction, while random weight pruning is more flexible in balancing latency and accuracy. In this paper, we present a unified framework with Joint Channel pruning and Weight pruning (JCW), which achieves a better Pareto frontier between latency and accuracy than previous model compression approaches. To fully optimize the trade-off between latency and accuracy, we develop a tailored multi-objective evolutionary algorithm in the JCW framework, which enables a single search to obtain the optimal candidate architectures for various deployment requirements. Extensive experiments demonstrate that JCW achieves a better trade-off between latency and accuracy than various state-of-the-art pruning methods on the ImageNet classification dataset.
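As a rough illustration of the two pruning styles the abstract contrasts, the sketch below applies both to one small layer: it first drops whole output channels, then zeroes individual weights in what remains. The L1-norm and magnitude criteria, the function names, and the toy weights are assumptions chosen for illustration; they are not taken from the JCW code, where the per-layer channel count and sparsity come from the search.

```python
def prune_channels(weight, num_keep):
    """Channel pruning: keep the num_keep rows (output channels) with the largest L1 norm."""
    norms = [sum(abs(x) for x in row) for row in weight]
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:num_keep])  # preserve the original channel order
    return [list(weight[i]) for i in keep]


def prune_weights(weight, sparsity):
    """Weight pruning: zero the smallest-magnitude fraction `sparsity` of the entries."""
    entries = sorted(
        (abs(x), r, c) for r, row in enumerate(weight) for c, x in enumerate(row)
    )
    num_zero = int(len(entries) * sparsity)
    drop = {(r, c) for _, r, c in entries[:num_zero]}
    return [
        [0.0 if (r, c) in drop else x for c, x in enumerate(row)]
        for r, row in enumerate(weight)
    ]


# Hypothetical 4x3 layer: 4 output channels, 3 inputs each.
layer = [
    [0.9, -0.1, 0.4],
    [0.05, 0.02, -0.03],
    [-0.7, 0.6, 0.2],
    [0.1, -0.2, 0.05],
]

# Joint pruning: shrink the layer to 2 channels, then make it 50% sparse.
joint = prune_weights(prune_channels(layer, num_keep=2), sparsity=0.5)
print(joint)
```

Channel pruning changes the shape of the layer, which is why it translates directly into lower latency; weight pruning keeps the shape and only adds zeros, which gives finer control over how much accuracy is traded away.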

Framework

Evaluation

ResNet18

Method	Latency/ms	Accuracy/%
Uniform 1x	537	69.8
DMCP	341	69.7
APS	363	70.3
JCW	160	69.2
JCW	194	69.7
JCW	196	69.9
JCW	224	70.2

MobileNetV1

Method	Latency/ms	Accuracy/%
Uniform 1x	167	70.9
Uniform 0.75x	102	68.4
Uniform 0.5x	53	64.4
AMC	94	70.7
Fast	61	68.4
AutoSlim	99	71.5
AutoSlim	55	67.9
USNet	102	69.5
USNet	53	64.2
JCW	31	69.1
JCW	39	69.9
JCW	43	69.8
JCW	54	70.3
JCW	69	71.4

MobileNetV2

Method	Latency/ms	Accuracy/%
Uniform 1x	114	71.8
Uniform 0.75x	71	69.8
Uniform 0.5x	41	65.4
APS	110	72.8
APS	64	69.0
DMCP	83	72.4
DMCP	45	67.0
DMCP	43	66.1
Fast	89	72.0
Fast	62	70.2
JCW	30	69.1
JCW	40	69.9
JCW	44	70.8
JCW	59	72.2
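
To read a table in terms of the latency/accuracy trade-off, it can help to keep only the rows that no other row beats on both axes. The sketch below does this for the MobileNetV2 numbers above; the function name `pareto_front` is an illustrative choice and is not part of this repository.

```python
# (method, latency in ms, accuracy) rows copied from the MobileNetV2 table.
ROWS = [
    ("Uniform 1x", 114, 71.8),
    ("Uniform 0.75x", 71, 69.8),
    ("Uniform 0.5x", 41, 65.4),
    ("APS", 110, 72.8),
    ("APS", 64, 69.0),
    ("DMCP", 83, 72.4),
    ("DMCP", 45, 67.0),
    ("DMCP", 43, 66.1),
    ("Fast", 89, 72.0),
    ("Fast", 62, 70.2),
    ("JCW", 30, 69.1),
    ("JCW", 40, 69.9),
    ("JCW", 44, 70.8),
    ("JCW", 59, 72.2),
]


def pareto_front(rows):
    """Return the rows that are not dominated (lower latency and higher accuracy win)."""
    front = []
    best_accuracy = float("-inf")
    # Walk from fastest to slowest; a slower model only survives if it is more accurate
    # than every faster model seen so far.
    for method, latency, accuracy in sorted(rows, key=lambda r: (r[1], -r[2])):
        if accuracy > best_accuracy:
            front.append((method, latency, accuracy))
            best_accuracy = accuracy
    return front


for method, latency, accuracy in pareto_front(ROWS):
    print(f"{method}\t{latency}\t{accuracy}")
```

On these numbers the four JCW models occupy the low-latency end of the front, and only DMCP at 83 ms and APS at 110 ms remain alongside them, at higher latency and higher accuracy.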

Requirements

torch
torchvision
numpy
scipy

Usage

JCW works in two steps: a search step and a training step. The search step searches for the layer-wise channel numbers and weight sparsity of Pareto-optimal models. The training step trains the searched models with ADMM. Below is a simple example for ResNet18.
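
For readers unfamiliar with ADMM-based pruning, here is a toy sketch of the generic recipe on a single weight vector. It is not this repository's training code: the quadratic loss, the function names, and the values of `rho` and `steps` are all assumptions made to keep the example self-contained. In real training the w-step would be a few SGD steps on the network loss plus the same penalty term.

```python
def project_top_k(values, k):
    """Keep the k largest-magnitude entries of `values` and zero the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def admm_prune(target, k, rho=1.0, steps=30):
    """Toy ADMM: minimize 0.5 * ||w - target||^2 subject to w having at most k nonzeros."""
    z = project_top_k(target, k)  # sparse auxiliary copy of the weights
    u = [0.0] * len(target)       # scaled dual variable
    w = list(target)
    for _ in range(steps):
        # w-step: closed-form minimizer of the loss plus (rho / 2) * ||w - z + u||^2.
        w = [(t + rho * (zi - ui)) / (1.0 + rho) for t, zi, ui in zip(target, z, u)]
        # z-step: project w + u onto the sparsity constraint.
        z = project_top_k([wi + ui for wi, ui in zip(w, u)], k)
        # u-step: accumulate the remaining disagreement between w and z.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z


# Hypothetical 4-element weight vector pruned to 2 nonzeros.
w, z = admm_prune([3.0, -0.5, 2.0, 0.1], k=2)
print("dense weights :", w)
print("sparse weights:", z)
```

The dense weights `w` are pulled toward the sparse copy `z` a little more on every iteration, so the final hard pruning step removes entries that are already close to zero.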

The search step

Modify the configuration file

First, open the file experiments/res18-search.yaml:
```
vim experiments/res18-search.yaml
```
Go to line 44 and find the following block:
```
DATASET:
  data: ImageNet
  root: /path/to/imagenet
  ...
```
Then set the root property of DATASET to the path of the ImageNet dataset on your machine.

Apply the search

After modifying the configuration file, you can simply start the search by:
```
python emo_search.py --config experiments/res18-search.yaml | tee experiments/res18-search.log
```
When the search finishes, the results are saved in experiments/search.pth.

The training step

After searching, train the searched models as follows.

Modify the base configuration file

Open the file experiments/res18-train.yaml:
```
vim experiments/res18-train.yaml
```
Go to line 5 and find the following line:
```
root: &root /path/to/imagenet
```
Then set the root property to the path of the ImageNet dataset on your machine.

Generate configuration files for training

After modifying the base configuration file, we are ready to generate the configuration files for training. To do that, simply run the following command:
```
python scripts/generate_training_configs.py --base-config experiments/res18-train.yaml --search-result experiments/search.pth --output ./train-configs 
```
After running the above command, the training configuration files will be written into ./train-configs/model-{id}/train.yaml.

Apply the training

After generating the configuration files, run the following command to train a particular model:
```
python train.py --config xxxx/xxx/train.yaml | tee xxx/xxx/train.log
```
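
Since the generation step writes one train.yaml per searched model, it may be convenient to list the training command for every model at once. The sketch below only builds and prints the commands; it assumes the ./train-configs/model-{id}/train.yaml layout described above, and writing each train.log next to its train.yaml is an assumption rather than something the repository prescribes. The helper name `build_train_commands` is hypothetical.

```python
import glob
import os
import shlex


def build_train_commands(config_root="./train-configs"):
    """Return one `python train.py ...` command per generated model config."""
    pattern = os.path.join(config_root, "model-*", "train.yaml")
    commands = []
    # sorted() gives a stable, lexicographic order (model-10 sorts before model-2).
    for config in sorted(glob.glob(pattern)):
        # Assumption: keep the log in the same directory as the config.
        log = os.path.join(os.path.dirname(config), "train.log")
        commands.append(
            f"python train.py --config {shlex.quote(config)} | tee {shlex.quote(log)}"
        )
    return commands


if __name__ == "__main__":
    for command in build_train_commands():
        print(command)
```

Printing the commands instead of executing them leaves the scheduling to you: run them one by one, or hand each line to a different GPU or job queue.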
