Accelerate Neural Net Training by Progressively Freezing Layers

Last update: Jun 19, 2022

Overview

FreezeOut

A simple technique to accelerate neural net training by progressively freezing layers.

This repository contains code for the extended abstract "FreezeOut."

FreezeOut directly accelerates training by annealing layer-wise learning rates to zero on a set schedule, and excluding layers from the backward pass once their learning rate bottoms out.
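
As a rough illustration of the idea, the sketch below anneals one layer's learning rate to zero and treats the layer as frozen once it gets there. It assumes the cosine-shaped schedule from the paper, including the 1/2 factor mentioned in the Notes. The function names (`annealed_lr`, `active_layers`) and the use of a training fraction `t` in [0, 1] are illustrative choices, not the repository's actual code.

```python
import math


def annealed_lr(alpha_0, t, t_i):
    """Learning rate of one layer at training fraction t.

    alpha_0: the layer's initial learning rate.
    t:       fraction of training completed, in [0, 1].
    t_i:     fraction of training at which this layer is frozen.
    Assumes a cosine schedule: 0.5 * alpha_0 * (1 + cos(pi * t / t_i)).
    """
    if t >= t_i:
        return 0.0  # learning rate has bottomed out; the layer is frozen
    return 0.5 * alpha_0 * (1.0 + math.cos(math.pi * t / t_i))


def active_layers(freeze_times, t):
    """Indices of layers that still need a backward pass at fraction t."""
    return [i for i, t_i in enumerate(freeze_times) if t < t_i]


if __name__ == "__main__":
    freeze_times = [0.5, 0.75, 1.0]  # hypothetical three-layer network
    for t in (0.0, 0.25, 0.5, 0.8):
        lrs = [round(annealed_lr(0.1, t, t_i), 4) for t_i in freeze_times]
        print(t, lrs, active_layers(freeze_times, t))
```

In PyTorch, one way to realize the exclusion step is to set `requires_grad = False` on a frozen layer's parameters, so autograd no longer computes gradients for them.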

I had this idea while replying to a reddit comment at 4AM. I threw it in an experiment, and it just worked out of the box (with linear scaling and t_0=0.5), so I went on a 96-hour SCIENCE binge, and now, here we are.

The exact speedup you get depends on how much error you can tolerate: higher speedups appear to come at the cost of increased error, but speedups below 20% should stay within a 3% relative error envelope, and speedups around 10% seem to incur no error cost for the Scaled Cubic and Unscaled Linear strategies.

Installation

To run this script, you will need PyTorch and a CUDA-capable GPU. If you wish to run it on CPU, just remove all the .cuda() calls.

Running

To run with default parameters, simply call

python train.py

By default, this downloads CIFAR-100, splits it into training, validation, and test sets, then trains a DenseNet-BC (k=12, L=76) using SGD with Nesterov momentum.

This script supports command line arguments for a variety of parameters, with the FreezeOut specific parameters being:

how_scale selects which annealing strategy to use (linear, squared, or cubic). Cubic by default.
scale_lr determines whether to scale initial learning rates based on t_i. True by default.
t_0 is a float between 0 and 1 that decides how far into training to freeze the first layer. 0.8 (pre-cubed) by default.
const_time is an experimental setting that increases the number of epochs based on the estimated speedup, in order to match the total training time of a non-FreezeOut baseline. I have not validated whether this is worthwhile.
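
To make the interplay of how_scale, scale_lr, and t_0 more concrete, here is a minimal sketch of how per-layer freeze times and initial learning rates could be derived. It assumes that each layer's t_i is linearly interpolated from t_0 (first layer) to 1 (last layer) before the chosen power is applied, and that scale_lr divides the base learning rate by t_i. The helper names `freeze_times` and `initial_lrs` are hypothetical and do not come from train.py.

```python
def freeze_times(num_layers, t_0=0.8, how_scale="cubic"):
    """Return t_i for each layer as a fraction of total training.

    Assumption: t_i is spaced linearly between t_0 and 1, then raised
    to a power chosen by how_scale (so t_0 is given "pre-cubed").
    """
    power = {"linear": 1, "squared": 2, "cubic": 3}[how_scale]
    if num_layers == 1:
        return [1.0]  # a single layer trains for the whole run
    times = []
    for i in range(num_layers):
        t = t_0 + (1.0 - t_0) * i / (num_layers - 1)
        times.append(t ** power)
    return times


def initial_lrs(base_lr, times, scale_lr=True):
    """Initial learning rate per layer; scaled by 1 / t_i if scale_lr."""
    return [base_lr / t if scale_lr else base_lr for t in times]


if __name__ == "__main__":
    times = freeze_times(5, t_0=0.8, how_scale="cubic")
    print([round(t, 3) for t in times])
    print([round(lr, 4) for lr in initial_lrs(0.1, times)])
```

Under these assumptions, the default t_0 of 0.8 with cubic scaling freezes the first layer at 0.8^3 = 0.512 of the way through training, while the last layer trains for the full run.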

You can also set the name of the weights and the metrics log, which model to use, how many epochs to train for, etc.

If you want to calculate an estimated speedup for a given strategy and t_0 value, use the calc_speedup() function in utils.py.
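
The signature of calc_speedup() is not reproduced here. As a back-of-the-envelope stand-in, the sketch below estimates the fraction of compute saved from a list of per-layer freeze times. It rests on a deliberately crude cost model that is an assumption of this example: every layer costs the same, a forward pass costs 1 unit per layer, and a backward pass costs 2 units per layer. The function name `estimate_speedup` is hypothetical, and its output need not match what utils.py reports.

```python
def estimate_speedup(times, forward_cost=1.0, backward_cost=2.0):
    """Estimate the fraction of training compute saved by freezing.

    times: per-layer freeze times t_i as fractions of training in (0, 1].
    Each layer always runs its forward pass, but runs its backward pass
    only for the first t_i fraction of training (assumed cost model).
    """
    if not times:
        raise ValueError("times must contain at least one layer")
    baseline = len(times) * (forward_cost + backward_cost)
    with_freezing = sum(forward_cost + backward_cost * t for t in times)
    return 1.0 - with_freezing / baseline


if __name__ == "__main__":
    # Hypothetical 4-layer network, linear strategy with t_0 = 0.5.
    times = [0.5 + 0.5 * i / 3 for i in range(4)]
    print(f"estimated saving: {estimate_speedup(times):.1%}")
```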

Notes

If you know how to implement this in a static-graph framework (specifically TensorFlow or Caffe2), shoot me an email! It's really easy to do with dynamic graphs, and I believe it should also be possible in a static graph with some simple conditionals.

There's (at least) one typo in the paper: where it defines the learning rate schedule, there should be a 1/2 in front of alpha.

Acknowledgments

DenseNet code stolen in a daring midnight heist from Brandon Amos: https://github.com/bamos/densenet.pytorch
Training and Progress code acquired in a drunken game of SpearPong with Jan Schlüter: https://github.com/Lasagne/Recipes/tree/master/papers/densenet
Metrics Logging code extracted from ancient diary of Daniel Maturana: https://github.com/dimatura/voxnet
WideResNet code summoned using an incantation from Xternalz: https://github.com/xternalz/WideResNet-pytorch

Owner

Andy Brock
