Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Last update: Jan 04, 2023

Automated Learning Rate Scheduler for Large-Batch Training

Overview

AutoWU is an automated learning rate (LR) scheduler that consists of two phases: warmup and decay. In the warmup phase, the LR is increased at an exponential rate until the loss starts to increase. In the decay phase, the LR is decreased following a pre-specified decay type (either cosine or constant-then-cosine in our experiments).
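
As a rough illustration of the two-phase shape described above, the following sketch computes the LR at a given step for exponential warmup followed by cosine decay. It is not the code from autowu: the function name, `lr_init`, `growth`, and the decay-to-zero endpoint are assumptions made for this example, and `switch_step` is passed in by hand, whereas AutoWU determines the switch point automatically.

```python
import math


def two_phase_lr(step, switch_step, total_steps, lr_init=1e-5, growth=1.01):
    """LR at `step` for exponential warmup followed by cosine decay.

    All parameter names and defaults are hypothetical. `switch_step` is
    supplied by hand here; AutoWU finds it automatically from the loss.
    """
    # LR reached at the end of warmup; the decay phase starts from here.
    peak_lr = lr_init * growth ** switch_step
    if step < switch_step:
        # Warmup: multiply the LR by a constant factor every step.
        return lr_init * growth ** step
    # Decay: half-cosine from peak_lr down to 0 over the remaining steps.
    progress = (step - switch_step) / max(1, total_steps - switch_step)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

The schedule is continuous at `switch_step`, because the cosine factor equals 1 when the decay phase begins.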

The transition from the warmup phase to the decay phase is triggered automatically. AutoWU predicts the loss curve via Gaussian process regression and tests whether, with high probability, the minimum of the predicted curve was already attained in the past.
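
The test described above can be sketched in plain Python as follows. This is a minimal illustration of the idea, not the procedure implemented in autowu: the RBF kernel, the fixed hyperparameters (`length_scale`, `noise`), the Monte Carlo estimate, and the use of a generic coordinate `xs` for the horizontal axis are all assumptions. The function fits a Gaussian process to the observed losses, draws posterior samples of the loss curve at the observed points, and reports the fraction of samples whose minimum lies strictly before the latest observation.

```python
import math
import random


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L @ L.T == A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # Clamp the pivot to guard against round-off on the diagonal.
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L @ L.T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def prob_min_in_past(xs, ys, noise=0.05, n_samples=300, seed=0):
    """Estimate P(minimum of the GP posterior curve lies before the last point)."""
    n = len(xs)
    y_mean = sum(ys) / n
    Ks = [[rbf(a, b) for b in xs] for a in xs]
    K = [[Ks[i][j] + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    # Posterior mean of the latent (noise-free) loss at the observed points.
    alpha = chol_solve(L, [y - y_mean for y in ys])
    mean = [y_mean + sum(Ks[i][k] * alpha[k] for k in range(n)) for i in range(n)]
    # Posterior covariance Ks - Ks K^{-1} Ks, with a small jitter for stability.
    W = [chol_solve(L, [Ks[k][j] for k in range(n)]) for j in range(n)]
    cov = [[Ks[i][j] - sum(Ks[i][k] * W[j][k] for k in range(n))
            + (1e-8 if i == j else 0.0) for j in range(n)] for i in range(n)]
    Lc = cholesky(cov)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        f = [mean[i] + sum(Lc[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        # Count the sample if its minimum is attained before the last point.
        if min(range(n), key=f.__getitem__) < n - 1:
            hits += 1
    return hits / n_samples
```

A scheduler built on this idea would switch from warmup to decay once the returned probability exceeds a chosen threshold. For a loss curve that is still decreasing the estimate stays near 0, while for a curve that has clearly turned upward it approaches 1.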

How to use

Setup

pip install -r requirements.txt

Quick use

You can use AutoWU like other PyTorch schedulers, except that its step method takes the loss as an argument (like ReduceLROnPlateau in PyTorch). The following code snippet demonstrates typical usage of AutoWU.

from autowu import AutoWU

...

scheduler = AutoWU(optimizer,
                   len(train_loader),  # the number of steps in one epoch 
                   total_epochs,  # total number of epochs
                   immediate_cooldown=True,
                   cooldown_type='cosine',
                   device=device)

...

for _ in range(total_epochs):
    for inputs, targets in train_loader:
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # step the scheduler once per iteration, passing the current training loss
        scheduler.step(loss)

The default decay phase schedule is 'cosine'. To use the constant-then-cosine schedule instead, set immediate_cooldown=False and set cooldown_fraction to the desired value:

scheduler = AutoWU(optimizer,
                   len(train_loader),  # the number of steps in one epoch 
                   total_epochs,  # total number of epochs
                   immediate_cooldown=False,
                   cooldown_type='cosine',
                   cooldown_fraction=0.2,  # fraction of cosine decay at the end
                   device=device)
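
To make the constant-then-cosine shape concrete, here is a minimal sketch that is independent of the autowu code. It assumes that `cooldown_fraction` is measured against the total number of training steps and that the LR decays to zero; both are assumptions for this illustration, and the function name and `peak_lr` parameter are hypothetical.

```python
import math


def constant_then_cosine(step, total_steps, peak_lr, cooldown_fraction=0.2):
    """Hold peak_lr, then apply cosine decay over the final fraction of steps.

    Assumes `cooldown_fraction` is a fraction of `total_steps`; the actual
    convention used by AutoWU may differ.
    """
    cooldown_start = int(round(total_steps * (1.0 - cooldown_fraction)))
    if step < cooldown_start:
        # Constant part: keep the LR at its peak value.
        return peak_lr
    # Cosine part: decay from peak_lr to 0 over the remaining steps.
    progress = (step - cooldown_start) / max(1, total_steps - cooldown_start)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

With `cooldown_fraction=1.0` the constant part vanishes and the function reduces to a plain cosine schedule.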

Reproduction of results

We provide an example training script, train.py, which is based on PyTorch Image Models. The script supports training ResNet-50 and EfficientNet-B0 on ImageNet classification under settings almost identical to those in the paper. Below we report the top-1 validation accuracy of ResNet-50 and EfficientNet-B0 trained with batch sizes 4K (4096) and 16K (16384) using this repository, alongside the scores reported in our paper.

ResNet-50	This repo.	Reported (paper)
4K	75.54%	75.70%
16K	74.87%	75.22%

EfficientNet-B0	This repo.	Reported (paper)
4K	75.74%	75.81%
16K	75.66%	75.44%

You can use the torch.distributed.launch utility to run the script. For instance, to train ResNet-50 with batch size 4096, run the following command with the variables set according to your environment:

python -m torch.distributed.launch \
--nproc_per_node=4 \
--nnodes=4 \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--master_port=$MASTER_PORT \
train.py \
--data-root $DATA_ROOT \
--amp \
--batch-size 256

To train EfficientNet-B0 instead, add the --model efficientnet_b0 argument.
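
The launch command above is consistent with a global batch size of 4096 if --batch-size is interpreted as the per-process batch size, which is an assumption here rather than something this README states. The following sketch shows the arithmetic. The helper names are hypothetical, and the second call only illustrates how many processes a 16K global batch would require if the per-process batch size stayed at 256.

```python
def global_batch_size(nnodes, nproc_per_node, per_process_batch):
    """Total samples per optimizer step across all processes."""
    return nnodes * nproc_per_node * per_process_batch


def processes_needed(target_global_batch, per_process_batch):
    """Number of processes required to reach a target global batch size."""
    if target_global_batch % per_process_batch:
        raise ValueError("target must be a multiple of the per-process batch size")
    return target_global_batch // per_process_batch


# Values from the command above: 4 nodes, 4 processes per node, --batch-size 256.
print(global_batch_size(nnodes=4, nproc_per_node=4, per_process_batch=256))  # 4096
print(processes_needed(16384, 256))  # 64
```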

Citation

@inproceedings{
    kim2021automated,
    title={Automated Learning Rate Scheduler for Large-batch Training},
    author={Chiheon Kim and Saehoon Kim and Jongmin Kim and Donghoon Lee and Sungwoong Kim},
    booktitle={8th ICML Workshop on Automated Machine Learning (AutoML)},
    year={2021},
    url={https://openreview.net/forum?id=ljIl7KCNYZH}
}

Owner

Kakao Brain
