CosineAnnealingWithWarmup

Formulation

The learning rate is annealed using a cosine schedule over the course of learning of n_total total steps with an initial warmup period of n_warmup steps. Hence, the learning rate at step i is computed as:

Learning rate will be changed as:

Usage

# optimizer, warmup_epochs, warmup_lr, num_epochs, base_lr, final_lr, iter_per_epoch
lr_scheduler = LR_Scheduler(
        optimizer,
        args.warmup_epochs, args.warmup_lr*args.batch_size/256, 
        args.epochs, args.lr*args.batch_size/256, args.final_lr*args.batch_size/256, 
        len(train_loader),
    )

for data in range(train_loader):
  optimizer.zero_grad()
  output = model(data) 
  loss = lossfunc(output,gt)
  loss.backward()
  optimizer.step()
  lr_scheduler.step()

In CV domain [1,2], in order to automatically adapt different batch size you can use a learning rate of lr×BatchSize/256 (linear scaling [4])(we can use larger learning rate while adopting larger batch size, especially, when you use LARS optimizer[3]). Of course, you can modify it according to your specific requirements.

Reference

[1] Jean-Bastien Grill, Florian Strub, Florent Altch´e, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Do- ersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Moham- mad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, R´emi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. arXiv:2006.07733v1, 2020.
[2] Chen X, He K. Exploring simple siamese representation learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 15750-15758.
[3] Yang You, Igor Gitman, and Boris Ginsburg. Large batch training of convolutional networks. arXiv:1708.03888, 2017.
[4] Priya Goyal, Piotr Doll´ar, Ross Girshick, Pieter Noord- huis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017.
[5] https://github.com/PatrickHua/SimSiam?utm_source=catalyzex.com

Cosine Annealing With Warmup

Related tags

Overview

CosineAnnealingWithWarmup

Formulation

Usage

Reference

Owner

zhuyun

Some toy examples of score matching algorithms written in PyTorch

Piotr - IoT firmware emulation instrumentation for training and research

FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective

Secure Distributed Training at Scale

This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization

[WACV 2022] Contextual Gradient Scaling for Few-Shot Learning

An unsupervised learning framework for depth and ego-motion estimation from monocular videos

Official implementation of Deep Burst Super-Resolution

Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

Code for Universal Semi-Supervised Semantic Segmentation models paper accepted in ICCV 2019

Code for Transformer Hawkes Process, ICML 2020.

Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

Deep Reinforcement Learning with pytorch & visdom

Code for Robust Contrastive Learning against Noisy Views

Gated-Shape CNN for Semantic Segmentation (ICCV 2019)

McGill Physics Hackathon 2021: Reaction-Diffusion Models for the Generation of Biological Patterns

BC3407-Group-5-Project - BC3407 Group Project With Python

FastFace: Lightweight Face Detection Framework

We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time.

Action Recognition for Self-Driving Cars