Instance-based label smoothing for improving deep neural networks generalization and calibration

Last update: Aug 13, 2022

Overview

Instance-based Label Smoothing for Neural Networks

PyTorch implementation of the algorithm.
This repository implements a proposed method for instance-based label smoothing in neural networks, in which the target probability mass is not distributed uniformly among the incorrect classes. Instead, each incorrect class is assigned a target probability proportional to its output score relative to the remaining classes, where the scores come from a network trained with vanilla cross-entropy loss on the hard target labels.
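The idea in the paragraph above can be sketched for a single training instance as follows. This is a minimal illustration in plain Python, not the repository's code: the function names, the fixed smoothing factor `epsilon`, the renormalization over incorrect classes only, and the uniform fallback are all assumptions made for the example. An actual implementation would operate on batched PyTorch tensors.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def instance_smoothed_target(teacher_logits, true_class, epsilon=0.1):
    """Build a soft target for one instance (hypothetical helper).

    teacher_logits: outputs of a network trained with vanilla cross-entropy.
    epsilon: probability mass moved off the true class (assumed fixed here).
    """
    probs = softmax(teacher_logits)
    wrong = [k for k in range(len(probs)) if k != true_class]
    # Total probability the teacher assigns to the incorrect classes.
    wrong_mass = sum(probs[k] for k in wrong)
    target = [0.0] * len(probs)
    target[true_class] = 1.0 - epsilon
    for k in wrong:
        if wrong_mass > 0.0:
            # Share epsilon in proportion to the teacher's scores.
            target[k] = epsilon * probs[k] / wrong_mass
        else:
            # Assumed fallback: uniform smoothing if the teacher is saturated.
            target[k] = epsilon / len(wrong)
    return target

# Class 1 is scored as more similar to the true class 0 than class 2 is,
# so it receives the larger share of the smoothing mass.
target = instance_smoothed_target([4.0, 2.0, 0.0], true_class=0, epsilon=0.1)
```

Unlike standard label smoothing, where every incorrect class would receive the same share of `epsilon`, the incorrect-class targets here keep the ranking produced by the vanilla network.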

The following figure summarizes the idea of our instance-based label smoothing, which aims to preserve information about the class similarity structure while training with label smoothing.

Requirements

Python 3.x
pandas
numpy
pytorch

Usage

Datasets

CIFAR10 / CIFAR100 / FashionMNIST

Files Content

The project has the following structure:

├── Vanilla-cross-entropy.py
├── Label-smoothing.py
├── Instance-based-smoothing.py
├── Models-evaluation.py
├── Network-distillation.py
├── utils
│   ├── data_loader.py
│   ├── utils.py
│   ├── evaluate.py
│   ├── params.json
├── models
│   ├── resnet.py
│   ├── densenet.py
│   ├── inception.py
│   ├── shallownet.py

Vanilla-cross-entropy.py trains the networks using cross-entropy without label smoothing.
Label-smoothing.py trains the networks using cross-entropy with standard label smoothing.
Instance-based-smoothing.py trains the networks using cross-entropy with instance-based label smoothing.
Models-evaluation.py evaluates the trained networks.
Network-distillation.py distills the trained networks into a shallow convolutional network of 5 layers.
models/ contains the implementations of the architectures used in our evaluation (ResNet, DenseNet, Inception-V4), as well as the shallow CNN student network used in the distillation experiments.
utils/ contains the utility functions required for training and evaluating the different models.

Example

python Instance-based-smoothing.py --dataset cifar10 --model resnet18 --num_classes 10

Arguments accepted by the training and evaluation scripts:

--lr type = float, default = 0.1, help = Starting learning rate (a weight decay of 1e-4 is used).
--tr_size type = float, default = 0.8, help = Fraction of the original training set used for training (the remaining 0.2 is used for validation).
--batch_size type = int, default = 512, help = Mini-batch size used during training.
--epochs type = int, default = 100, help = Number of training epochs.
--estop type = int, default = 10, help = Number of epochs without loss improvement before early stopping.
--ece_bins type = int, default = 10, help = Number of bins used to compute the expected calibration error.
--dataset type = str, help = Name of the dataset to be used (cifar10/cifar100/fashionmnist).
--num_classes type = int, default = 10, help = Number of classes in the dataset.
--model type = str, help = Name of the model to be trained, e.g. resnet18 / resnet50 / inceptionv4 / densetnet (works for FashionMNIST only).
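The argument list above maps naturally onto an `argparse` parser. The following is a sketch reconstructed from the listed names, types, and defaults only. The `build_parser` function name and the abbreviated help strings are assumptions for illustration, and the parsers in the repository's scripts may differ, for instance in whether `--dataset` and `--model` are required.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(description="Training / evaluation options")
    parser.add_argument("--lr", type=float, default=0.1,
                        help="Starting learning rate")
    parser.add_argument("--tr_size", type=float, default=0.8,
                        help="Fraction of the training set used for training")
    parser.add_argument("--batch_size", type=int, default=512,
                        help="Mini-batch size")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--estop", type=int, default=10,
                        help="Early-stopping patience in epochs")
    parser.add_argument("--ece_bins", type=int, default=10,
                        help="Number of bins for expected calibration error")
    parser.add_argument("--dataset", type=str,
                        help="cifar10 / cifar100 / fashionmnist")
    parser.add_argument("--num_classes", type=int, default=10,
                        help="Number of classes in the dataset")
    parser.add_argument("--model", type=str,
                        help="Name of the model to be trained")
    return parser

# Parse the same options used in the example command.
args = build_parser().parse_args(
    ["--dataset", "cifar10", "--model", "resnet18", "--num_classes", "10"]
)
```

Options not given on the command line fall back to the documented defaults, so `args.lr` is 0.1 and `args.batch_size` is 512 here.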

Results

The comparison of the different methods on 3 datasets using 4 different architectures is reported in the following table.
Each experiment was repeated 3 times, and the mean $\pm$ standard deviation is reported for log loss, expected calibration error (ECE), accuracy, distilled student network accuracy, and distilled student network log loss.
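For readers unfamiliar with ECE, the sketch below shows one common way to compute it with equal-width confidence bins. It is not taken from utils/evaluate.py: the function name, the input format, and the binning convention are assumptions. Only the default of 10 bins follows the documented `--ece_bins` option.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (hypothetical helper).

    confidences: highest predicted probability for each sample, in [0, 1].
    correct: booleans, True where the predicted class matches the label.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Four predictions at 95% confidence, of which three are correct:
# the single occupied bin has a gap of |0.95 - 0.75|.
ece = expected_calibration_error([0.95, 0.95, 0.95, 0.95],
                                 [True, True, True, False])
```

A lower value means the predicted confidences track the observed accuracy more closely.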

A t-SNE visualization of the logits of three different classes in CIFAR-10 is shown below:

Owner

Mohamed Maher
