Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Paper has been accepted by Neurocomputing 415(2020): 106–113.

Authors: Yuang Liu, Wei Zhang and Jun Wang.

Links: [ pdf ] [ code ]

Requirements

PyTorch >= 1.0.0
Jupyter
visdom

Introduction

Knowledge distillation (KD) is an effective learning paradigm for improving the performance of light-weight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat each teacher to be equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD), which consists two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights which are leveraged for acquiring integrated soft-targets (high-level knowledge) and (ii) enabling the intermediate-level hints (intermediate-level knowledge) to be gathered from multiple teachers by the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate the proposed learning framework ensures student to achieve improved performance than strong competitors.

Citation

@article{LIU2020106,
    title = {Adaptive multi-teacher multi-level knowledge distillation},
    author = {Yuang Liu and Wei Zhang and Jun Wang},
    journal = {Neurocomputing},
    volume = {415},
    pages = {106 -- 113},
    year = {2020},
    issn = {0925 -- 2312},
}

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Related tags

Overview

Adaptive Multi-Teacher Multi-level Knowledge Distillation(AMTML-KD)

Requirements

Introduction

Citation

Owner

Frank Liu

🎯 A comprehensive gradient-free optimization framework written in Python

Simple data balancing baselines for worst-group-accuracy benchmarks.

The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'

This is a project based on retinaface face detection, including ghostnet and mobilenetv3

Create and implement a deep learning library from scratch.

This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

Simple transformer model for CIFAR10

Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Tutoriais publicados nas nossas redes sociais para obtenção de dados, análises simples e outras tarefas relevantes no mercado financeiro.

Grammar Induction using a Template Tree Approach

FS-Mol: A Few-Shot Learning Dataset of Molecules

Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" (NeurIPS'20)

A python interface for training Reinforcement Learning bots to battle on pokemon showdown

Yggdrasil - A simplistic bot designed to streamline your server experience

Fuzzing JavaScript Engines with Aspect-preserving Mutation

This is Unofficial Repo. Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection (CVPR 2021)

Source code for the Paper: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints}

Implementation for "Seamless Manga Inpainting with Semantics Awareness" (SIGGRAPH 2021 issue)

Tensorflow implementation of Swin Transformer model.

Code for Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions