PyTorch reimplementation of the Mixer (MLP-Mixer: An all-MLP Architecture for Vision)

Last update: Dec 08, 2022

Overview

MLP-Mixer

PyTorch reimplementation of Google's repository for the MLP-Mixer (not yet updated on the master branch), which was released with the paper MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy.

In this paper, the authors show that an architecture built only from multi-layer perceptrons (MLPs), with no convolutional layers and no Transformer-style self-attention, reaches accuracy close to the state of the art (SotA) on image classification benchmarks.

MLP-Mixer (Mixer for short) consists of per-patch linear embeddings, Mixer layers, and a classifier head. Each Mixer layer contains one token-mixing MLP, which mixes information across patches, and one channel-mixing MLP, which mixes information across the features of a single patch. Each MLP consists of two fully-connected layers and a GELU nonlinearity. Other components include skip connections, dropout, layer normalization on the channels, and a linear classifier head.
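The architecture described above can be sketched in a few dozen lines of PyTorch. This is a minimal illustration, not the code in this repository: the class names (`MlpBlock`, `MixerLayer`, `MlpMixer`), the small default sizes, and the placement of `LayerNorm` before each MLP are assumptions chosen for readability, and the per-patch linear embedding is written as a strided convolution, which is mathematically the same as applying one linear layer to each flattened patch.

```python
import torch
from torch import nn


class MlpBlock(nn.Module):
    """Two fully-connected layers with a GELU nonlinearity in between."""

    def __init__(self, dim, hidden_dim, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class MixerLayer(nn.Module):
    """One token-mixing MLP and one channel-mixing MLP, each with a skip connection."""

    def __init__(self, num_tokens, channels, tokens_hidden, channels_hidden, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = MlpBlock(num_tokens, tokens_hidden, dropout)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = MlpBlock(channels, channels_hidden, dropout)

    def forward(self, x):
        # x has shape (batch, num_tokens, channels).
        # Token mixing: transpose so the MLP acts along the token axis.
        y = self.norm1(x).transpose(1, 2)           # (batch, channels, num_tokens)
        x = x + self.token_mlp(y).transpose(1, 2)   # back to (batch, num_tokens, channels)
        # Channel mixing: the MLP acts along the channel axis of each token.
        x = x + self.channel_mlp(self.norm2(x))
        return x


class MlpMixer(nn.Module):
    """Patch embedding -> Mixer layers -> global average pooling -> linear head."""

    def __init__(self, image_size=32, patch_size=4, channels=64, num_layers=2,
                 tokens_hidden=32, channels_hidden=128, num_classes=10):
        super().__init__()
        num_tokens = (image_size // patch_size) ** 2
        # A convolution with kernel == stride == patch_size embeds each patch linearly.
        self.embed = nn.Conv2d(3, channels, kernel_size=patch_size, stride=patch_size)
        self.layers = nn.Sequential(*[
            MixerLayer(num_tokens, channels, tokens_hidden, channels_hidden)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, images):
        x = self.embed(images)               # (batch, channels, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)     # (batch, num_tokens, channels)
        x = self.layers(x)
        x = self.norm(x).mean(dim=1)         # average over tokens
        return self.head(x)


model = MlpMixer()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

The only operation that moves information between patches is the transposed `token_mlp` call; everything else acts on each patch independently, which is what makes the model "all-MLP".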

Usage

1. Download Pre-trained model (Google's Official Checkpoint)

Available models: Mixer-B_16, Mixer-L_16
- ImageNet pre-trained models
  - Mixer-B_16, Mixer-L_16
- ImageNet-21k pre-trained models
  - Mixer-B_16, Mixer-L_16

# imagenet pre-train
wget https://storage.googleapis.com/mixer_models/imagenet1k/{MODEL_NAME}.npz

# imagenet-21k pre-train
wget https://storage.googleapis.com/mixer_models/imagenet21k/{MODEL_NAME}.npz
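If you would rather fetch a checkpoint from Python than with wget, the two URL patterns above can be wrapped in a small helper. This is a sketch using only the standard library; the function names `checkpoint_url` and `download_checkpoint` are hypothetical and not part of this repository, and the `checkpoint/` output directory is assumed from the path used in the fine-tuning command.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "https://storage.googleapis.com/mixer_models"
UPSTREAMS = ("imagenet1k", "imagenet21k")   # directory names from the URLs above
MODELS = ("Mixer-B_16", "Mixer-L_16")       # models listed as available


def checkpoint_url(upstream, model_name):
    """Build the download URL for one official checkpoint."""
    if upstream not in UPSTREAMS:
        raise ValueError(f"unknown upstream {upstream!r}, expected one of {UPSTREAMS}")
    if model_name not in MODELS:
        raise ValueError(f"unknown model {model_name!r}, expected one of {MODELS}")
    return f"{BASE_URL}/{upstream}/{model_name}.npz"


def download_checkpoint(upstream, model_name, out_dir="checkpoint"):
    """Download the checkpoint into out_dir and return the local file path."""
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, f"{model_name}.npz")
    urlretrieve(checkpoint_url(upstream, model_name), dest)
    return dest


print(checkpoint_url("imagenet21k", "Mixer-B_16"))
# https://storage.googleapis.com/mixer_models/imagenet21k/Mixer-B_16.npz
```

Calling `download_checkpoint("imagenet21k", "Mixer-B_16")` would save the file as `checkpoint/Mixer-B_16.npz`. Note that both upstreams use the same file name, so download them into different directories if you need both.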

2. Fine-tuning

python3 train.py --name cifar10-100_500 --model_type Mixer-B_16 --pretrained_dir checkpoint/Mixer-B_16.npz

Reproducing Mixer results

upstream	model	dataset	accuracy (%, official)
ImageNet	Mixer-B/16	cifar10	96.72
ImageNet	Mixer-L/16	cifar10	96.59
ImageNet-21k	Mixer-B/16	cifar10	96.82
ImageNet-21k	Mixer-L/16	cifar10	96.34

Reference

Google's Vision Transformer and MLP-Mixer

Citations

@article{tolstikhin2021,
  title={MLP-Mixer: An all-MLP Architecture for Vision},
  author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
  journal={arXiv preprint arXiv:2105.01601},
  year={2021}
}

Owner

Eunkwang Jeon
