Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Last update: Dec 14, 2021

Related tags

Overview

mae-repo

PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https://github.com/lucidrains/vit-pytorch (for MAE architectures) and https://github.com/pengzhiliang/MAE-pytorch (for training loop).

prepare ImageNet1K datasets

To train MAE, one should prepare ImageNet_ILSVRC2012 and place ILSVRC2012_*.tar in the ${datasets_path}. To shorten the overhead of first run, one can manually untar the tarfile into train and val directories, as follow (refered to https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).

mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..

mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash

modify configuration file

To separate code and config, we try to split configurations to yaml file, located in configs directory, such as imagenet1k-vit-base.yml. One can modify 'model' setting following MAE and ViT to configure model architecture parameters of ViT-base, large and huge.

One can modify 'optim' for optimizer settings. And modify 'training' and 'data' for training settings. Note that, modify 'training:batch_size' to fit the GPU memory of one GPU card. Total batch_size is equal to batch_size multiplied by number of GPU cards.

train

CUDA_VISIBLE_DEVICES=0,1,2,3,5,6,7 OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 mae_test.py
--datasets_path ${datasets_path}
--config imagenet1k-vit-base.yml
--doc mae-vit-base16-dec8-512

ToDo lists

add pretrain mode
add fine-tunning mode
support mixed precision training
support distributed training
verify the correctness of this re-implementation

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Related tags

Overview

mae-repo

prepare ImageNet1K datasets

modify configuration file

train

ToDo lists

Owner

Peng Qiao

End-To-End Memory Network using Tensorflow

Library for implementing reservoir computing models (echo state networks) for multivariate time series classification and clustering.

Measuring if attention is explanation with ROAR

Omnidirectional camera calibration in python

ICCV2021 - A New Journey from SDRTV to HDRTV.

Sample and Computation Redistribution for Efficient Face Detection

Syllabus del curso IIC2115 - Programación como Herramienta para la Ingeniería 2022/I

An Official Repo of CVPR '20 "MSeg: A Composite Dataset for Multi-Domain Segmentation"

《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

A minimalist environment for decision-making in autonomous driving

Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Pytorch implementation of Value Iteration Networks (NIPS 2016 best paper)

Classify music genre from a 10 second sound stream using a Neural Network.

GRF: Learning a General Radiance Field for 3D Representation and Rendering

Reading list for research topics in Masked Image Modeling

Open-Set Recognition: A Good Closed-Set Classifier is All You Need

Spearmint Bayesian optimization codebase

Classify bird species based on their songs using SIamese Networks and 1D dilated convolutions.

A small demonstration of using WebDataset with ImageNet and PyTorch Lightning