Aggregating Nested Transformers: Official JAX Implementation

Last update: Dec 20, 2022

Overview

NesT is a simple method that aggregates nested local transformers on image blocks. This design gives vision transformers better accuracy, data efficiency, and convergence on the ImageNet benchmark. NesT also scales down to small datasets, where it matches convnet accuracy.
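To make "image blocks" concrete, here is a minimal sketch of splitting an image grid into non-overlapping blocks and putting it back together. It is plain Python for illustration only and is not code from this repository; the names `blockify` and `unblockify` and the list-of-rows image representation are assumptions. In NesT, a local transformer would process each block independently before neighboring blocks are aggregated.

```python
def blockify(image, block_size):
    """Split an H x W grid (a list of rows) into non-overlapping square blocks.

    Blocks are returned in row-major block order; each block is itself a
    list of `block_size` rows. (Hypothetical helper, not the repo's API.)
    """
    height, width = len(image), len(image[0])
    if height % block_size or width % block_size:
        raise ValueError("image sides must be divisible by block_size")
    blocks = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks.append([row[left:left + block_size]
                           for row in image[top:top + block_size]])
    return blocks


def unblockify(blocks, blocks_per_row):
    """Inverse of blockify: stitch row-major blocks back into one grid."""
    image = []
    for start in range(0, len(blocks), blocks_per_row):
        row_of_blocks = blocks[start:start + blocks_per_row]
        for r in range(len(row_of_blocks[0])):
            # Concatenate row r of every block in this band of blocks.
            image.append([v for block in row_of_blocks for v in block[r]])
    return image


# A 4 x 4 "image" with distinct values, split into four 2 x 2 blocks.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = blockify(image, 2)
restored = unblockify(blocks, blocks_per_row=2)
```

On array libraries such as JAX or NumPy the same partition is usually written as a reshape followed by a transpose; the loops here only keep the sketch dependency-free.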

This is not an officially supported Google product.

Pretrained Models and Results

Model	Top-1 accuracy (%)	Checkpoint path
NesT-B	83.8	gs://gresearch/nest-checkpoints/nest-b_imagenet
NesT-S	83.3	gs://gresearch/nest-checkpoints/nest-s_imagenet
NesT-T	81.5	gs://gresearch/nest-checkpoints/nest-t_imagenet

Note: Accuracy is evaluated on the ImageNet2012 validation set.

TensorBoard.dev

See the ImageNet training logs on TensorBoard.dev.

Colab

A Colab notebook is available for testing: https://colab.sandbox.google.com/github/google-research/nested-transformer/blob/main/colab.ipynb

Instructions for Image Classification

Environment setup

virtualenv -p python3 --system-site-packages nestenv
source nestenv/bin/activate

pip install -r requirements.txt

Evaluate on ImageNet

On first use, download ImageNet from the command line by following the tensorflow_datasets instructions. Optionally, download all pretrained checkpoints:

bash ./checkpoints/download_checkpoints.sh
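The script above fetches every checkpoint. If only one model is needed, a command for a single checkpoint can be assembled from the paths in the results table. The sketch below is an assumption, not part of the repository: it presumes `gsutil` is installed and the bucket is readable from your account, and the helper name `download_command` is hypothetical. It only builds the command so you can inspect it before running it.

```python
import shlex

# Checkpoint locations copied from the results table in this README.
CHECKPOINTS = {
    "nest-b": "gs://gresearch/nest-checkpoints/nest-b_imagenet",
    "nest-s": "gs://gresearch/nest-checkpoints/nest-s_imagenet",
    "nest-t": "gs://gresearch/nest-checkpoints/nest-t_imagenet",
}


def download_command(model, dest_dir="./checkpoints/"):
    """Return a `gsutil cp -r` argument list for one model's checkpoint.

    Hypothetical helper: `model` is matched case-insensitively against the
    table above, and `dest_dir` is assumed to exist already.
    """
    try:
        source = CHECKPOINTS[model.lower()]
    except KeyError:
        raise ValueError(
            f"unknown model {model!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None
    return ["gsutil", "cp", "-r", source, dest_dir]


# Print the command rather than executing it.
print(shlex.join(download_command("NesT-B")))
```

To actually run it, pass the returned list to `subprocess.run(..., check=True)`.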

Run the evaluation script to evaluate NesT-B.

python main.py --config configs/imagenet_nest.py --config.eval_only=True \
  --config.init_checkpoint="./checkpoints/nest-b_imagenet/ckpt.39" \
  --workdir="./checkpoints/nest-b_imagenet_eval"

Train on ImageNet

The default configuration trains NesT-B on a TPUv2 8x8 with a per-device batch size of 16.

python main.py --config configs/imagenet_nest.py --jax_backend_target=<TPU_IP_ADDRESS> --jax_xla_backend="tpu_driver" --workdir="./checkpoints/nest-b_imagenet"

Note: See jax/cloud_tpu_colab for info about TPU_IP_ADDRESS.

Train NesT-T on 8 GPUs.

python main.py --config configs/imagenet_nest_tiny.py --workdir="./checkpoints/nest-t_imagenet_8gpu"

The codebase does not support multi-node GPU training (more than 8 GPUs). The models reported in our paper were trained on TPUs with a total batch size of 1024.
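The batch-size figures in this section are related by simple arithmetic: the total batch size is the number of devices times the per-device batch size. The sketch below works through it, assuming that "8x8" means 64 devices, which is consistent with the stated total of 1024. The function names are hypothetical, and the 8-GPU figure only shows what matching that total would require, not necessarily what configs/imagenet_nest_tiny.py sets.

```python
def global_batch_size(num_devices, per_device_batch):
    """Total examples per training step across all devices."""
    return num_devices * per_device_batch


def per_device_batch_size(global_batch, num_devices):
    """Per-device share of a total batch; it must divide evenly."""
    if global_batch % num_devices:
        raise ValueError("global batch must be divisible by num_devices")
    return global_batch // num_devices


# Assumption: a TPUv2 8x8 slice exposes 8 * 8 = 64 devices.
tpu_total = global_batch_size(8 * 8, 16)

# Hypothetical: per-GPU batch needed to reach the same total on 8 GPUs.
gpu_share = per_device_batch_size(1024, 8)

print(tpu_total, gpu_share)
```

If memory limits force a smaller per-device batch on GPUs, the total batch size shrinks accordingly, and results may differ from those reported for the TPU runs.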

Train on CIFAR

# Training on 2 GPUs is recommended. NesT-T can be trained on 1 GPU.
CUDA_VISIBLE_DEVICES=0,1 python main.py --config configs/cifar_nest.py --workdir="./checkpoints/nest_cifar"

Cite

@inproceedings{zhang2021aggregating,
  title={Aggregating Nested Transformers},
  author={Zizhao Zhang and Han Zhang and Long Zhao and Ting Chen and Tomas Pfister},
  booktitle={arXiv preprint arXiv:2105.12723},
  year={2021}
}

Owner

Google Research
