Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers

Last update: Dec 11, 2022

Overview

Visual Parser (ViP)

This is the official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers.

Key Features & TLDR

PyTorch implementation of the ViP network; see models/vip.py.
A fast and neat implementation of the relative positional encoding proposed in HaloNet, BoTNet and AANet.
A transformer-friendly FLOPs and parameter counter that supports FLOPs calculation for einsum and matmul operations.
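
To illustrate why einsum and matmul need explicit handling in a FLOPs counter, here is a minimal sketch of how their cost can be estimated from tensor shapes alone. It is not the counter shipped in this repository; the function names `matmul_macs` and `einsum_macs` are hypothetical, and the sketch counts multiply-accumulates (MACs), which many counters report as "FLOPs" (multiply by 2 to count the multiply and the add separately). It assumes explicit index letters (no `...`) and a naive single loop nest over all indices.

```python
from math import prod


def matmul_macs(a_shape, b_shape):
    """MACs for a batched matmul a @ b.

    a_shape = (..., m, k) and b_shape = (..., k, n).
    Assumption: batch dimensions are identical (no broadcasting).
    """
    *batch_a, m, k = a_shape
    *batch_b, k_b, n = b_shape
    if k != k_b or batch_a != batch_b:
        raise ValueError("incompatible shapes for matmul")
    # Each of the batch * m * n outputs needs k multiply-accumulates.
    return prod(batch_a) * m * k * n


def einsum_macs(equation, *shapes):
    """MACs for an einsum evaluated as one naive loop nest.

    The cost is the product of the sizes of all distinct index letters,
    whether they are kept in the output or summed away.
    """
    inputs = equation.replace(" ", "").split("->")[0].split(",")
    if len(inputs) != len(shapes):
        raise ValueError("number of operands does not match the equation")
    sizes = {}
    for term, shape in zip(inputs, shapes):
        if len(term) != len(shape):
            raise ValueError("operand rank does not match its index string")
        for index, dim in zip(term, shape):
            if sizes.setdefault(index, dim) != dim:
                raise ValueError(f"inconsistent size for index {index!r}")
    return prod(sizes.values())


# Attention scores q @ k^T for 4 heads, 196 tokens, head dim 32 (illustrative sizes).
q_shape = (1, 4, 196, 32)
k_shape = (1, 4, 196, 32)
scores_macs = einsum_macs("bhnd,bhmd->bhnm", q_shape, k_shape)
```

The same attention product written as a matmul, `matmul_macs((1, 4, 196, 32), (1, 4, 32, 196))`, gives the same count, which is a quick sanity check that the two rules agree.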

Prerequisite

Please refer to get_started.md.

Results and Models

All models listed below are evaluated with an input size of 224x224.

Model	Top-1 Acc (%)	#Params	FLOPs	Download
ViP-Tiny	79.0	12.8M	1.7G	Google Drive
ViP-Small	82.1	32.1M	4.5G	Google Drive
ViP-Medium	83.3	49.6M	8.0G	Coming Soon
ViP-Base	83.6	87.8M	15.0G	Coming Soon

To load the pretrained checkpoint, e.g. ViP-Tiny, simply run:

# first download the checkpoint and name it as vip_t_dict.pth
from models.vip import vip_tiny
model = vip_tiny(pretrained="vip_t_dict.pth")

Evaluation

To evaluate a pre-trained ViP on ImageNet val, run:

python3 main.py <data-root> --model <model-name> -b <batch-size> --eval_checkpoint <path-to-checkpoint>
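
For readers unfamiliar with the metric in the results table, the following is a minimal sketch of how top-1 accuracy is computed from model outputs. It is an illustration in plain Python, not the evaluation loop in main.py; the function name `top1_accuracy` and the toy scores are hypothetical.

```python
def top1_accuracy(logits, labels):
    """Percentage of samples whose highest-scoring class equals the label.

    logits: list of per-class score lists, one per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # argmax over classes; ties resolve to the lowest class index
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: 4 samples, 2 classes; the third sample is misclassified.
toy_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 0]
toy_accuracy = top1_accuracy(toy_logits, toy_labels)
```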

Training from scratch

To train a ViP on ImageNet from scratch, run:

bash ./distributed_train.sh <job-name> <config-path> <num-gpus>

For example, to train ViP with 8 GPUs on a single node, run:

ViP-Tiny:

bash ./distributed_train.sh vip-t-001 configs/vip_t_bs1024.yaml 8

ViP-Small:

bash ./distributed_train.sh vip-s-001 configs/vip_s_bs1024.yaml 8

ViP-Medium:

bash ./distributed_train.sh vip-m-001 configs/vip_m_bs1024.yaml 8

ViP-Base:

bash ./distributed_train.sh vip-b-001 configs/vip_b_bs1024.yaml 8

Profiling the model

To measure the throughput, run:

python3 test_throughput.py <model-name>

For example, to get the test speed of ViP-Tiny on your device, run:

python3 test_throughput.py vip-tiny
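
As a rough picture of what a throughput measurement involves, here is a minimal sketch of a timing loop: warm up, time a fixed number of forward passes, and divide the number of processed images by the elapsed time. It is not the code in test_throughput.py; `measure_throughput` and its parameters are hypothetical, and `step` stands for any callable that runs one forward pass on a batch.

```python
import time


def measure_throughput(step, batch_size, warmup=5, iters=20):
    """Return images per second for a callable that processes one batch.

    Assumption: `step` blocks until the batch is finished. On a GPU with
    PyTorch, the callable would need to call torch.cuda.synchronize()
    before returning, otherwise only the kernel launches are timed.
    """
    # Warm-up iterations are excluded from timing (caches, lazy init, autotuning).
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    # Guard against a zero reading for extremely fast callables.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return batch_size * iters / elapsed


# Stand-in workload so the sketch runs without a model or a GPU.
calls = []
images_per_second = measure_throughput(
    lambda: calls.append(sum(range(1000))), batch_size=8, warmup=2, iters=3
)
```

Reported numbers depend heavily on batch size, precision, and hardware, so comparisons are only meaningful under the same settings.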

To measure the FLOPS and number of parameters, run:

python3 test_flops.py <model-name>

Citing ViP

@article{vip,
  title={Visual Parser: Representing Part-whole Hierarchies with Transformers},
  author={Sun, Shuyang and Yue, Xiaoyu and Bai, Song and Torr, Philip},
  journal={arXiv preprint arXiv:2107.05790},
  year={2021}
}

Contact

If you have any questions, don't hesitate to contact Shuyang (Kevin) Sun. You can easily reach him by sending an email to [email protected].

Owner

Shuyang Sun
