Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers

Related tags

Deep LearningViP
Overview

Visual Parser (ViP)

This is the official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers.

Visual Parser

Key Features & TLDR

  1. PyTorch Implementation of the ViP network. Check it out at models/vip.py

  2. A fast and neat implementation of the relative positional encoding proposed in HaloNet, BOTNet and AANet.

  3. A transformer-friendly FLOPS & Param counter that supports FLOPS calculation for einsum and matmul operations.

Prerequisite

Please refer to get_started.md.

Results and Models

All models listed below are evaluated with input size 224x224

Model Top1 Acc #params FLOPS Download
ViP-Tiny 79.0 12.8M 1.7G Google Drive
ViP-Small 82.1 32.1M 4.5G Google Drive
ViP-Medium 83.3 49.6M 8.0G Coming Soon
ViP-Base 83.6 87.8M 15.0G Coming Soon

To load the pretrained checkpoint, e.g. ViP-Tiny, simply run:

# first download the checkpoint and name it as vip_t_dict.pth
from models.vip import vip_tiny
model = vip_tiny(pretrained="vip_t_dict.pth")

Evaluation

To evaluate a pre-trained ViP on ImageNet val, run:

python3 main.py <data-root> --model <model-name> -b <batch-size> --eval_checkpoint <path-to-checkpoint>

Training from scratch

To train a ViP on ImageNet from scratch, run:

bash ./distributed_train.sh <job-name> <config-path> <num-gpus>

For example, to train ViP with 8 GPU on a single node, run:

ViP-Tiny:

bash ./distributed_train.sh vip-t-001 configs/vip_t_bs1024.yaml 8

ViP-Small:

bash ./distributed_train.sh vip-s-001 configs/vip_s_bs1024.yaml 8

ViP-Medium:

bash ./distributed_train.sh vip-m-001 configs/vip_m_bs1024.yaml 8

ViP-Base:

bash ./distributed_train.sh vip-b-001 configs/vip_b_bs1024.yaml 8

Profiling the model

To measure the throughput, run:

python3 test_throughput.py <model-name>

For example, if you want to get the test speed of Vip-Tiny on your device, run:

python3 test_throughput.py vip-tiny

To measure the FLOPS and number of parameters, run:

python3 test_flops.py <model-name>

Citing ViP

@article{vip,
  title={Visual Parser: Representing Part-whole Hierarchies with Transformers},
  author={Sun, Shuyang and Yue, Xiaoyu, Bai, Song and Torr, Philip},
  journal={arXiv preprint arXiv:2107.05790},
  year={2021}
}

Contact

If you have any questions, don't hesitate to contact Shuyang (Kevin) Sun. You can easily reach him by sending an email to [email protected].

Owner
Shuyang Sun
DPhil (PhD) student at Oxford
Shuyang Sun
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and ap

3.4k Jan 04, 2023
Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

El Bruno 3 Mar 30, 2022
Code + pre-trained models for the paper Keeping Your Eye on the Ball Trajectory Attention in Video Transformers

Motionformer This is an official pytorch implementation of paper Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers. In this rep

Facebook Research 192 Dec 23, 2022
A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Mixup: Beyond Empirical Risk Minimization in PyTorch This is an unofficial PyTorch implementation of mixup: Beyond Empirical Risk Minimization. The co

Harry Yang 121 Dec 17, 2022
Source code for "Pack Together: Entity and Relation Extraction with Levitated Marker"

PL-Marker Source code for Pack Together: Entity and Relation Extraction with Levitated Marker. Quick links Overview Setup Install Dependencies Data Pr

THUNLP 173 Dec 30, 2022
A set of tools for converting a darknet dataset to COCO format working with YOLOX

darknet格式数据→COCO darknet训练数据目录结构(详情参见dataset/darknet): darknet ├── class.names ├── gen_config.data ├── gen_train.txt ├── gen_valid.txt └── images

RapidAI-NG 148 Jan 03, 2023
Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Vansh Wassan 15 Jun 17, 2021
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ User support: lambeq-su

Cambridge Quantum 315 Jan 01, 2023
Convolutional Neural Network for 3D meshes in PyTorch

MeshCNN in PyTorch SIGGRAPH 2019 [Paper] [Project Page] MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used f

Rana Hanocka 1.4k Jan 04, 2023
Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

CCOP Code of our paper Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning Requirement Install OpenSelfSup Install Detectron2

Chenhongyi Yang 21 Dec 13, 2022
This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

LEAP Lab 2 Sep 15, 2022
TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.

TensorFlow GNN This is an early (alpha) release to get community feedback. It's under active development and we may break API compatibility in the fut

889 Dec 30, 2022
Motion and Shape Capture from Sparse Markers

MoSh++ This repository contains the official chumpy implementation of mocap body solver used for AMASS: AMASS: Archive of Motion Capture as Surface Sh

Nima Ghorbani 135 Dec 23, 2022
Chess reinforcement learning by AlphaGo Zero methods.

About Chess reinforcement learning by AlphaGo Zero methods. This project is based on these main resources: DeepMind's Oct 19th publication: Mastering

Samuel 2k Dec 29, 2022
Neural Point-Based Graphics

Neural Point-Based Graphics Project   Video   Paper Neural Point-Based Graphics Kara-Ali Aliev1 Artem Sevastopolsky1,2 Maria Kolos1,2 Dmitry Ulyanov3

Ali Aliev 252 Dec 13, 2022
PyTorch implementations of Generative Adversarial Networks.

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as

Erik Linder-Norén 13.4k Jan 08, 2023
Source code for ZePHyR: Zero-shot Pose Hypothesis Rating @ ICRA 2021

ZePHyR: Zero-shot Pose Hypothesis Rating ZePHyR is a zero-shot 6D object pose estimation pipeline. The core is a learned scoring function that compare

R-Pad - Robots Perceiving and Doing 18 Aug 22, 2022
The code release of paper Low-Light Image Enhancement with Normalizing Flow

[AAAI 2022] Low-Light Image Enhancement with Normalizing Flow Paper | Project Page Low-Light Image Enhancement with Normalizing Flow Yufei Wang, Renji

Yufei Wang 176 Jan 06, 2023
This is a re-implementation of TransGAN: Two Pure Transformers Can Make One Strong GAN (CVPR 2021) in PyTorch.

TransGAN: Two Transformers Can Make One Strong GAN [YouTube Video] Paper Authors: Yifan Jiang, Shiyu Chang, Zhangyang Wang CVPR 2021 This is re-implem

Ahmet Sarigun 79 Jan 05, 2023
A full-fledged version of Pix2Seq

Stable-Pix2Seq A full-fledged version of Pix2Seq What it is. This is a full-fledged version of Pix2Seq. Compared with unofficial-pix2seq, stable-pix2s

peng gao 205 Dec 27, 2022