ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Last update: Dec 19, 2022

Overview

This is the project page for the paper:

ISTR: End-to-End Instance Segmentation via Transformers,
Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wang, Ke Li, Feiyue Huang, Ling Shao, Rongrong Ji,
arXiv 2105.00637

⭐ Highlights:

GPU Friendly: Four 1080Ti/2080Ti GPUs are enough to train ISTR with R50 and R101 backbones.
High Performance: On COCO test-dev, ISTR-R50-3x gets 46.8/38.6 box/mask AP, and ISTR-R101-3x gets 48.1/39.9 box/mask AP.

Updates

(2021.05.03) The project page for ISTR is available.

Models

Method	inference speed	box AP	mask AP	download
ISTR-R50-3x	17.8 FPS	46.8	38.6	model | log
ISTR-R101-3x	13.9 FPS	48.1	39.9	model | log

Inference speed is measured on a single 2080Ti GPU.
We use backbones pre-trained on ImageNet with torchvision. The ImageNet pre-trained ResNet-101 backbone is taken from SparseR-CNN.
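
The Models table reports speed as frames per second. As a rough aid for comparing against a latency budget, the sketch below converts those figures to milliseconds per image. It assumes images are processed one at a time, so latency is simply the reciprocal of throughput; the helper name `fps_to_latency_ms` is illustrative and not part of the repository.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to per-image latency in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS values copied from the Models table (single 2080Ti GPU).
reported_fps = {"ISTR-R50-3x": 17.8, "ISTR-R101-3x": 13.9}

# Assumption: batch size 1, so latency = 1 / throughput.
latency_ms = {name: fps_to_latency_ms(fps) for name, fps in reported_fps.items()}

for name, ms in latency_ms.items():
    print(f"{name}: {ms:.1f} ms per image")
```

Under that assumption the two models come out at roughly 56 ms and 72 ms per image.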

Installation

The code is built on top of Detectron2, SparseR-CNN, and AdelaiDet.

Requirements

Python=3.8
PyTorch=1.6.0, torchvision=0.7.0, cudatoolkit=10.1
OpenCV for visualization
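
Because the requirements pin exact versions, it can help to check an environment before building. The following is a minimal sketch, not part of the repository: it compares installed package versions against the pins listed above. The helper names are hypothetical, and it assumes PyTorch is installed under the distribution name `torch`.

```python
from importlib import metadata
from typing import Dict, Optional

# Pins copied from the Requirements list.
REQUIRED = {"torch": "1.6.0", "torchvision": "0.7.0"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu101' so '1.6.0+cu101' matches '1.6.0'."""
    return version.split("+", 1)[0]


def find_mismatches(installed: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return a message for each required package that is missing or off its pin."""
    problems = {}
    for name, wanted in REQUIRED.items():
        have = installed.get(name)
        if have is None:
            problems[name] = f"not installed (want {wanted})"
        elif base_version(have) != wanted:
            problems[name] = f"found {have} (want {wanted})"
    return problems


def installed_versions() -> Dict[str, Optional[str]]:
    """Look up the installed version of each required package, or None if absent."""
    found = {}
    for name in REQUIRED:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, message in find_mismatches(installed_versions()).items():
        print(f"{name}: {message}")
```

An empty result means both pins are satisfied. The check does not cover the Python or CUDA toolkit versions.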

Steps

Install the repository (we recommend using Anaconda for installation):

conda create -n ISTR python=3.8 -y
conda activate ISTR
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install opencv-python
pip install scipy
pip install shapely
git clone https://github.com/hujiecpp/ISTR.git
cd ISTR
python setup.py build develop

Link the COCO dataset path

ln -s /coco_dataset_path/coco ./datasets
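
After linking, a quick sanity check of the directory layout can catch a wrong path before training starts. The sketch below assumes the ISTR configs rely on Detectron2's built-in COCO 2017 instance datasets, which expect the annotation files and image folders listed in `EXPECTED`; the function name is hypothetical, and the layout needed for test-dev evaluation is not covered.

```python
from pathlib import Path
from typing import List

# Assumed layout: what Detectron2's built-in COCO 2017 instance datasets
# look for under the datasets root.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
]


def missing_coco_entries(datasets_root: str = "./datasets") -> List[str]:
    """Return the expected relative paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    # Path.exists() follows symlinks, so a linked coco directory is checked too.
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("COCO layout looks complete.")
```

Run it from the repository root so that `./datasets` resolves to the linked directory.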

Train ISTR (e.g., with ResNet50 backbone)

python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml

Evaluate ISTR (e.g., with ResNet50 backbone)

python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --eval-only MODEL.WEIGHTS ./output/model_final.pth
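
The training and evaluation commands differ only in the `--eval-only` flag and the trailing `MODEL.WEIGHTS` override. As an illustration, a small wrapper could assemble either form from the same arguments. This is a sketch under assumptions: `build_command` is a hypothetical helper, and it presumes any further overrides are accepted as trailing KEY VALUE pairs in the same way `MODEL.WEIGHTS` is.

```python
import shlex
from typing import Dict, List, Optional

TRAIN_SCRIPT = "projects/ISTR/train_net.py"


def build_command(
    config_file: str,
    num_gpus: int = 4,
    eval_weights: Optional[str] = None,
    overrides: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the train_net.py command line as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    cmd = ["python", TRAIN_SCRIPT, "--num-gpus", str(num_gpus),
           "--config-file", config_file]
    merged = dict(overrides or {})
    if eval_weights is not None:
        # Evaluation: add the flag and point MODEL.WEIGHTS at the checkpoint.
        cmd.append("--eval-only")
        merged["MODEL.WEIGHTS"] = eval_weights
    # Config overrides go last as alternating KEY VALUE tokens.
    for key, value in merged.items():
        cmd += [key, str(value)]
    return cmd


config = "projects/ISTR/configs/ISTR-R50-3x.yaml"
train_cmd = build_command(config)
eval_cmd = build_command(config, eval_weights="./output/model_final.pth")

print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
# To launch from the repository root: subprocess.run(train_cmd, check=True)
```

With the defaults, the two printed lines reproduce the training and evaluation commands given in the steps.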

Visualize the detection and segmentation results (e.g., with ResNet50 backbone)

python demo/demo.py --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --input input1.jpg --output ./output --confidence-threshold 0.4 --opts MODEL.WEIGHTS ./output/model_final.pth
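
To visualize more than one image, the same command can be generated once per file. The sketch below is an assumption-laden convenience, not a feature of the repository: `demo_commands` is a hypothetical helper that only reuses the flags shown in the command above and builds one invocation per image in a folder.

```python
import shlex
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def demo_commands(
    image_dir: str,
    config_file: str,
    weights: str,
    output_dir: str = "./output",
    threshold: float = 0.4,
) -> List[List[str]]:
    """Build one demo.py argument list per image file found in image_dir."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [
        ["python", "demo/demo.py",
         "--config-file", config_file,
         "--input", str(image),
         "--output", output_dir,
         "--confidence-threshold", str(threshold),
         "--opts", "MODEL.WEIGHTS", weights]
        for image in images
    ]


if __name__ == "__main__":
    commands = demo_commands(
        ".",  # folder to scan; replace with your image directory
        "projects/ISTR/configs/ISTR-R50-3x.yaml",
        "./output/model_final.pth",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Each invocation loads the model again, so this trades speed for simplicity; it is reasonable for a handful of images but slow for large folders.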

Citation

If our paper helps your research, please cite it in your publications:

@article{hu2021ISTR,
  title={ISTR: End-to-End Instance Segmentation via Transformers},
  author={Hu, Jie and Cao, Liujuan and Lu, Yao and Zhang, ShengChuan and Wang, Yan and Li, Ke and Huang, Feiyue and Shao, Ling and Ji, Rongrong},
  journal={arXiv preprint arXiv:2105.00637},
  year={2021}
}

Owner

Jie Hu