Rank 1st in the public leaderboard of ScanRefer (2021-03-18)

Last update: Dec 07, 2022

Related tags

Overview

InstanceRefer

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

This repository is for the 1st method on ScanRefer benchmark [arxiv paper].

Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Zhen Li*, Shuguang Cui

If you find our work useful in your research, please consider citing:

@InProceedings{yuan2021instancerefer,
  title={InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring},
  author={Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Zhen Li, Shuguang Cui},
  journal={arXiv preprint},
  year={2021}
}

News

2021-03-31 We release InstanceRefer v1 🚀 !
2021-03-18 We achieve 1st place in ScanRefer leaderboard 🔥 .

Getting Started

Setup

The code is tested on Ubuntu 16.04 LTS & 18.04 LTS with PyTorch 1.3.0 CUDA 10.1 installed.

conda install pytorch==1.3.0 cudatoolkit=10.1 -c pytorch

Install the necessary packages listed out in requirements.txt:

pip install -r requirements.txt

After all packages are properly installed, please run the following commands to compile the torchsaprse:

cd lib/torchsparse/
python setup.py install

Before moving on to the next step, please don't forget to set the project root path to the CONF.PATH.BASE in lib/config.py.

Data preparation

Download the ScanRefer dataset and unzip it under data/.
Downloadand the preprocessed GLoVE embeddings (~990MB) and put them under data/.
Download the ScanNetV2 dataset and put (or link) scans/ under (or to) data/scannet/scans/ (Please follow the ScanNet Instructions for downloading the ScanNet dataset). After this step, there should be folders containing the ScanNet scene data under the data/scannet/scans/ with names like scene0000_00
Used official and pre-trained PointGroup generate panoptic segmentation in PointGroupInst/. We provide pre-processed data in Baidu Netdisk [password: 0nxc].
Pre-processed instance labels, and new data should be generated in data/scannet/pointgroup_data/

cd data/scannet/
python prepare_data.py --split train --pointgroupinst_path [YOUR_PATH]
python prepare_data.py --split val   --pointgroupinst_path [YOUR_PATH]
python prepare_data.py --split test  --pointgroupinst_path [YOUR_PATH]

Finally, the dataset folder should be organized as follows.

InstanceRefer
├── data
│   ├── scannet
│   │  ├── meta_data
│   │  ├── pointgroup_data
│   │  │  ├── scene0000_00_aligned_bbox.npy
│   │  │  ├── scene0000_00_aligned_vert.npy
│   │  ├──├──  ... ...

Training

Train the InstanceRefer model. You can change hyper-parameters in config/InstanceRefer.yaml:

python scripts/train.py --log_dir instancerefer

TODO

Updating to the best version.
Release codes for prediction on benchmark.
Release pre-trained model.
Merge PointGroup in an end-to-end manner.

Acknowledgement

This project is not possible without multiple great opensourced codebases.

License

This repository is released under MIT License (see LICENSE file for details).

Rank 1st in the public leaderboard of ScanRefer (2021-03-18)

Related tags

Overview

InstanceRefer

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

News

Getting Started

Setup

Data preparation

Training

TODO

Acknowledgement

License

Owner

VQMIVC - Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

Template repository for managing machine learning research projects built with PyTorch-Lightning

BisQue is a web-based platform designed to provide researchers with organizational and quantitative analysis tools for 5D image data. Users can extend BisQue by implementing containerized ML workflows.

Official PyTorch implementation of "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation" (ICCV 21).

Bagua is a flexible and performant distributed training algorithm development framework.

Code for Domain Adaptive Video Segmentation via Temporal Consistency Regularization in ICCV 2021

When in Doubt: Improving Classification Performance with Alternating Normalization

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Power Core Simulator!

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

This is a project based on ConvNets used to identify whether a road is clean or dirty. We have used MobileNet as our base architecture and the weights are based on imagenet.

The source code of the paper "SHGNN: Structure-Aware Heterogeneous Graph Neural Network"

The versatile ocean simulator, in pure Python, powered by JAX.

AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

Pytorch Implementation of "Desigining Network Design Spaces", Radosavovic et al. CVPR 2020.

Semantic Segmentation of images using PixelLib with help of Pascalvoc dataset trained with Deeplabv3+ framework.

DL course co-developed by YSDA, HSE and Skoltech

This is a Deep Leaning API for classifying emotions from human face and human audios.