Towards Part-Based Understanding of RGB-D Scans

Last update: Nov 23, 2022

Overview

Towards Part-Based Understanding of RGB-D Scans (CVPR 2021)

We propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects, and for each object predict its decomposition into geometric part masks, which composed together form the complete geometry of the observed object.
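To make "part masks composed together form the complete geometry" concrete, here is a toy sketch. It assumes each part mask is stored as a set of occupied voxel indices; this representation, the `compose_parts` helper, and the chair example are illustrative assumptions and are not taken from the repository.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of an occupied voxel


def compose_parts(part_masks: Dict[str, Set[Voxel]]) -> Set[Voxel]:
    """Union all part masks into a single object occupancy set."""
    occupied: Set[Voxel] = set()
    for voxels in part_masks.values():
        occupied |= voxels
    return occupied


# Hypothetical chair made of three parts on a tiny voxel grid.
chair_parts = {
    "seat": {(x, 1, z) for x in range(3) for z in range(3)},
    "back": {(x, y, 0) for x in range(3) for y in range(2, 4)},
    "leg": {(0, 0, 0), (2, 0, 0), (0, 0, 2), (2, 0, 2)},
}

# The complete object geometry is the union of its part masks.
chair = compose_parts(chair_parts)
print(f"{len(chair_parts)} parts cover {len(chair)} voxels")
```

Every part mask is a subset of the composed object, which is the property the task description relies on.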

Download Paper (.pdf)

Demo samples

Get started

The core of this repository is a network that takes preprocessed scan voxel crops as input and produces voxelized part trees. However, data preparation is a substantial step that must be completed before training or inference can be launched. For this reason, we release prepared data for training and a checkpoint for inference. If you want to launch training with our data, please follow the steps below:

1. Clone the repo: git clone https://github.com/alexeybokhovkin/part-based-scan-understanding.git
2. Download the data and/or checkpoint:
   - ScanNet MLCVNet crops (finetune) [894M]
   - ScanNet clean crops (pretraining) [995M]
   - PartNet GT trees [103M]
   - Parts priors [169M]
   - Checkpoint [19M]
3. For training, prepare an augmented version of the ScanNet crops with the script dataproc/prepare_rot_aug_data.py. After this, create a folder with all necessary dataset metadata using the script dataproc/gather_all_shapes.py.
4. Create a config file similar to configs/config_gnn_scannet_allshapes.yaml (you need to provide paths to some directories and files).
5. Launch training with train_gnn_scannet.py.
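Since the config file has to point at several directories and files, it can help to check that they all exist before launching training. The sketch below is only an illustration: the `find_missing_paths` helper and the key names and paths in `example_config` are hypothetical, and the real keys should be taken from configs/config_gnn_scannet_allshapes.yaml. Loading the YAML file into a dictionary is left out.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_paths(config: Dict[str, str]) -> List[str]:
    """Return the config keys whose values do not exist on disk."""
    return [key for key, value in config.items() if not Path(value).exists()]


# Hypothetical key names and locations, for illustration only.
example_config = {
    "scannet_crops_dir": "/data/scannet_crops",
    "partnet_trees_dir": "/data/partnet_gt_trees",
    "priors_dir": "/data/parts_priors",
    "checkpoint_path": "/data/checkpoint.ckpt",
}

missing = find_missing_paths(example_config)
if missing:
    print("Fix these config entries before training:", ", ".join(missing))
else:
    print("All configured paths exist.")
```

Running a check like this first avoids discovering a wrong path only after the training script has started loading data.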

Citation

If you use this framework, please cite:

@article{Bokhovkin2020TowardsPU,
  title={Towards Part-Based Understanding of RGB-D Scans},
  author={Alexey Bokhovkin and V. Ishimtsev and Emil Bogomolov and D. Zorin and A. Artemov and Evgeny Burnaev and Angela Dai},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.02094}
}

Comments

scannet_shape_ids files and part segmentation
First of all, thanks for the great work! I have two questions about this repo and your paper:

It seems that txt files for scannet_shape_ids are required for prepare_rot_aug_data.py. But I cannot find them in the provided dataset files.

Could you explain more details about part segmentation on 3D scans? I'm confused if the part segmentation labels for 3d scans are generated by 1) aligning PartNet data, 2) assigning part labels to overlapped regions. Do you provide point-wise (or voxel-wise) part segmentation annotation?
opened by jeonghyunkeem

Releases

v0.1 (Jun 18, 2021)