DirectVoxGO reconstructs a scene representation from a set of calibrated images capturing the scene.

Last update: Dec 31, 2022

DirectVoxGO

DirectVoxGO (Direct Voxel Grid Optimization, see our paper) reconstructs a scene representation from a set of calibrated images capturing the scene.

NeRF-comparable quality for synthesizing novel views from our scene representation.
Super-fast convergence: Our 15 mins/scene vs. NeRF's 10~20+ hrs/scene.
No cross-scene pre-training required: We optimize each scene from scratch.
Better rendering speed: Our <1 sec vs. NeRF's 29 secs to synthesize an 800x800 image.

The run-times (mm:ss) of our optimization progress shown below are measured on a machine with a single RTX 2080 Ti GPU.

github_teaser.mp4

Update

2021.11.23: Support CO3D dataset.
2021.11.23: Initial release. Issue page is disabled for now. Feel free to contact [email protected] if you have any questions.

Installation

git clone git@github.com:sunset1995/DirectVoxGO.git
cd DirectVoxGO
pip install -r requirements.txt

PyTorch installation is machine dependent; please install the correct version for your machine. The tested version is PyTorch 1.8.1 with Python 3.7.4.

Dependencies

PyTorch, numpy: main computation.
scipy, lpips: SSIM and LPIPS evaluation.
tqdm: progress bar.
mmcv: config system.
opencv-python: image processing.
imageio, imageio-ffmpeg: images and videos I/O.
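
As a quick sanity check after installation, the sketch below reports which of the listed dependencies cannot be found in the current environment. The helper name `find_missing` is hypothetical, and the import names are assumptions based on how these packages are usually imported (for example, opencv-python as `cv2`).

```python
import importlib.util

# Import names for the dependencies listed above. Assumption: opencv-python
# is imported as cv2 and imageio-ffmpeg as imageio_ffmpeg.
MODULES = ["torch", "numpy", "scipy", "lpips", "tqdm", "mmcv",
           "cv2", "imageio", "imageio_ffmpeg"]


def find_missing(modules=MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be located; it does not verify versions or GPU support.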

Download: datasets, trained models, and rendered test views

Directory structure for the datasets (only the files used are listed)

data
├── nerf_synthetic     # Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1
│   └── [chair|drums|ficus|hotdog|lego|materials|mic|ship]
│       ├── [train|val|test]
│       │   └── r_*.png
│       └── transforms_[train|val|test].json
│
├── Synthetic_NSVF     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/Synthetic_NSVF.zip
│   └── [Bike|Lifestyle|Palace|Robot|Spaceship|Steamtrain|Toad|Wineholder]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0_train|1_val|2_test]_*.png
│       └── pose
│           └── [0_train|1_val|2_test]_*.txt
│
├── BlendedMVS         # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/BlendedMVS.zip
│   └── [Character|Fountain|Jade|Statues]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── TanksAndTemple     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/TanksAndTemple.zip
│   └── [Barn|Caterpillar|Family|Ignatius|Truck]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── deepvoxels     # Link: https://drive.google.com/drive/folders/1ScsRlnzy9Bd_n-xw83SP-0t548v63mPH
│   └── [train|validation|test]
│       └── [armchair|cube|greek|vase]
│           ├── intrinsics.txt
│           ├── rgb/*.png
│           └── pose/*.txt
│
└── co3d               # Link: https://github.com/facebookresearch/co3d
    └── [donut|teddybear|umbrella|...]
        ├── frame_annotations.jgz
        ├── set_lists.json
        └── [129_14950_29917|189_20376_35616|...]
            ├── images
            │   └── frame*.jpg
            └── masks
                └── frame*.png
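
Before training, it may help to confirm that an extracted scene matches the layout above. The following is a minimal sketch for one Synthetic-NeRF scene only; `missing_nerf_synthetic_items` is a hypothetical helper that is not part of the repository.

```python
from pathlib import Path


def missing_nerf_synthetic_items(scene_dir):
    """List expected files that are absent from one Synthetic-NeRF scene folder."""
    scene = Path(scene_dir)
    missing = []
    for split in ("train", "val", "test"):
        # Each split needs its camera file, e.g. transforms_train.json.
        transforms = scene / f"transforms_{split}.json"
        if not transforms.is_file():
            missing.append(transforms.name)
        # Each split folder should hold at least one r_*.png image.
        split_dir = scene / split
        if not (split_dir.is_dir() and any(split_dir.glob("r_*.png"))):
            missing.append(f"{split}/r_*.png")
    return missing


if __name__ == "__main__":
    print(missing_nerf_synthetic_items("data/nerf_synthetic/lego"))
```

An empty list means every file named in the tree above was found for that scene.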

Synthetic-NeRF, Synthetic-NSVF, BlendedMVS, Tanks&Temples, DeepVoxels datasets

We use the datasets organized by NeRF, NSVF, and DeepVoxels. Download links:

Synthetic-NeRF dataset (manually extract the nerf_synthetic.zip to data/)
Synthetic-NSVF dataset (manually extract the Synthetic_NSVF.zip to data/)
BlendedMVS dataset (manually extract the BlendedMVS.zip to data/)
Tanks&Temples dataset (manually extract the TanksAndTemple.zip to data/)
DeepVoxels dataset (manually extract the synthetic_scenes.zip to data/deepvoxels/)

Download all our trained models and rendered test views at this link to our logs.

CO3D dataset

We also support the recent Common Objects In 3D dataset. Our method performs only per-scene reconstruction, with no cross-scene generalization.

GO

Train

To train the lego scene and evaluate the test-set PSNR at the end of training, run:

$ python run.py --config configs/nerf/lego.py --render_test

Use --i_print and --i_weights to change the log interval.
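
To train several scenes in sequence, the same command can be generated per scene. The sketch below only builds and prints the command lines; the scene names are the Synthetic-NeRF scenes, and `train_command` is a hypothetical helper, not part of the repository.

```python
# Synthetic-NeRF scene names, matching the config files under configs/nerf/.
NERF_SCENES = ["chair", "drums", "ficus", "hotdog",
               "lego", "materials", "mic", "ship"]


def train_command(scene):
    """Build the training command for one scene as an argument list."""
    return ["python", "run.py",
            "--config", f"configs/nerf/{scene}.py",
            "--render_test"]


if __name__ == "__main__":
    for scene in NERF_SCENES:
        # Printing only; nothing is executed here.
        print(" ".join(train_command(scene)))
```

To actually launch the runs, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.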

Evaluation

To only evaluate the testset PSNR, SSIM, and LPIPS of the trained lego without re-training, run:

$ python run.py --config configs/nerf/lego.py --render_only --render_test \
                                              --eval_ssim --eval_lpips_vgg

Use --eval_lpips_alex to evaluate LPIPS with a pre-trained AlexNet instead of VGG.
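
For reference, PSNR is conventionally computed from the mean squared error as -10 * log10(MSE) for images scaled to [0, 1]. The sketch below is a standalone illustration of that definition; `psnr` is a hypothetical helper, and the repository's own evaluation code may differ in details.

```python
import numpy as np


def psnr(rendered, target):
    """PSNR in dB between two images whose values lie in [0, 1]."""
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((rendered - target) ** 2)
    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return float("inf")
    return float(-10.0 * np.log10(mse))
```

Higher is better: a uniform per-pixel error of 0.1 gives an MSE of 0.01, which corresponds to 20 dB.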

Reproduction

All config files to reproduce our results:

$ ls configs/*
configs/blendedmvs:
Character.py  Fountain.py  Jade.py  Statues.py

configs/nerf:
chair.py  drums.py  ficus.py  hotdog.py  lego.py  materials.py  mic.py  ship.py

configs/nsvf:
Bike.py  Lifestyle.py  Palace.py  Robot.py  Spaceship.py  Steamtrain.py  Toad.py  Wineholder.py

configs/tankstemple:
Barn.py  Caterpillar.py  Family.py  Ignatius.py  Truck.py

configs/deepvoxels:
armchair.py  cube.py  greek.py  vase.py

Your own config files

Check the comments in configs/default.py for the configurable settings. The default values reproduce the main setup reported in our paper. We use mmcv's config system. To create a new config, first inherit configs/default.py and then update the fields you want to change. Below is an example from configs/blendedmvs/Character.py:

_base_ = '../default.py'

expname = 'dvgo_Character'
basedir = './logs/blended_mvs'

data = dict(
    datadir='./data/BlendedMVS/Character/',
    dataset_type='blendedmvs',
    inverse_y=True,
    white_bkgd=True,
)

Development and tuning guide

Extension to a new dataset

We recommend first adjusting the data-related config fields to fit your camera coordinate system before implementing a new data loader. We provide two visualization tools for debugging.

Inspect the camera and the allocated BBox.

- Export via --export_bbox_and_cams_only {filename}.npz:
```
python run.py --config configs/nerf/mic.py --export_bbox_and_cams_only cam_mic.npz
```
- Visualize the result:
```
python tools/vis_train.py cam_mic.npz
```

Inspect the learned geometry after coarse optimization.
- Export via --export_coarse_only {filename}.npz (assumed coarse_last.tar available in the train log):
```
python run.py --config configs/nerf/mic.py --export_coarse_only coarse_mic.npz
```
- Visualize the result:
```
python tools/vis_volume.py coarse_mic.npz 0.001 --cam cam_mic.npz
```
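
The exported .npz files can also be inspected directly with NumPy. The sketch below lists whatever arrays an archive contains without assuming any particular key names; `summarize_npz` is a hypothetical helper, not one of the repository's tools.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype string)."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    # cam_mic.npz is the file exported in the first step above.
    for name, (shape, dtype) in summarize_npz("cam_mic.npz").items():
        print(name, shape, dtype)
```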

[Figures: inspecting the cameras & BBox; inspecting the learned coarse volume]

Speed and quality tradeoff

We report some ablation experiments in the supplementary material of our paper. Setting N_iters, N_rand, num_voxels, rgbnet_depth, and rgbnet_width to larger values, or setting stepsize to a smaller value, typically leads to better quality but needs more computation. Only stepsize is tunable in the testing phase; all the other fields should remain the same as in training.

Acknowledgement

The code base originated from an awesome nerf-pytorch implementation, but it has since become very different from that code base.


Owner

sunset
