Dynamic View Synthesis from Dynamic Monocular Video

Project Website | Video | Paper

Chen Gao, Ayush Saraf, Johannes Kopf, Jia-Bin Huang
in ICCV 2021

Setup

The code has been tested with:

  • Linux (tested on CentOS Linux release 7.4.1708)
  • Anaconda 3
  • Python 3.7.11
  • CUDA 10.1
  • 1 V100 GPU

To get started, please create the conda environment dnerf by running:

conda create --name dnerf python=3.7
conda activate dnerf
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv -c pytorch
pip install imageio configargparse timm lpips
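
Before moving on, you can sanity-check the installation. This quick check is not part of the original instructions and only assumes the versions listed above; on a correctly configured machine it should print 1.6.0, 0.7.0, and True:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"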

Next, install COLMAP manually. Then download the MiDaS and RAFT weights:

ROOT_PATH=/path/to/the/DynamicNeRF/folder
cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/weights.zip
unzip weights.zip
rm weights.zip
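
The archive should contain the MiDaS and RAFT checkpoints used by the preprocessing scripts (the filenames below are taken from the commands in "Train a model on your sequence"):

ls weights
# expected checkpoints (possibly among other files):
#   midas_v21-f6b98070.pt
#   raft-things.pth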

Dynamic Scene Dataset

The Dynamic Scene Dataset is used to quantitatively evaluate our method. Please download the pre-processed data by running:

cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip
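
As a quick check, the extracted data folder should contain one sub-folder per sequence. The listing below is an assumption based on the scene names evaluated later in this README:

ls data
# expected (assumed) scene folders:
#   Balloon1  Balloon2  Jumping  Playground  Skating  Truck  Umbrella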

Training

You can train a model from scratch by running:

cd $ROOT_PATH/
python run_nerf.py --config configs/config_Balloon2.txt

Every 100k iterations, the training script saves several result videos:

The novel view-time synthesis results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/novelviewtime.

The reconstruction results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset.

The fix-view-change-time results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_view000.

The fix-time-change-view results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_time000.
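
Training can also be monitored with TensorBoard (installed during setup; the i_img parameter described below controls how often images are logged). A minimal sketch, assuming logs are written to the default basedir:

cd $ROOT_PATH
tensorboard --logdir logs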

Rendering from pre-trained models

We also provide pre-trained models. You can download them by running:

cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/logs.zip
unzip logs.zip
rm logs.zip

Then you can render the results directly by running:

python run_nerf.py --config configs/config_Balloon2.txt --render_only --ft_path $ROOT_PATH/logs/Balloon2_H270_DyNeRF_pretrain/300000.tar
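
To render the other scenes, the same command can be repeated with the matching config and checkpoint. The loop below is a hypothetical sketch that assumes every scene follows the Balloon2 naming pattern (configs/config_<scene>.txt and logs/<scene>_H270_DyNeRF_pretrain/300000.tar); verify the actual file names in configs/ and logs/ first:

for SCENE in Balloon1 Balloon2 Jumping Playground Skating Truck Umbrella; do
    python run_nerf.py --config configs/config_$SCENE.txt --render_only \
        --ft_path $ROOT_PATH/logs/${SCENE}_H270_DyNeRF_pretrain/300000.tar
done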

Evaluating our method and others

Our goal is to make the evaluation as simple as possible for you. We have collected the fix-view-change-time results of the following methods:

  • NeRF
  • NeRF + t
  • Yoon et al.
  • Non-Rigid NeRF
  • NSFF
  • DynamicNeRF (ours)

Please download the results by running:

cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/results.zip
unzip results.zip
rm results.zip

Then you can calculate the PSNR/SSIM/LPIPS by running:

cd $ROOT_PATH/utils
python evaluation.py
PSNR↑ / LPIPS↓ | Jumping       | Skating       | Truck         | Umbrella      | Balloon1      | Balloon2      | Playground    | Average
NeRF           | 20.99 / 0.305 | 23.67 / 0.311 | 22.73 / 0.229 | 21.29 / 0.440 | 19.82 / 0.205 | 24.37 / 0.098 | 21.07 / 0.165 | 21.99 / 0.250
NeRF + t       | 18.04 / 0.455 | 20.32 / 0.512 | 18.33 / 0.382 | 17.69 / 0.728 | 18.54 / 0.275 | 20.69 / 0.216 | 14.68 / 0.421 | 18.33 / 0.427
NR NeRF        | 20.09 / 0.287 | 23.95 / 0.227 | 19.33 / 0.446 | 19.63 / 0.421 | 17.39 / 0.348 | 22.41 / 0.213 | 15.06 / 0.317 | 19.69 / 0.323
NSFF           | 24.65 / 0.151 | 29.29 / 0.129 | 25.96 / 0.167 | 22.97 / 0.295 | 21.96 / 0.215 | 24.27 / 0.222 | 21.22 / 0.212 | 24.33 / 0.199
Ours           | 24.68 / 0.090 | 32.66 / 0.035 | 28.56 / 0.082 | 23.26 / 0.137 | 22.36 / 0.104 | 27.06 / 0.049 | 24.15 / 0.080 | 26.10 / 0.082

Please note:

  1. The numbers reported in the paper were calculated with the original TensorFlow code. The numbers here are calculated with this improved PyTorch version.
  2. In Yoon et al.'s results, the first and last frames are missing. To compare against them, we omit the first and last frames: uncomment line 72 and comment line 73 in evaluation.py.
  3. We obtain the results of NSFF and NR NeRF using the official implementation with default parameters.

Train a model on your sequence

  1. Set some paths.
ROOT_PATH=/path/to/the/DynamicNeRF/folder
DATASET_NAME=name_of_the_video_without_extension
DATASET_PATH=$ROOT_PATH/data/$DATASET_NAME
  2. Prepare training images and background masks from a video.
cd $ROOT_PATH/utils
python generate_data.py --videopath /path/to/the/video
  3. Use COLMAP to obtain camera poses.
colmap feature_extractor \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --ImageReader.mask_path $DATASET_PATH/background_mask \
    --ImageReader.single_camera 1

colmap exhaustive_matcher \
    --database_path $DATASET_PATH/database.db

mkdir $DATASET_PATH/sparse
colmap mapper \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --output_path $DATASET_PATH/sparse \
    --Mapper.num_threads 16 \
    --Mapper.init_min_tri_angle 4 \
    --Mapper.multiple_models 0 \
    --Mapper.extract_colors 0
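# Sanity check (not part of the original instructions): because
# --Mapper.multiple_models 0 is set, COLMAP should write a single
# reconstruction to sparse/0.
ls $DATASET_PATH/sparse/0
# expected COLMAP output: cameras.bin  images.bin  points3D.bin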
  4. Save camera poses into the format that NeRF reads.
cd $ROOT_PATH/utils
python generate_pose.py --dataset_path $DATASET_PATH
  5. Estimate monocular depth.
cd $ROOT_PATH/utils
python generate_depth.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/midas_v21-f6b98070.pt
  6. Predict optical flows.
cd $ROOT_PATH/utils
python generate_flow.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
  7. Obtain the motion mask (code adapted from NSFF).
cd $ROOT_PATH/utils
python generate_motion_mask.py --dataset_path $DATASET_PATH
  8. Train a model. Please change expname and datadir in configs/config.txt (a config sketch follows the parameter list below).
cd $ROOT_PATH/
python run_nerf.py --config configs/config.txt

Explanation of each parameter:

  • expname: experiment name
  • basedir: where to store ckpts and logs
  • datadir: input data directory
  • factor: downsample factor for the input images
  • N_rand: number of random rays per gradient step
  • N_samples: number of samples per ray
  • netwidth: channels per layer
  • use_viewdirs: whether to enable view dependency for StaticNeRF
  • use_viewdirsDyn: whether to enable view dependency for DynamicNeRF
  • raw_noise_std: std dev of noise added to regularize sigma_a output
  • no_ndc: do not use normalized device coordinates
  • lindisp: sampling linearly in disparity rather than depth
  • i_video: frequency of novel view-time synthesis video saving
  • i_testset: frequency of testset video saving
  • N_iters: number of training iterations
  • i_img: frequency of tensorboard image logging
  • DyNeRF_blending: whether to use DynamicNeRF to predict the blending weight
  • pretrain: whether to pre-train StaticNeRF
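
For reference, a minimal config sketch for a custom sequence is shown below. The expname and datadir values are hypothetical placeholders (see step 8); all other parameters can keep the defaults shipped in configs/config.txt:

# configs/config.txt (sketch; values are placeholders)
expname = mysequence_H270_DyNeRF
basedir = ./logs
datadir = ./data/mysequence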

License

This work is licensed under the MIT License. See LICENSE for details.

If you find this code useful for your research, please consider citing the following paper:

@inproceedings{Gao-ICCV-DynNeRF,
    author    = {Gao, Chen and Saraf, Ayush and Kopf, Johannes and Huang, Jia-Bin},
    title     = {Dynamic View Synthesis from Dynamic Monocular Video},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
    year      = {2021}
}

Acknowledgments

Our training code is built upon NeRF, NeRF-pytorch, and NSFF. Our flow prediction code is modified from RAFT. Our depth prediction code is modified from MiDaS.
