PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Published in ICCV 2021 (ORAL)

[publication] [arxiv] [poster] [talk]

The models have been tested on Ubuntu 16.04 and 20.04 using Python 3.5, 3.6, and 3.7 with PyTorch 1.2 and 1.3.

Authors: Alex Wong

If this work is useful to you, please cite our paper:

@inproceedings{wong2021unsupervised,
  title={Unsupervised Depth Completion with Calibrated Backprojection Layers},
  author={Wong, Alex and Soatto, Stefano},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={12747--12756},
  year={2021}
}

Table of Contents

  1. About sparse-to-dense depth completion
  2. About Calibrated Backprojection Network
  3. Setting up
  4. Downloading pretrained models
  5. Running KBNet
  6. Training KBNet
  7. Related projects
  8. License and disclaimer

About sparse-to-dense depth completion

Given a sparse point cloud and an image, the goal is to infer the dense point cloud. The sparse point cloud can be obtained either from computational methods such as SfM (Structure-from-Motion) or from active sensors such as lidar or structured light sensors. Commonly, it is projected onto the image plane as a sparse depth map or 2.5D representation, in which case methods in this domain predict a dense depth map. Here are some examples of dense point clouds output by our method:

[Figure: example results with columns Image, Sparse Point Cloud, Output Point Cloud]

To follow the literature and benchmarks for this task, you may visit: Awesome State of Depth Completion

About Calibrated Backprojection Network

The motivation:

(1) In the scenes above of the copyroom and outdoor bench, the point cloud produced by XIVO is on the order of hundreds of points. When projected onto the image plane as a 2.5D range map, the sparse points cover only 0.05% of the image space, so typically only a single measurement is present within a local neighborhood and, in most cases, none. This not only hinders learning by rendering conventional convolutions ineffective, since they produce mostly zero activations, but also increases the sensitivity of the model to variations in the range sensor and feature detector used to produce the point cloud.

(2) Typically the same sensor platform is used to collect the training set, so the model tends to overfit to the sensor setup. This is exacerbated in the unsupervised learning paradigm which leverages a photometric reconstruction loss as a supervisory signal. Because image reconstruction requires reprojection from one frame to another, this implicitly bakes in the intrinsic camera calibration parameters and limits generalization.

Our solution:

(1) To address the sparsity problem, we propose to project the point cloud onto the image plane as a sparse range map and learn a dense or quasi-dense representation via a sparse-to-dense pooling (S2D) module. S2D performs min and max pooling with various kernel sizes to densify the input and capture the scene structure at multiple scales, as in the figure below.

There exist trade-offs between detail and density (more dense, less detail) and between preservation of near and far structures (min pooling favors structures close to the camera, max pooling favors structures far from the camera). These trade-offs are learned by three 1 by 1 convolutional layers, and the resulting multi-scale depth features are fused back into the original sparse depth map to yield a dense or quasi-dense representation.
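
The pooling-and-fusion idea described above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's implementation: the names `min_pool_valid` and `SparseToDensePool`, the kernel sizes, the filter count, and the `max_depth` sentinel are all assumptions chosen for the example. It also assumes that zeros mark missing measurements and that valid depths are smaller than `max_depth`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def min_pool_valid(sparse_depth, kernel_size, max_depth):
    """Min pooling that ignores missing (zero) measurements."""
    valid = sparse_depth > 0
    # Replace missing values with a large sentinel so they never win a min
    filled = torch.where(
        valid, sparse_depth, torch.full_like(sparse_depth, max_depth))
    # Min pooling expressed as negated max pooling
    pooled = -F.max_pool2d(
        -filled, kernel_size=kernel_size, stride=1, padding=kernel_size // 2)
    # Windows without any measurement are reset to zero
    return torch.where(pooled >= max_depth, torch.zeros_like(pooled), pooled)


class SparseToDensePool(nn.Module):
    """Illustrative S2D-style module (hypothetical names and sizes)."""

    def __init__(self, min_kernels=(5, 7), max_kernels=(15, 17),
                 n_filter=8, max_depth=100.0):
        super().__init__()
        self.min_kernels = min_kernels
        self.max_kernels = max_kernels
        self.max_depth = max_depth
        n_pooled = len(min_kernels) + len(max_kernels)
        # Three 1x1 convolutions learn how to weigh the pooled scales
        self.fuse = nn.Sequential(
            nn.Conv2d(n_pooled, n_filter, kernel_size=1), nn.LeakyReLU(0.2),
            nn.Conv2d(n_filter, n_filter, kernel_size=1), nn.LeakyReLU(0.2),
            nn.Conv2d(n_filter, n_filter, kernel_size=1), nn.LeakyReLU(0.2))
        # Fuse the multi-scale features back with the original sparse depth
        self.project = nn.Conv2d(
            n_filter + 1, n_filter, kernel_size=3, padding=1)

    def forward(self, sparse_depth):
        # sparse_depth: N x 1 x H x W
        pooled = [
            min_pool_valid(sparse_depth, k, self.max_depth)
            for k in self.min_kernels]
        pooled += [
            F.max_pool2d(sparse_depth, kernel_size=k, stride=1, padding=k // 2)
            for k in self.max_kernels]
        features = self.fuse(torch.cat(pooled, dim=1))
        return self.project(torch.cat([features, sparse_depth], dim=1))
```

Odd kernel sizes with a stride of 1 and padding of `k // 2` keep the spatial resolution unchanged, so the pooled maps can be concatenated with the input directly.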

(2) To address the generalization problem, we propose to take an image, the projected sparse point cloud, and the calibration matrix as input. We introduce a calibrated backprojection (KB) layer that maps the camera intrinsics, the input image, and the imputed depth onto the 3D scene in a canonical frame of reference. This can be thought of as a form of spatial Euclidean positional encoding of the image.

The calibration can therefore be changed depending on the camera used, allowing us to use different calibrations at training and test time, which significantly improves generalization.
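
The geometric core of this idea, lifting each pixel to a 3D point using the inverse calibration matrix and the depth, can be sketched as below. This is only an illustration in NumPy under a pinhole camera model: the function name `backproject_to_3d` and the example intrinsics are hypothetical, and the KB layer described above additionally combines such coordinates with learned image and depth features.

```python
import numpy as np


def backproject_to_3d(depth, intrinsics):
    """Lift an H x W depth map to H x W x 3 points in the camera frame.

    Computes X = z * K^{-1} [u, v, 1]^T for every pixel (u, v).
    """
    height, width = depth.shape
    # u indexes columns (x), v indexes rows (y)
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Row-vector form of K^{-1} applied to each homogeneous pixel
    rays = pixels @ np.linalg.inv(intrinsics).T
    return rays * depth[..., None]


# Hypothetical intrinsics for two different cameras
K_train = np.array([[500.0, 0.0, 320.0],
                    [0.0, 500.0, 240.0],
                    [0.0, 0.0, 1.0]])
K_test = np.array([[250.0, 0.0, 160.0],
                   [0.0, 250.0, 120.0],
                   [0.0, 0.0, 1.0]])

depth = np.full((480, 640), 2.0)
points_train = backproject_to_3d(depth, K_train)
points_test = backproject_to_3d(depth, K_test)
```

The same pixel and depth map to different 3D points under `K_train` and `K_test`, which is why supplying the calibration explicitly, rather than letting the network absorb it implicitly, lets the camera change between training and testing.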

Our network, Calibrated Backprojection Network (KBNet), goes counter to the current trend of learning everything with generic architectures like Transformers, including what we already know about basic Euclidean geometry. Our model has strong inductive bias in our KB layer, which incorporates the calibration matrix directly into the architecture to yield an RGB representation lifted into scene topology via 3D positional encoding.

These design choices improve generalization across sensor platforms. In addition, by incorporating a basic geometric image formation model based on Euclidean transformations in 3D and central perspective projection onto 2D, we can reduce the model size while still achieving the state of the art.

To demonstrate the effectiveness of our method, we trained a model on the VOID dataset, which is captured by an Intel RealSense, and tested it on NYU v2, which is collected with a Microsoft Kinect.

Setting up your virtual environment

We will create a virtual environment with the necessary dependencies:

virtualenv -p /usr/bin/python3.7 kbnet-py37env
source kbnet-py37env/bin/activate
pip install opencv-python scipy scikit-learn scikit-image matplotlib gdown numpy gast Pillow pyyaml
pip install torch==1.3.0 torchvision==0.4.1 tensorboard==2.3.0
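
After installation, a quick way to confirm that the environment is complete is to check that each dependency can be found by Python. The sketch below is an optional helper and not part of the repository; the function name `find_missing` is hypothetical, and the mapping reflects the usual import names of the packages installed above (for example, `opencv-python` is imported as `cv2`).

```python
import importlib.util

# pip package name -> module name used in an import statement
REQUIRED_MODULES = {
    "opencv-python": "cv2",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "matplotlib": "matplotlib",
    "gdown": "gdown",
    "numpy": "numpy",
    "gast": "gast",
    "Pillow": "PIL",
    "pyyaml": "yaml",
    "torch": "torch",
    "torchvision": "torchvision",
    "tensorboard": "tensorboard",
}


def find_missing(modules):
    """Return the pip names of packages whose module cannot be located."""
    return [
        package for package, module in modules.items()
        if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```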

Setting up your datasets

For datasets, we will use KITTI for outdoors and VOID for indoors. We will also use NYUv2 to demonstrate our generalization capabilities.

mkdir data
ln -s /path/to/kitti_raw_data data/
ln -s /path/to/kitti_depth_completion data/
ln -s /path/to/void_release data/
ln -s /path/to/nyu_v2 data/
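
Since a mistyped source path produces a dangling symbolic link without any error, it can help to verify the links before running anything. The following is a small optional check, not part of the repository; the function name `missing_datasets` is hypothetical, and the directory names are simply those created by the `ln -s` commands above.

```python
import os

# Directory names created under data/ by the ln -s commands above
EXPECTED_DATASETS = [
    "kitti_raw_data",
    "kitti_depth_completion",
    "void_release",
    "nyu_v2",
]


def missing_datasets(data_root="data", names=EXPECTED_DATASETS):
    """Return dataset names that do not resolve to a directory.

    os.path.isdir follows symbolic links, so a dangling link
    is reported as missing.
    """
    return [
        name for name in names
        if not os.path.isdir(os.path.join(data_root, name))]


if __name__ == "__main__":
    print("Missing datasets:", missing_datasets() or "none")
```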

In case you do not already have KITTI and VOID datasets downloaded, we provide download scripts for them:

bash bash/setup_dataset_kitti.sh
bash bash/setup_dataset_void.sh

The bash/setup_dataset_void.sh script downloads the VOID dataset using gdown. However, gdown intermittently fails. As a workaround, you may download the files directly via:

https://drive.google.com/open?id=1GGov8MaBKCEcJEXxY8qrh8Ldt2mErtWs
https://drive.google.com/open?id=1c3PxnOE0N8tgkvTgPbnUZXS6ekv7pd80
https://drive.google.com/open?id=14PdJggr2PVJ6uArm9IWlhSHO2y3Q658v

which will give you three files: void_150.zip, void_500.zip, and void_1500.zip.

Assuming you are in the root of the repository, to construct the same dataset structure as the setup script above:

mkdir void_release
unzip -o void_150.zip -d void_release/
unzip -o void_500.zip -d void_release/
unzip -o void_1500.zip -d void_release/
bash bash/setup_dataset_void.sh unpack-only

For more detailed instructions on downloading and using VOID and obtaining the raw rosbags, you may visit the VOID dataset webpage.

Downloading our pretrained models

To use our models pretrained on KITTI and VOID, you can download them from Google Drive:

gdown https://drive.google.com/uc?id=1C2RHo6E_Q8TzXN_h-GjrojJk4FYzQfRT
unzip pretrained_models.zip

Note: gdown fails intermittently and complains about permissions. If that happens, you may also download the models via:

https://drive.google.com/file/d/1C2RHo6E_Q8TzXN_h-GjrojJk4FYzQfRT/view?usp=sharing

Once you unzip the file, you will find a directory called pretrained_models containing the following file structure:

pretrained_models
|---- kitti
      |---- kbnet-kitti.pth
      |---- posenet-kitti.pth
|---- void
      |---- kbnet-void1500.pth
      |---- posenet-void1500.pth

We also provide our PoseNet model that was trained jointly with our Calibrated Backprojection Network (KBNet) so that you may fine-tune both without having to relearn pose from scratch.

The pretrained weights should reproduce the numbers we reported in our paper. The tables below give the comprehensive numbers:

For KITTI:

Evaluation set        MAE      RMSE   iMAE  iRMSE
Validation         260.44   1126.85   1.03   3.20
Testing (online)   256.76   1069.47   1.02   2.95

For VOID:

Evaluation set                MAE     RMSE    iMAE   iRMSE
VOID 1500 (0.5% density)    39.80    95.86   21.16   49.72
VOID 500 (0.15% density)    77.70   172.49   38.87   85.59
VOID 150 (0.05% density)   131.54   263.54   66.84  128.29
NYU v2 (generalization)    117.18   218.67   23.01   47.96
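
For readers unfamiliar with the four metrics, the sketch below shows one way to compute them with NumPy. It is an illustration rather than the repository's evaluation code: the function name `evaluate` is hypothetical, and it assumes depth maps given in meters, predictions that are strictly positive, and the convention of the KITTI depth completion benchmark, in which MAE and RMSE are reported in millimeters and iMAE and iRMSE in 1/km.

```python
import numpy as np


def evaluate(output_depth, ground_truth):
    """Compute MAE, RMSE, iMAE, and iRMSE over valid ground-truth pixels.

    Both inputs are depth maps in meters; pixels where ground_truth <= 0
    carry no measurement and are ignored.
    """
    mask = ground_truth > 0
    out, gt = output_depth[mask], ground_truth[mask]

    # Depth errors in millimeters
    err_mm = 1000.0 * out - 1000.0 * gt
    # Inverse depth errors in 1/km (depth converted from meters to km)
    err_inv_km = 1.0 / (0.001 * out) - 1.0 / (0.001 * gt)

    return {
        "MAE": np.mean(np.abs(err_mm)),
        "RMSE": np.sqrt(np.mean(err_mm ** 2)),
        "iMAE": np.mean(np.abs(err_inv_km)),
        "iRMSE": np.sqrt(np.mean(err_inv_km ** 2)),
    }
```

The inverse metrics weigh errors on nearby structures more heavily, while MAE and RMSE are dominated by errors at larger depths.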

Running KBNet

To run our pretrained model on the KITTI validation set, you may use

bash bash/kitti/run_kbnet_kitti_validation.sh

Our run scripts will log all of the hyper-parameters used as well as the evaluation scores to the location given by the output_path argument. The expected output should be:

Evaluation results:
     MAE      RMSE      iMAE     iRMSE
 260.447  1126.855     1.035     3.203
     +/-       +/-       +/-       +/-
  92.735   398.888     0.285     1.915
Total time: 13187.93 ms  Average time per sample: 15.19 ms

Our model runs fairly fast: the number reported in the paper is 16 ms for KITTI images on an Nvidia 1080Ti GPU. The run above is slightly faster than the reported number.

To run our pretrained model on the KITTI test set, you may use

bash bash/kitti/run_kbnet_kitti_testing.sh

To get our numbers, you will need to submit the outputs to the KITTI online benchmark.

To run our pretrained model on the VOID 1500 test set of 0.5% density, you may use

bash bash/void/run_kbnet_void1500.sh

You should expect the output:

Evaluation results:
     MAE      RMSE      iMAE     iRMSE
  39.803    95.864    21.161    49.723
     +/-       +/-       +/-       +/-
  27.521    67.776    24.340    62.204
Total time: 10399.33 ms  Average time per sample: 13.00 ms

We note that for all of the following experiments, we will use our model trained on the denser (VOID 1500) data and test it at various density levels.

Similar to the above, for the VOID 500 (0.15%) test set, you can run:

bash bash/void/run_kbnet_void500.sh

and the VOID 150 (0.05%) test set:

bash bash/void/run_kbnet_void150.sh

To use our model trained on VOID and test it on NYU v2:

bash bash/void/run_kbnet_nyu_v2.sh

Training KBNet

To train KBNet on the KITTI dataset, you may run

bash bash/kitti/train_kbnet_kitti.sh

To train KBNet on the VOID dataset, you may run

bash bash/void/train_kbnet_void1500.sh

Note that while we do not train on VOID 500 or 150 (hence no hyper-parameters are provided), if interested you may modify the training paths to train on VOID 500:

--train_image_path training/void/void_train_image_500.txt \
--train_sparse_depth_path training/void/void_train_sparse_depth_500.txt \
--train_intrinsics_path training/void/void_train_intrinsics_500.txt \

and on VOID 150:

--train_image_path training/void/void_train_image_150.txt \
--train_sparse_depth_path training/void/void_train_sparse_depth_150.txt \
--train_intrinsics_path training/void/void_train_intrinsics_150.txt \
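
When swapping in a different set of training lists, a mismatch between the three files is easy to miss. The sketch below is an optional sanity check and not part of the repository; the function names are hypothetical, and it assumes that each text file lists one path per line and that line i of every file refers to the same training sample.

```python
def read_paths(list_path):
    """Read a text file with one path per line, skipping blank lines."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def check_aligned(image_list, sparse_depth_list, intrinsics_list):
    """Return the number of samples, or raise if the lists differ in length."""
    list_paths = [image_list, sparse_depth_list, intrinsics_list]
    counts = [len(read_paths(path)) for path in list_paths]
    if len(set(counts)) != 1:
        raise ValueError(
            "Path lists differ in length: {}".format(
                dict(zip(list_paths, counts))))
    return counts[0]
```

For example, calling `check_aligned` with the three VOID 150 list files named above should return the number of training samples if the lists are consistent.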

To monitor your training progress, you may use TensorBoard:

tensorboard --logdir trained_kbnet/kitti/kbnet_model
tensorboard --logdir trained_kbnet/void1500/kbnet_model

Related projects

You may also find the following projects useful:

  • ScaffNet: Learning Topology from Synthetic Data for Unsupervised Depth Completion. An unsupervised sparse-to-dense depth completion method that first learns a map from sparse geometry to an initial dense topology from synthetic data (where ground truth comes for free) and amends the initial estimation by validating against the image. This work is published in the Robotics and Automation Letters (RA-L) 2021 and the International Conference on Robotics and Automation (ICRA) 2021.
  • AdaFrame: An Adaptive Framework for Learning Unsupervised Depth Completion. An adaptive framework for learning unsupervised sparse-to-dense depth completion that balances data fidelity and regularization objectives based on model performance on the data. This work is published in the Robotics and Automation Letters (RA-L) 2021 and the International Conference on Robotics and Automation (ICRA) 2021.
  • VOICED: Unsupervised Depth Completion from Visual Inertial Odometry. An unsupervised sparse-to-dense depth completion method, developed by the authors. The paper introduces Scaffolding for depth completion and a light-weight network to refine it. This work is published in the Robotics and Automation Letters (RA-L) 2020 and the International Conference on Robotics and Automation (ICRA) 2020.
  • VOID: from Unsupervised Depth Completion from Visual Inertial Odometry. A dataset, developed by the authors, containing indoor and outdoor scenes with non-trivial 6 degrees of freedom. The dataset is published along with this work in the Robotics and Automation Letters (RA-L) 2020 and the International Conference on Robotics and Automation (ICRA) 2020.
  • XIVO: The Visual-Inertial Odometry system developed at UCLA Vision Lab. This work is built on top of XIVO. The VOID dataset used by this work also leverages XIVO to obtain sparse points and camera poses.
  • GeoSup: Geo-Supervised Visual Depth Prediction. A single image depth prediction method developed by the authors, published in the Robotics and Automation Letters (RA-L) 2019 and the International Conference on Robotics and Automation (ICRA) 2019. This work was awarded Best Paper in Robot Vision at ICRA 2019.
  • AdaReg: Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction. A single image depth prediction method that introduces adaptive regularization. This work was published in the proceedings of Conference on Computer Vision and Pattern Recognition (CVPR) 2019.

We also have works in adversarial attacks on depth estimation methods and medical image segmentation:

  • Stereopagnosia: Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations. Adversarial perturbations for stereo depth estimation, published in the Proceedings of AAAI Conference on Artificial Intelligence (AAAI) 2021.
  • Targeted Attacks for Monodepth: Targeted Adversarial Perturbations for Monocular Depth Prediction. Targeted adversarial perturbations attacks for monocular depth estimation, published in the proceedings of Neural Information Processing Systems (NeurIPS) 2020.
  • SPiN: Small Lesion Segmentation in Brain MRIs with Subpixel Embedding. Subpixel architecture for segmenting ischemic stroke brain lesions in MRI images, published in the Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI) Brain Lesion Workshop 2021 as an oral paper.

License and disclaimer

This software is property of the UC Regents, and is provided free of charge for research purposes only. It comes with no warranties, expressed or implied, according to these terms and conditions. For commercial use, please contact UCLA TDG.
