An implementation for the ICCV 2021 paper Deep Permutation Equivariant Structure from Motion.

Last update: Dec 27, 2022

Related tags

Overview

Deep Permutation Equivariant Structure from Motion

Paper | Poster

This repository contains an implementation for the ICCV 2021 paper Deep Permutation Equivariant Structure from Motion.

The paper proposes a neural network architecture that, given a set of point tracks in multiple images of a static scene, recovers both the camera parameters and a (sparse) scene structure by minimizing an unsupervised reprojection loss. The method does not require initialization of camera parameters or 3D point locations and is implemented for two setups: (1) single scene reconstruction and (2) learning from multiple scenes.

Setup
How to use
Citation

Setup

This repository is implemented with python 3.8, and in order to run bundle adjustment requires linux.

Folders

The repository should contain the following folders:

Equivariant-SFM
├── bundle_adjustment
├── code
├── datasets
│   ├── Euclidean
│   └── Projective
├── environment.yml
├── results

Conda envorinment

Create the environment using one of the following commands:

conda create -n ESFM -c pytorch -c conda-forge -c comet_ml -c plotly  -c fvcore -c iopath -c bottler -c anaconda -c pytorch3d python=3.8 pytorch cudatoolkit=10.2 torchvision pyhocon comet_ml plotly pandas opencv openpyxl xlrd cvxpy fvcore iopath nvidiacub pytorch3d eigen cmake glog gflags suitesparse gxx_linux-64 gcc_linux-64 dask matplotlib
conda activate ESFM

Or:

conda env create -f environment.yml
conda activate ESFM

And follow the bundle adjustment instructions.

Data

Download the data from this link.

The model can work on both calibrated camera setting (euclidean reconstruction) and on uncalibrated cameras (projective reconstruction).

The input for the model is an observed points matrix of size [m,n,2] where the entry [i,j] is a 2D image point that corresponds to camera (image) number i and 3D point (point track) number j.

In practice we use a correspondence matrix representation of size [2*m,n], where the entries [2*i,j] and [2*i+1,j] form the [i,j] image point.

For the calibrated setting, the input must include m calibration matrices of size [3,3].

How to use

Optimization

For a calibrated scene optimization run:

python single_scene_optimization.py --conf Optimization_Euc.conf

For an uncalibrated scene optimization run:

python single_scene_optimization.py --conf Optimization_Proj.conf

The following examples are for the calibrated settings but are clearly the same for the uncalibrated setting.

You can choose which scene to optimize either by changing the config file in the field 'dataset.scan' or from the command line:

python single_scene_optimization.py --conf Optimization_Euc.conf --scan [scan_name]

Similarly, you can override any value of the config file from the command line. For example, to change the number of training epochs and the evaluation frequency use:

python single_scene_optimization.py --conf Optimization_Euc.conf --external_params "train:num_of_epochs:1e+5,train:eval_intervals:100"

Learning

To run the learning setup run:

python multiple_scenes_learning.py --conf Learning_Euc.conf

Or for the uncalibrated setting:

python multiple_scenes_learning.py --conf Learning_Proj.conf

To override some parameters from the config file, you can either change the file itself or use the same command as in the optimization setting:

python multiple_scenes_learning.py --conf Learning_Euc.conf --external_params "train:num_of_epochs:1e+5,train:eval_intervals:100"

Citation

If you find this work useful please cite:

@InProceedings{Moran_2021_ICCV,
    author    = {Moran, Dror and Koslowsky, Hodaya and Kasten, Yoni and Maron, Haggai and Galun, Meirav and Basri, Ronen},
    title     = {Deep Permutation Equivariant Structure From Motion},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {5976-5986}
}

An implementation for the ICCV 2021 paper Deep Permutation Equivariant Structure from Motion.

Related tags

Overview

Deep Permutation Equivariant Structure from Motion

Paper | Poster

Table of Contents

Setup

Folders

Conda envorinment

Data

How to use

Optimization

Learning

Citation

Owner

Deep Learning for 3D Point Clouds: A Survey (IEEE TPAMI, 2020)

Manage the availability of workspaces within Frappe/ ERPNext (sidebar) based on user-roles

Torch-mutable-modules - Use in-place and assignment operations on PyTorch module parameters with support for autograd

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

Class-Attentive Diffusion Network for Semi-Supervised Classification [AAAI'21] (official implementation)

Mail classification with tensorflow and MS Exchange Server (ham or spam).

Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

EDCNN: Edge enhancement-based Densely Connected Network with Compound Loss for Low-Dose CT Denoising

Source code for "FastBERT: a Self-distilling BERT with Adaptive Inference Time".

This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Make your own game in a font!

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Advancing mathematics by guiding human intuition with AI

make ASCII Art by Deep Learning

Process text, including tokenizing and representing sentences as vectors and Applying some concepts like RNN, LSTM and GRU to create a classifier can detect the language in which a sentence is written from among 17 languages.

PyTorch implementation of "Representing Shape Collections with Alignment-Aware Linear Models" paper.

PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Churn prediction

A torch implementation of "Pixel-Level Domain Transfer"