On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

Last update: Dec 22, 2022

Overview

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

This repository contains the evaluation code and alternative pseudo ground truth poses as used in our ICCV 2021 paper.

Pseudo Ground Truth for 7Scenes and 12Scenes
Evaluation Code
Citation

Pseudo Ground Truth for 7Scenes and 12Scenes

We generated alternative SfM-based pseudo ground truth (pGT) using Colmap to supplement the original D-SLAM-based pseudo ground truth of 7Scenes and 12Scenes.

Pose Files

Please find our SfM pose files in the folder pgt. We separated pGT files wrt datasets, individual scenes and the test/training split. Each file contains one line per image that follows the format:

rgb_file qw qx qy qz tx ty tz f

Entries q and t represent the pose as quaternion and translation vector. The pose maps world coordinates to camera coordinates, i.e. p_cam = R(q) p_world + t. This is the same convention used by Colmap. Entry f represents the focal length of the RGB sensor. f was re-estimated by COLMAP and can differ slightly per scene.

We also provide the original D-SLAM pseudo ground truth in this format to be used with our evaluation code below.

Full Reconstructions

The Colmap 3D models are available here:

Note that the Google Drive folder that currently hosts the reconstructions has a daily download limit. We are currently looking into alternative hosting options.

License Information

Since the 3D models and pose files are derived from the original datasets, they are released under the same licences as the 7Scenes and 12Scenes datasets. Before using the datasets, please check the licenses (see the websites of the datasets or the README.md files that come with the 3D models).

Evaluation Code

The main results of our paper can be reproduced using evaluate_estimates.py. The script calculates either the pose error (max of rotation and translation error) or the DCRE error (dense reprojection error). The script prints the recall at a custom threshold to the console, and produces a cumulative error plot as a PDF file.

As input, the script expects a configuration file that points to estimated poses of potentially multiple algorithms and to the pseudo ground truth that these estimates should be compared to. We provide estimated poses of all methods shown in our paper (ActiveSearch, HLoc, R2D2 and DSAC*) in the folder estimates.
These pose files follow the same format as our pGT files described previously, but omit the final f entry.

Furthermore, we provide example config files corresponding to the main experiments in our paper.

Call python evaluate_estimates.py --help for all available options.

For evaluation on 7Scenes, using our SfM pGT, call:

python evaluate_estimates.py config_7scenes_sfm_pgt.json

This produces a new file config_7scenes_sfm_pgt_pose_err.pdf:

For the corresponding plot using the original D-SLAM pGT, call:

python evaluate_estimates.py config_7scenes_dslam_pgt.json

Interpreting the Results

The plots above show very different rankings across methods. Yet, as we discuss in our paper, both plots are valid since no version of the pGT is clearly superior to the other. Furthermore, it appears plausible that any version of pGT is only trustworthy up to a certain accuracy threshold. However, it is non-obvious and currently unknown, how to determine such a trust threshold. We thus strongly discourage to draw any conclusions (beyond that a method might be overfitting to the imperfections of the pseudo ground truth) from the smaller thresholds alone.

We advise to always evaluate methods under both versions of the pGT, and to show both evaluation results in juxtaposition unless specific reasons are given why one version of the pGT is preferred.

DCRE Computation

DCRE computation is triggered with the option --error_type dcre_max or --error_type dcre_mean (see our paper for details). DCRE needs access to the original 7Scenes or 12Scenes data as it requires depth maps. We provide two utility scripts, setup_7scenes.py and setup_12scenes.py, that will download and unpack the associated datasets. Make sure to check each datasets license, via the links above, before downloading and using them.

Note I: The original depth files of 7Scenes are not calibrated, but the DCRE requires calibrated files. The setup script will apply the Kinect calibration parameters found here to register depth to RGB. This essentially involves re-rendering the depth maps which is implemented in native Python and takes a long time due to the large frame count in 7Scenes (several hours). However, this step has to be done only once.

Note II: The DCRE computation by evaluate_estimates.py is implemented on the GPU and reasonably fast. However, due to the large frame count in 7Scenes it can still take considerable time. The parameter --error_max_images limits the max. number of frames used to calculate recall and cumulative errors. The default value of 1000 provides a good tradeoff between accuracy and speed. Use --error_max_images -1 to use all images which is most accurate but slow for 7Scenes.

Uploading Your Method's Estimates

We are happy to include updated evaluation results or evaluation results of new methods in this repository. This would enable easy comparisons across methods with unified evaluation code, as we progress in the field.

If you want your results included, please provide estimates of your method under both pGT versions via a pull request. Please add your estimation files to a custom sub-folder under èstimates_external, following our pose file convention described above. We would also ask that you provide a text file that links your results to a publication or tech report, or contains a description of how you obtained these results.

estimates_external
├── someone_elses_method
└── your_method
    ├── info_your_method.txt
    ├── dslam
    │   ├── 7scenes
    │   │   ├── chess_your_method.txt
    │   │   ├── fire_your_method.txt
    │   │   ├── ...
    │   └── 12scenes
    │       ├── ...
    └── sfm
        ├── ...

Dependencies

This code requires the following python packages, and we tested it with the package versions in brackets

pytorch (1.6.0)
opencv (3.4.2)
scikit-image (0.16.2)

The repository contains an environment.yml for the use with Conda:

conda env create -f environment.yml
conda activate pgt

License Information

Our evaluation code and data utility scripts are based on parts of DSAC*, and we provide our code under the same BSD-3 license.

Citation

If you are using either the evaluation code or the Structure-from-Motion pseudo GT for the 7Scenes or 12Scenes datasets, please cite the following work:

@InProceedings{Brachmann2021ICCV,
    author = {Brachmann, Eric and Humenberger, Martin and Rother, Carsten and Sattler, Torsten},
    title = {{On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization}},
    booktitle = {International Conference on Computer Vision (ICCV)},
    year = {2021},
}

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

Related tags

Overview

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

Pseudo Ground Truth for 7Scenes and 12Scenes

Pose Files

Full Reconstructions

License Information

Evaluation Code

Interpreting the Results

DCRE Computation

Uploading Your Method's Estimates

Dependencies

License Information

Citation

Owner

Torsten Sattler

Code release for General Greedy De-bias Learning

The official PyTorch implementation of Curriculum by Smoothing (NeurIPS 2020, Spotlight).

CLIPImageClassifier wraps clip image model from transformers

OneFlow is a performance-centered and open-source deep learning framework.

[ICCV-2021] An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Code for the paper 'A High Performance CRF Model for Clothes Parsing'.

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Meta Language-Specific Layers in Multilingual Language Models

The final project of "Applying AI to 3D Medical Imaging Data" from "AI for Healthcare" nanodegree - Udacity.

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Establishing Strong Baselines for TripClick Health Retrieval; ECIR 2022

Intrinsic Image Harmonization

Optical Character Recognition + Instance Segmentation for russian and english languages

Repository for self-supervised landmark discovery

Python/Rust implementations and notes from Proofs Arguments and Zero Knowledge

MaskTrackRCNN for video instance segmentation based on mmdetection

BasicRL: easy and fundamental codes for deep reinforcement learning。It is an improvement on rainbow-is-all-you-need and OpenAI Spinning Up.