GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. (CVPR 2021)

Last update: Jan 07, 2023

Overview

GDR-Net

This repo provides the PyTorch implementation of the work:

Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji. GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. In CVPR 2021. [Paper][ArXiv][Video][bibtex]

Requirements

Ubuntu 16.04/18.04, CUDA 10.1/10.2, python >= 3.6, PyTorch >= 1.6, torchvision
Install detectron2 from source
sh scripts/install_deps.sh
Compile the cpp extension for farthest points sampling (fps):
```
sh core/csrc/compile.sh
```
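For readers unfamiliar with the term, farthest point sampling greedily picks points that are as far as possible from those already chosen. The block below is a minimal pure-Python sketch of that idea, not the repo's C++ extension; the function names `squared_distance` and `farthest_point_sampling` are hypothetical and chosen only for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of `num_samples` points picked greedily to be far apart.

    Illustrative helper (an assumption, not part of GDR-Net's code base).
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be between 1 and len(points)")
    selected = [start_index]
    # Squared distance from each point to its nearest already-selected point.
    min_sq_dist = [squared_distance(p, points[start_index]) for p in points]
    while len(selected) < num_samples:
        # The next sample is the point farthest from the current selection.
        next_index = max(range(len(points)), key=min_sq_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            min_sq_dist[i] = min(min_sq_dist[i], squared_distance(p, points[next_index]))
    return selected


example_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(farthest_point_sampling(example_points, 3))  # [0, 2, 3]
```

The sketch costs O(N * num_samples) distance evaluations, which is presumably why the repo ships a compiled extension for this step.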

Datasets

Download the 6D pose datasets (LM, LM-O, YCB-V) from the BOP website and VOC 2012 for background images. Please also download the image_sets and test_bboxes from here (BaiduNetDisk, OneDrive, password: qjfk).

The structure of datasets folder should look like below:

# recommend using soft links (ln -sf)
datasets/
├── BOP_DATASETS
│   ├── lm
│   ├── lmo
│   └── ycbv
├── lm_imgn  # the OpenGL rendered images for LM, 1k/obj
├── lm_renders_blender  # the Blender rendered images for LM, 10k/obj (pvnet-rendering)
└── VOCdevkit

lm_imgn comes from DeepIM, which can be downloaded here (BaiduNetDisk, OneDrive, password: vr0i).
lm_renders_blender comes from pvnet-rendering; note that the fused data is not needed.
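Before training, it can help to confirm that the folder layout matches the structure listed above. The following is a small sketch under stated assumptions: the helper `find_missing_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the default root `datasets` is assumed to be relative to the repo root.

```python
from pathlib import Path

# Sub-folders taken from the layout in this section.
EXPECTED_DIRS = [
    "BOP_DATASETS/lm",
    "BOP_DATASETS/lmo",
    "BOP_DATASETS/ycbv",
    "lm_imgn",
    "lm_renders_blender",
    "VOCdevkit",
]


def find_missing_dirs(datasets_root):
    """Return the expected sub-folders that do not exist under `datasets_root`."""
    root = Path(datasets_root)
    # Path.is_dir() follows symlinks, so soft-linked folders count as present.
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = find_missing_dirs("datasets")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```

The check only looks at top-level folder names; it does not verify the contents of each dataset.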

Training GDR-Net

./core/gdrn_modeling/train_gdrn.sh <config_path> <gpu_ids> (other args)

Example:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/lm/a6_cPnP_lm13.py 0  # multiple gpus: 0,1,2,3
# add --resume if you want to resume from an interrupted experiment.

Our trained GDR-Net models can be found here (BaiduNetDisk, OneDrive, password: kedv).
(Note that the models for the BOP setup in the supplement were trained using a refactored version of this repo (not compatible); they are slightly better than the models provided here.)

Evaluation

./core/gdrn_modeling/test_gdrn.sh <config_path> <gpu_ids> <ckpt_path> (other args)

Example:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py 0 output/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e/gdrn_lmo_real_pbr.pth

Citation

If you find this useful in your research, please consider citing:

@InProceedings{Wang_2021_GDRN,
    title     = {{GDR-Net}: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation},
    author    = {Wang, Gu and Manhardt, Fabian and Tombari, Federico and Ji, Xiangyang},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {16611-16621}
}

Comments

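Question about the CUDA version

Hello, can I train with CUDA 11 or later? I only have A6000 and A100 GPUs, which are not compatible with CUDA versions below 11. When I train with CUDA 11.1 and torch 1.8 or 1.9, I always get double free or corruption (!prev) and RuntimeError: DataLoader worker (pid(s) xxxxx) exited unexpectedly.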
help wanted

opened by fn6767 9
Pipeline to print inferred 3D bounding boxes on images

Hello! I find this work really interesting. After successfully testing inference (LMO and YCB) I was just interested in plotting the inference results as 3D bounding boxes on RGB images and by inspecting the code I bumped into:

https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/5fb30c3dc53f46bac24a8a83a373eac7a8038556/core/gdrn_modeling/gdrn_evaluator.py#L516

it is the function used for inference, which seems to show the results in terms of the different metrics, but not showing graphical results as I am looking for

In the same file I noticed the function: https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/5fb30c3dc53f46bac24a8a83a373eac7a8038556/core/gdrn_modeling/gdrn_evaluator.py#L634

which seems structurally similar but with some input differences. In particular, I would like to ask whether the input dataloader for gdrn_inference_on_dataset can be computed in the same way as for save_result_of_dataset, as in

https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/5fb30c3dc53f46bac24a8a83a373eac7a8038556/core/gdrn_modeling/engine.py#L135-L137

Preliminary debugging suggests that it is not possible to access the "image" field of the input sample in

https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/5fb30c3dc53f46bac24a8a83a373eac7a8038556/core/gdrn_modeling/gdrn_evaluator.py#L678

Possibly related issue: https://github.com/THU-DA-6D-Pose-Group/GDR-Net/issues/56
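The kind of visualization asked about here usually reduces to projecting the eight corners of an object's 3D bounding box into the image with the estimated pose. The block below is a generic pinhole-projection sketch, not code from this repo: `bbox_corners`, `project_points`, and `BOX_EDGES` are hypothetical names, and the pose (R, t), intrinsics K, and box extents are assumed to be available from the inference results and the object model.

```python
from itertools import product


def bbox_corners(min_xyz, max_xyz):
    """Eight corners of an axis-aligned box in the object frame."""
    return [tuple(corner) for corner in product(*zip(min_xyz, max_xyz))]


# Corner indices that differ in exactly one axis are joined by a box edge (12 in total).
BOX_EDGES = [
    (i, j) for i in range(8) for j in range(i + 1, 8) if bin(i ^ j).count("1") == 1
]


def project_points(points_3d, R, t, K):
    """Project object-frame points to pixel coordinates via x ~ K (R X + t)."""
    pixels = []
    for X in points_3d:
        # Object frame -> camera frame.
        cam = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
        if cam[2] <= 0:
            raise ValueError("point lies behind the camera")
        # Camera frame -> homogeneous pixel coordinates.
        hom = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
        pixels.append((hom[0] / hom[2], hom[1] / hom[2]))
    return pixels


# Hypothetical example values, for illustration only.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 2.0)
corners_2d = project_points(bbox_corners((-0.1, -0.1, -0.1), (0.1, 0.1, 0.1)), R, t, K)
for start, end in BOX_EDGES:
    print(corners_2d[start], "->", corners_2d[end])
```

Each printed pair is one box edge in pixel coordinates; drawing them on the RGB image (for example with OpenCV's cv2.line, after rounding to integers) gives the wireframe overlay.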

opened by AlbertoRemus 8
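Some questions about the paper

Hello, the paper describes the conversion from M_XYZ to M_2D-3D as follows: "$M_{2D-3D}$ can then be derived by stacking $M_{XYZ}$ onto the corresponding 2D pixel coordinates". However, I still do not understand why the $3\times64\times64$ map $M_{XYZ}$ becomes the $2\times64\times64$ map $M_{2D-3D}$. Also, why is this conversion needed at all? Would it not work to normalize the predicted XYZ and concatenate it directly with $M_{SRA}$?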

opened by Mr2er0 8
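Question about using a different dataset

Hello Dr. Wang! Your work has helped me a lot, and thank you very much for providing this open-source project. I would now like to train your model on my own data. In some earlier issues you only described how to process and organize custom data, but did not mention which parts of the code need to be modified in order to use it. Since training on the LM dataset required generating some files beforehand, I suspect that many places may need to change to apply the model to my own data. Could you please explain this in more detail? I look forward to your reply, and thank you again!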

opened by micki-37 7
Questions about LM-O evaluation results

Hi! Thanks for your great work. I execute the following command to get the evaluation results of LM-O, where 'GDR-Net-DATA' is the folder where I put the trained models.

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py 1 GDR-Net-DATA/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e/gdrn_lmo_real_pbr.pth

Is 'ad_10' the 'Average Recall (%) of ADD(-S)' mentioned in Table 2 in the paper?
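For reference, the ADD(-S) criterion named in this question is conventionally defined as follows: ADD is the mean distance between model points transformed by the ground-truth pose and by the estimated pose, ADD-S replaces the point-wise distance with the distance to the closest transformed point (for symmetric objects), and a pose counts as correct when the error is below 10% of the object diameter. The sketch below illustrates that general definition only; the function names are hypothetical, and it does not establish whether the 'ad_10' entry in this repo's output corresponds to the paper's Table 2.

```python
import math


def transform(points, R, t):
    """Apply the rigid transform X -> R X + t to each 3D point."""
    return [
        tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
        for p in points
    ]


def euclidean(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def add_error(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding transformed model points."""
    gt, est = transform(points, R_gt, t_gt), transform(points, R_est, t_est)
    return sum(euclidean(a, b) for a, b in zip(gt, est)) / len(points)


def adds_error(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to its closest estimated point."""
    gt, est = transform(points, R_gt, t_gt), transform(points, R_est, t_est)
    return sum(min(euclidean(a, b) for b in est) for a in gt) / len(points)


def is_correct(error, diameter, threshold=0.1):
    """A pose is accepted when the error is below `threshold` times the diameter."""
    return error < threshold * diameter


# Hypothetical two-point model with a 5 cm translation error and a 1 m diameter.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
err = add_error(model, identity, (0.0, 0.0, 0.0), identity, (0.05, 0.0, 0.0))
print(is_correct(err, diameter=1.0))  # True
```

The brute-force nearest-neighbour search in `adds_error` is quadratic in the number of model points; evaluation toolkits typically use a KD-tree for this step.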

opened by Liuchongpei 7
Zero recall value while evaluating on LMO dataset
Hello @wangg12

I tried to evaluate the GDR-Net model on the LMO dataset using the pretrained models you shared on OneDrive. I used the following command to run the evaluation:

python core/gdrn_modeling/main_gdrn.py --config-file configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py \
    --num-gpus 1 \
    --eval-only \
    --opts MODEL.WEIGHTS=output/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e/gdrn_lmo_real_pbr.pth

However, it is showing zero recall values. Please see the screenshot below. Could you please help?

Thank you, Supriya
opened by supriya-gdptl 6

evaluation failed for lmoSO

Hi,

When I train GDR-Net on ape of LMO dataset by

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/lmoSO/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_80e_SO/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_80e_ape.py 1

I get the unexpected output at the end of log.txt:

core.gdrn_modeling.test_utils [email protected]: evaluation failed.
core.gdrn_modeling.test_utils [email protected]: =====================================================================
core.gdrn_modeling.test_utils [email protected]: output/gdrn/lmoSO/a6_cPnP_AugAAETrunc_lmo_real_pbr0.1_80e_SO/ape/inference_model_final/lmo_test/a6-cPnP-AugAAETrunc-BG0.5-lmo-real-pbr0.1-80e-ape-test-iter0_lmo-test-bb8/error:ad_ntop:1 does not exist.

Could you suggest how to fix it? Thanks!

opened by RuyiLian 6

One drive link seems not working

Hi, unfortunately the One-Drive link of pretrained model seems to provide the following error on different browsers, do you have any insight about this?

Thanks in advance,

Alberto

opened by AlbertoRemus 5
Problem generating xyz_crop

Hello Dr. Wang, I ran into the following problem while using tools/lm/lm_pbr_1_gen_xyz_crop.py to generate the xyz_crop files.

Traceback (most recent call last):
  File "tools/lm/lmo_pbr_1_gen_xyz_crop.py", line 228, in <module>
    xyz_gen.main()
  File "tools/lm/lmo_pbr_1_gen_xyz_crop.py", line 137, in main
    bgr_gl, depth_gl = self.get_renderer().render(render_obj_id, IM_W, IM_H, K, R, t, near, far)
  File "tools/lm/lmo_pbr_1_gen_xyz_crop.py", line 98, in get_renderer
    self.renderer = Renderer(
  File "/data/hsm/gdr/tools/lm/../../lib/meshrenderer/meshrenderer_phong.py", line 26, in __init__
    self._fbo = gu.Framebuffer(
  File "/data/hsm/gdr/tools/lm/../../lib/meshrenderer/gl_utils/fbo.py", line 22, in __init__
    glNamedFramebufferTexture(self.__id, k, attachement.id, 0)
  File "/data/hsm/env/gdrn2/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type

I think the error may be caused by a type mismatch in the k passed in at https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/main/lib/meshrenderer/gl_utils/fbo.py#L19. The image shows the argument types that glNamedFramebufferTexture expects, as displayed in the debugger. I could not find a similar problem among the existing issues. Does anyone have suggestions for solving this?
need-more-info

opened by hellohaley 5
CUDA out of memory

We train on PBR-rendered data using eight GPUs in parallel (NVIDIA 2080 Ti, 12 GB of graphics memory each). Training barely starts with a batch size of 8 (the original is 24), but when we resume training, CUDA runs out of memory.

We'd like to know the author's training configuration...

opened by GabrielleTse 5
Loss_region unable to converge

The other losses decline significantly, but the drop in Loss_region is very weak. My training uses the config configs/gdrn/lm/a6_cPnP_lm13.py. Choosing 4, 16, or 64 regions does not bring any improvement.

opened by lu-ming-lei 5
Generating test_bboxes/faster_R50_FPN_AugCosyAAE_HalfAnchor_lmo_pbr_lmo_fuse_real_all_8e_test_480x640.json file

Hello @wangg12,

Sorry to bother you again.

Could you please tell me how to generate faster_R50_FPN_AugCosyAAE_HalfAnchor_lmo_pbr_lmo_fuse_real_all_8e_test_480x640.json in lmo/test/test_bboxes folder?

Which code did you run to obtain this file?

Thank you, Supriya

opened by supriya-gdptl 1

Releases(v1.1)

v1.1(Nov 25, 2021)

better ddp support
Source code(tar.gz)
Source code(zip)
v1.0.1(Nov 25, 2021)

tag some updates from the first release
Source code(tar.gz)
Source code(zip)
v1.0(May 8, 2021)

Initial release
Source code(tar.gz)
Source code(zip)

Owner

GitHub Repository https://git.io/GDR-Net

