Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

Last update: Dec 21, 2022

DSA^2F: Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

This repo is the official implementation of "DSA^2F: Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion" by Peng Sun, Wenhu Zhang, Huanyu Wang, Songyuan Li, and Xi Li.

Prerequisites

Ubuntu 18
PyTorch 1.7.0
CUDA 10.1
cuDNN 7.5.1
Python 3.7
NumPy 1.17.3
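
To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch and not part of the repository: the helper name `collect_versions` is hypothetical, and it reports whatever is installed rather than enforcing the listed versions.

```python
import platform


def collect_versions():
    """Return the installed versions of the packages listed under Prerequisites.

    Hypothetical helper for illustration; a value of None means the package
    (or the CUDA/cuDNN build information) is not available.
    """
    versions = {"python": platform.python_version()}

    try:
        import numpy
        versions["numpy"] = numpy.__version__
    except ImportError:
        versions["numpy"] = None

    try:
        import torch
        versions["torch"] = torch.__version__
        # CUDA version PyTorch was built with (None for CPU-only builds).
        versions["cuda"] = torch.version.cuda
        # cuDNN version as an integer such as 7501, or None if unavailable.
        versions["cudnn"] = torch.backends.cudnn.version()
    except ImportError:
        versions["torch"] = versions["cuda"] = versions["cudnn"] = None

    return versions


if __name__ == "__main__":
    for name, found in collect_versions().items():
        print(f"{name}: {found}")
```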

Training

Please see launch_pretrain.sh and launch_train.sh for ImageNet pretraining and SOD training, respectively.

Testing

Please see launch_test.sh for testing on the SOD benchmarks.

Main Results

Dataset	E_r	S_λ^mean	F_β^mean	M
DUT-RGBD	0.950	0.921	0.926	0.030
NJUD	0.923	0.903	0.901	0.039
NLPR	0.950	0.918	0.897	0.024
SSD	0.904	0.876	0.852	0.045
STEREO	0.933	0.904	0.898	0.036
LFSD	0.923	0.882	0.882	0.054
RGBD135	0.962	0.920	0.896	0.021

Saliency maps and Evaluation

All of the saliency maps mentioned in the paper are available on Google Drive or BaiduYun (code: juc2).

You can use the toolbox provided by jiwei0921 for evaluation.
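
For a quick sanity check before running the full toolbox, two of the reported metrics can be approximated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the toolbox's protocol: the function names are hypothetical, maps are assumed to be arrays scaled to [0, 1], the F-measure uses the weight beta^2 = 0.3 that is common in saliency detection work, and it is computed at a single fixed threshold rather than averaged over thresholds. Numbers obtained this way may therefore differ from those in the tables.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth.

    Both inputs are assumed to be arrays of the same shape with values in [0, 1].
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    return float(np.mean(np.abs(pred - gt)))


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of a saliency map binarized at a single threshold.

    beta_sq = 0.3 is an assumed default, not taken from this repository.
    """
    pred_bin = np.asarray(pred, dtype=np.float64) >= threshold
    gt_bin = np.asarray(gt, dtype=np.float64) >= 0.5

    true_pos = np.logical_and(pred_bin, gt_bin).sum()
    # Guard against division by zero when a mask is empty.
    precision = true_pos / max(pred_bin.sum(), 1)
    recall = true_pos / max(gt_bin.sum(), 1)

    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta_sq) * precision * recall / denom)


if __name__ == "__main__":
    pred = np.array([[1.0, 0.0], [0.5, 0.0]])
    gt = np.array([[1.0, 0.0], [0.0, 0.0]])
    print("MAE:", mae(pred, gt))
    print("F-measure:", f_measure(pred, gt))
```

In practice each predicted map would be loaded as a grayscale image, divided by 255, and the per-image scores averaged over the dataset.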

We also provide the saliency maps for the STERE-1000 and SIP datasets on BaiduYun (code: qxfw) for easy comparison.

Dataset	E_r	S_λ^mean	F_β^mean	M
STERE-1000	0.928	0.897	0.895	0.038
SIP	0.908	0.861	0.868	0.057

Citation

@inproceedings{Sun2021DeepRS,
  title={Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion},
  author={Peng Sun and Wenhu Zhang and Huanyu Wang and Songyuan Li and Xi Li},
  booktitle={IEEE Conf. Comput. Vis. Pattern Recog.},
  year={2021}
}

License

The code is released under the MIT License (see the LICENSE file for details).
