Dense Unsupervised Learning for Video Segmentation (NeurIPS*2021)

Overview

Dense Unsupervised Learning for Video Segmentation

License Framework

This repository contains the official implementation of our paper:

Dense Unsupervised Learning for Video Segmentation
Nikita Araslanov, Simone Schaub-Mayer and Stefan Roth
To appear at NeurIPS*2021. [paper] [supp] [talk] [example results] [arXiv]

drawing

We efficiently learn spatio-temporal correspondences
without any supervision, and achieve state-of-the-art
accuracy of video object segmentation.

Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de


Installation

Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4, CUDA >=10.0. At least one Titan X GPUs (12GB) or equivalent is required. The code was primarily developed under PyTorch 1.8 on a single A100 GPU.

The following steps will set up a local copy of the repository.

  1. Create conda environment:
conda create --name dense-ulearn-vos
source activate dense-ulearn-vos
  1. Install PyTorch >=1.4 (see PyTorch instructions). For example on Linux, run:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
  1. Install the dependencies:
pip install -r requirements.txt
  1. Download the data:
Dataset Website Target directory with video sequences
YouTube-VOS Link data/ytvos/train/JPEGImages/
OxUvA Link data/OxUvA/images/dev/
TrackingNet Link data/tracking/train/jpegs/
Kinetics-400 Link data/kinetics400/video_jpeg/train/

The last column in this table specifies a path to subdirectories (relative to the project root) containing images of video frames. You can obviously use a different path structure. In this case, you will need to adjust the paths in data/filelists/ for every dataset accordingly.

  1. Download filelists:
cd data/filelists
bash download.sh

This will download lists of training and validation paths for all datasets.

Training

We following bash script will train a ResNet-18 model from scratch on one of the four supported datasets (see above):

bash ./launch/train.sh [ytvos|oxuva|track|kinetics]

We also provide our final models for download.

Dataset Mean J&F (DAVIS-2017) Link MD5
OxUvA 65.3 oxuva_e430_res4.pth (132M) af541[...]d09b3
YouTube-VOS 69.3 ytvos_e060_res4.pth (132M) c3ae3[...]55faf
TrackingNet 69.4 trackingnet_e088_res4.pth (88M) 3e7e9[...]95fa9
Kinetics-400 68.7 kinetics_e026_res4.pth (88M) 086db[...]a7d98

Inference and evaluation

Inference

To run the inference use launch/infer_vos.sh:

bash ./launch/infer_vos.sh [davis|ytvos]

The first argument selects the validation dataset to use (davis for DAVIS-2017; ytvos for YouTube-VOS). The bash variables declared in the script further help to set up the paths for reading the data and the pre-trained models as well as the output directory:

  • EXP, RUN_ID and SNAPSHOT determine the pre-trained model to load.
  • VER specifies a suffix for the output directory (in case you would like to experiment with different configurations for label propagation). Please, refer to launch/infer_vos.sh for their usage.

The inference script will create two directories with the result: [res3|res4|key]_vos and [res3|res4|key]_vis, where the prefix corresponds to the codename of the output CNN layer used in the evaluation (selected in infer_vos.sh using KEY variable). The vos-directory contains the segmentation result ready for evaluation; the vis-directory produces the results for visualisation purposes. You can optionally disable generating the visualisation by setting VERBOSE=False in infer_vos.py.

Evaluation: DAVIS-2017

Please use the official evaluation package. Install the repository, then simply run:

python evaluation_method.py --task semi-supervised --davis_path data/davis2017 --results_path <path-to-vos-directory>

Evaluation: YouTube-VOS 2018

Please use the official CodaLab evaluation server. To create the submission, rename the vos-directory to Annotations and compress it to Annotations.zip for uploading.

Acknowledgements

We thank PyTorch contributors and Allan Jabri for releasing their implementation of the label propagation.

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{Araslanov:2021:DUL,
  author    = {Araslanov, Nikita and Simone Schaub-Mayer and Roth, Stefan},
  title     = {Dense Unsupervised Learning for Video Segmentation},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  volume    = {34},
  year = {2021}
}
Owner
Visual Inference Lab @TU Darmstadt
Visual Inference Lab @TU Darmstadt
Deploy optimized transformer based models on Nvidia Triton server

Deploy optimized transformer based models on Nvidia Triton server

Lefebvre Sarrut Services 1.2k Jan 05, 2023
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

torch-imle Concise and self-contained PyTorch library implementing the I-MLE gradient estimator proposed in our NeurIPS 2021 paper Implicit MLE: Backp

UCL Natural Language Processing 249 Jan 03, 2023
Pytorch-3dunet - 3D U-Net model for volumetric semantic segmentation written in pytorch

pytorch-3dunet PyTorch implementation 3D U-Net and its variants: Standard 3D U-Net based on 3D U-Net: Learning Dense Volumetric Segmentation from Spar

Adrian Wolny 1.3k Dec 28, 2022
Woosung Choi 63 Nov 14, 2022
ANEA: Distant Supervision for Low-Resource Named Entity Recognition

ANEA: Distant Supervision for Low-Resource Named Entity Recognition ANEA is a tool to automatically annotate named entities in unlabeled text based on

Saarland University Spoken Language Systems Group 15 Mar 30, 2022
Do Neural Networks for Segmentation Understand Insideness?

This is part of the code to reproduce the results of the paper Do Neural Networks for Segmentation Understand Insideness? [pdf] by K. Villalobos (*),

biolins 0 Mar 20, 2021
Official implementation of the NeurIPS 2021 paper Online Learning Of Neural Computations From Sparse Temporal Feedback

Online Learning Of Neural Computations From Sparse Temporal Feedback This repository is the official implementation of the NeurIPS 2021 paper Online L

Lukas Braun 3 Dec 15, 2021
A tool for calculating distortion parameters in coordination complexes.

OctaDist Octahedral distortion calculator: A tool for calculating distortion parameters in coordination complexes. https://octadist.github.io/ Registe

OctaDist 12 Oct 04, 2022
[ICCV 2021] Deep Hough Voting for Robust Global Registration

Deep Hough Voting for Robust Global Registration, ICCV, 2021 Project Page | Paper | Video Deep Hough Voting for Robust Global Registration Junha Lee1,

57 Nov 28, 2022
Two-stage CenterNet

Probabilistic two-stage detection Two-stage object detectors that use class-agnostic one-stage detectors as the proposal network. Probabilistic two-st

Xingyi Zhou 1.1k Jan 03, 2023
Code implementation of Data Efficient Stagewise Knowledge Distillation paper.

Data Efficient Stagewise Knowledge Distillation Table of Contents Data Efficient Stagewise Knowledge Distillation Table of Contents Requirements Image

IvLabs 112 Dec 02, 2022
SegNet-like Autoencoders in TensorFlow

SegNet SegNet is a TensorFlow implementation of the segmentation network proposed by Kendall et al., with cool features like strided deconvolution, a

Andrea Azzini 66 Nov 05, 2021
A tiny, pedagogical neural network library with a pytorch-like API.

candl A tiny, pedagogical implementation of a neural network library with a pytorch-like API. The primary use of this library is for education. Use th

Sri Pranav 3 May 23, 2022
Ascend your Jupyter Notebook usage

Jupyter Ascending Sync Jupyter Notebooks from any editor About Jupyter Ascending lets you edit Jupyter notebooks from your favorite editor, then insta

Untitled AI 254 Jan 08, 2023
Boostcamp AI Tech 3rd / Basic Paper reading w.r.t Embedding

Boostcamp AI Tech 3rd : Basic Paper Reading w.r.t Embedding TL;DR 1992년부터 2018년도까지 이루어진 word/sentence embedding의 중요한 줄기를 이루는 기초 논문 스터디를 진행하고자 합니다. 논

Soyeon Kim 14 Nov 14, 2022
TLXZoo - Pre-trained models based on TensorLayerX

Pre-trained models based on TensorLayerX. TensorLayerX is a multi-backend AI fra

TensorLayer Community 13 Dec 07, 2022
Painting app using Python machine learning and vision technology.

AI Painting App We are making an app that will track our hand and helps us to draw from that. We will be using the advance knowledge of Machine Learni

Badsha Laskar 3 Oct 03, 2022
Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

Training Script for Reuse-VOS This code implementation of CVPR 2021 paper : Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Vi

HYOJINPARK 22 Jan 01, 2023
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
Self-Supervised Learning for Domain Adaptation on Point-Clouds

Self-Supervised Learning for Domain Adaptation on Point-Clouds Introduction Self-supervised learning (SSL) allows to learn useful representations from

Idan Achituve 66 Dec 20, 2022