A fast model to compute optical flow between two input images.

Related tags

Deep LearningDCVNet
Overview

DCVNet: Dilated Cost Volumes for Fast Optical Flow

This repository contains our implementation of the paper:

@InProceedings{jiang2021dcvnet,
  title={DCVNet: Dilated Cost Volumes for Fast Optical Flow},
  author={Jiang, Huaizu and Learned-Miller, Erik},
  booktitle={arXiv},
  year={2021}
}

Need a fast optical flow model? Try DCVNet

  • Fast. On a mid-end GTX 1080ti GPU, DCVNet runs in real time at 71 fps (frames-per-second) to process images with sizes of 1024 × 436.
  • Compact and accurate. DCVNet has 4.94M parameters and consumes 1.68GB GPU memory during inference. It achieves comparable accuracy to state-of-the-art approaches on the MPI Sintel benchmark.

In the figure above, for each model, the circle radius indicates the number of parameters (larger radius means more parameters). The center of a circle corresponds to a model’s EPE (end-point-error).

Requirements

This code has been tested with Python 3.7, PyTorch 1.6.0, and CUDA 9.2. We suggest to use a conda environment.

conda create -n dcvnet
conda activate dcvnet
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboardX scipy opencv -c pytorch
pip install yacs

We use an open-source implementation https://github.com/ClementPinard/Pytorch-Correlation-extension to compute dilated cost volumes. Follow the instructions there to install this module.

Demos

Pretrained models can be downloaded by running

./scripts/download_models.sh

or downloaded from Google drive.

You can demo a pre-trained model on a sequence of frames

python demo.py --weights-path pretrained_models/sceneflow_dcvnet.pth --path demo-frames

Required data

The following datasets are required to train and evaluate DCVNet.

We borrow the data loaders used in RAFT. By default, dcvnet/data/raft/datasets.py will search for the datasets in these locations. You can create symbolic links to wherever the datasets were downloaded in the datasets folder

|-- datasets
    |-- Driving
        |-- frames_cleanpass
        |-- optical_flow
    |-- FlyingThings3D_subset
        |-- train
            |-- flow
            |-- image_clean
        |-- val
            |-- flow
            |-- image_clean
    |-- Monkaa
        |-- frames_cleanpass
        |-- optical_flow
    |-- MPI_Sintel
        |-- test
        |-- training
    |-- KITTI2012
        |-- testing
        |-- training
    |-- KITTI2015
        |-- testing
        |-- training
    |-- HD1K
        |-- hd1k_flow_gt
        |-- hd1k_input

Evaluation

You can evaluate a pre-trained model using tools/evaluate_optical_flow.py

python evaluate_optical_flow.py --weights_path models/dcvnet-sceneflow.pth --dataset sintel

You can optionally add the --amp switch to do inference in mixed precision to reduce GPU memory usage.

Training

We used 8 GTX 1080ti GPUs for training. Training logs will be written to the output folder, which can be visualized using tensorboard.

# train on the synthetic scene flow dataset
python tools/train_optical_flow.py --config-file configs/sceneflow_dcvnet.yaml 

# fine-tune it on the MPI-Sintel dataset
# 4 GPUs are sufficient, but here we use 8 GPUs for fast training
python tools/train_optical_flow.py --config-file configs/sintel_dcvnet.yaml --pretrain-weights output/SceneFlow/sceneflow_dcvnet/default/train_epoch_50.pth

# fine-tune it on the KITTI 2012 and 2015 dataset
# we only use 6 GPUs (3 GPUs are sufficient) since the batch size is 6
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python tools/train_optical_flow.py --config-file configs/kitti12+15_dcvnet.yaml --pretrain-weights output/Sintel+SceneFlow/sintel_dcvnet/default/train_epoch_5.pth

Note on the inference speed

In the main branch, the computation of the dilated cost volumes can be further optimized without using the for loop. Checkout the efficient branch for details. If you are interested in testing the inference speed, we suggest to switch to the efficient branch.

git checkout efficient
CUDA_VISIBLE_DEVICES=0 python tools/evaluate_optical_flow.py --dry-run

We haven't fixed this problem because our pre-trained models are based on the implementation in the main branch, which are not compatible with the resizing in the efficient branch. We need to re-train all our models. It will be fixed soon.

To-do

  • Fix the problem of efficient cost volume computation.
  • Train the model on the AutoFlow dataset.

Acknowledgment

Our implementation is built on top of RAFT, Pytorch-Correlation-extension, yacs, Detectron2, and semseg. We thank the authors for releasing and maintaining the code.

Owner
Huaizu Jiang
Assistant Professor at Northeastern University.
Huaizu Jiang
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations This repo contains official code for the NeurIPS 2021 paper Imi

Jiayao Zhang 2 Oct 18, 2021
Based on Stockfish neural network(similar to LcZero)

MarcoEngine Marco Engine - interesnaya neyronnaya shakhmatnaya set', kotoraya ispol'zuyet metod samoobucheniya(dostizheniye khoroshoy igy putem proboy

Marcus Kemaul 4 Mar 12, 2022
A Python script that creates subtitles of a given length from text paragraphs that can be easily imported into any Video Editing software such as FinalCut Pro for further adjustments.

Text to Subtitles - Python This python file creates subtitles of a given length from text paragraphs that can be easily imported into any Video Editin

Dmytro North 9 Dec 24, 2022
FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware.

FIRM-AFL FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware. FIRM-AFL addresses two fundamental problems in IoT fuzzing. First, it

356 Dec 23, 2022
Self-supervised learning on Graph Representation Learning (node-level task)

graph_SSL Self-supervised learning on Graph Representation Learning (node-level task) How to run the code To run GRACE, sh run_GRACE.sh To run GCA, sh

Namkyeong Lee 3 Dec 31, 2021
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021
Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Introduction Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models". In this work, we demonstrate that existi

Wei-Cheng Tseng 7 Nov 01, 2022
[UNMAINTAINED] Automated machine learning for analytics & production

auto_ml Automated machine learning for production and analytics Installation pip install auto_ml Getting started from auto_ml import Predictor from au

Preston Parry 1.6k Jan 02, 2023
PyQt6 configuration in yaml format providing the most simple script.

PyamlQt(ぴゃむるきゅーと) PyQt6 configuration in yaml format providing the most simple script. Requirements yaml PyQt6, ( PyQt5 ) Installation pip install Pya

Ar-Ray 7 Aug 15, 2022
DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations This repository contains the data, scripts and baseline co

Alexa 51 Dec 17, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 06, 2022
Network Enhancement implementation in pytorch

network_enahncement_pytorch Network Enhancement implementation in pytorch Research paper Network Enhancement: a general method to denoise weighted bio

Yen 1 Nov 12, 2021
[TOG 2021] PyTorch implementation for the paper: SofGAN: A Portrait Image Generator with Dynamic Styling.

This repository contains the official PyTorch implementation for the paper: SofGAN: A Portrait Image Generator with Dynamic Styling. We propose a SofGAN image generator to decouple the latent space o

Anpei Chen 694 Dec 23, 2022
GUI for a Vocal Remover that uses Deep Neural Networks.

GUI for a Vocal Remover that uses Deep Neural Networks.

4.4k Jan 07, 2023
PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR)

This is a PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR), using subpixel convolution to optimize the inference speed of TecoGAN VSR model. Please refer to the offi

789 Jan 04, 2023
Benchmark for the generalization of 3D machine learning models across different remeshing/samplings of a surface.

Discretization Robust Correspondence Benchmark One challenge of machine learning on 3D surfaces is that there are many different representations/sampl

Nicholas Sharp 10 Sep 30, 2022
DeepGNN is a framework for training machine learning models on large scale graph data.

DeepGNN Overview DeepGNN is a framework for training machine learning models on large scale graph data. DeepGNN contains all the necessary features in

Microsoft 45 Jan 01, 2023
🥈78th place in Riiid Answer Correctness Prediction competition

Riiid Answer Correctness Prediction Introduction This repository is the code that placed 78th in Riiid Answer Correctness Prediction competition. Requ

Jungwoo Park 10 Jul 14, 2022
Single Red Blood Cell Hydrodynamic Traps Via the Generative Design

Rbc-traps-generative-design - The generative design for single red clood cell hydrodynamic traps using GEFEST framework

Natural Systems Simulation Lab 4 Jun 16, 2022
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".

Code for "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval" (ACL 2021, Long) This is the repository for baseline m

Akari Asai 25 Oct 30, 2022