CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K

Overview

CaFM-pytorch ICCV ACCEPT

Introduction of dataset VSD4K

Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city. Each category is consisted of various video length, including: 15s, 30s, 45s, etc. For a specific category and its specific video length, there are 3 scaling factors: x2, x3 and x4. In each file, there are HR images and its corresponding LR images. 1-n are training images , n - (n + n/10) are test images. (we select test image 1 out of 10). The dataset can be obtained from [https://pan.baidu.com/s/14pcsC7taB4VAa3jvyw1kog] (passward:u1qq) and google drive [https://drive.google.com/drive/folders/17fyX-bFc0IUp6LTIfTYU8R5_Ot79WKXC?usp=sharing].

e.g.:game 15s
dataroot_gt: VSD4K/game/game_15s_1/DIV2K_train_HR/00001.png
dataroot_lqx2: VSD4K/game/game_15s_1/DIV2K_train_LR_bicubic/X2/00001_x2.png
dataroot_lqx3: VSD4K/game/game_15s_1/DIV2K_train_LR_bicubic/X3/00001_x3.png
dataroot_lqx4: VSD4K/game/game_15s_1/DIV2K_train_LR_bicubic/X4/00001_x4.png

Proposed method

Introduction

Our paper "Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation" has been submitted to 2021 ICCV. we aim to use super resolution network to improve the quality of video delivery recently. The whole precedure is shown below. We devide the whole video into several chunks and apply a joint training framework with Content aware Feature Module(CaFM) to train each chunk simultaneously. With our method, each video chunk only requires less than 1% of original parameters to be streamed, achieving even better SR performance. We conduct extensive experiments across various SR backbones(espcn,srcnn,vdsr,edsr16,edsr32,rcan), video time length(15s-10min), and scaling factors(x2-x4) to demonstrate the advantages of our method. All pretrain models(15s, 30s, 45s) of game category can be found in this link [https://pan.baidu.com/s/1P18FULL7CIK1FAa2xW56AA] (passward:bjv1) and google drive link [https://drive.google.com/drive/folders/1_N64A75iwgbweDBk7dUUDX0SJffnK5-l?usp=sharing].

Figure 1. The whole procedure of adopting content-aware DNNs for video delivery. A video is first divided into several chunks and the server trains one model for each chunk. Then the server delivers LR video chunks and models to client. The client runs the inference to super-resolve the LR chunks and obtain the SR video.

Quantitative results

We show our quantitative results in the table below. For simplicity, we only demonstrate the results on game and vlog datasets. We compare our method M{1-n} with M0 and S{1-n}. The experiments are conducted on EDSR.

  • M0: a EDSR without CaFM module, train on whole video.
  • Si: a EDSR without a CaFM module, train on one specific chunk i.
  • M{1-n}ours: a EDSR with n CaFM modules, train on n chunks simultaneously.
Dataset Game15s Game30s Game45s
Scale x2 x3 x4 x2 x3 x4 x2 x3 x4
M0 42.24 35.88 33.44 41.84 35.54 33.05 42.11 35.75 33.33
S{1-n} 42.82 36.42 34.00 43.07 36.73 34.17 43.22 36.72 34.32
M{1-n} Ours 43.13 37.04 34.47 43.37 37.12 34.58 43.46 37.31 34.79
Dataset Vlog15s Vlog30s Vlog45s
Scale x2 x3 x4 x2 x3 x4 x2 x3 x4
M0 48.87 44.51 42.58 47.79 43.38 41.24 47.98 43.58 41.53
S{1-n} 49.10 44.80 42.83 48.20 43.68 41.55 48.48 44.12 42.12
M{1-n} Ours 49.30 45.03 43.11 48.55 44.15 42.16 48.61 44.24 42.39

Quatitative results

We show the quatitative results in the figure below.

  • bicubic: SR images are obtained by bicubic
  • H.264/H.265: use the default setting of FFmpeg to generate the H.264 and H.265 videos

Dependencies

  • Python >= 3.6
  • Torch >= 1.0.0
  • opencv-python
  • numpy
  • skimage
  • imageio
  • matplotlib

Quickstart

M0 demotes the model without Cafm module which is trained on the whole dataset. S{1-n} denotes n models that trained on n chunks of video. M{1-n} demotes one model along with n Cafm modules that trained on the whole dataset. M{1-n} is our proposed method.

How to set data_range

n is the total frames in a video. We select one test image out of 10 training images. Thus, in VSD4K, 1-n is its training dataset, n-(n+/10) is the test dataset. Generally, we set 5s as the length of one chunk. Hence, 15s consists 3 chunks, 30s consists 6 chunks, etc.

Video length(train images + test images) chunks M0/M{1-n} S1 S2 S3 S4 S5 S6 S7 S8 S9
15s(450+45) 3 1-450/451-495 1-150/451-465 151-300/466-480 301-450/481-495 - - - - - -
30s(900+95) 6 1-900/901-990 1-150/901-915 151-300/916-930 301-450/931-945 451-600/946-960 601-750/961-975 751-900/976-990 - - -
45s(1350+135) 9 1-1350/1351-1485 1-150/1351-1365 151-300/1366-1380 301-450/1381-1395 451-600/1396-1410 601-750/1411-1425 751-900/1426-1440 901-1050/1441-1455 1051-1200/1456-1470 1201-1350/1471-1485

Train

For simplicity, we only demonstrate how to train 'game_15s' by our method.

  • For M{1-n} model:
CUDA_VISIBLE_DEVICES=3 python main.py --model {EDSR/ESPCN/VDSRR/SRCNN/RCAN} --scale {scale factor} --patch_size {patch size} --save {name of the trained model} --reset --data_train DIV2K --data_test DIV2K --data_range {train_range}/{test_range} --cafm --dir_data {path of data} --use_cafm --batch_size {batch size} --epoch {epoch} --decay {decay} --segnum {numbers of chunk} --length
e.g. 
CUDA_VISIBLE_DEVICES=3 python main.py --model EDSR --scale 2 --patch_size 48 --save trainm1_n --reset --data_train DIV2K --data_test DIV2K --data_range 1-450/451-495 --cafm --dir_data /home/datasets/VSD4K/game/game_15s_1 --use_cafm --batch_size 64 --epoch 500 --decay 300 --segnum 3 --is15s

You can apply our method on your own images. Place your HR images under YOURS/DIV2K_train_HR/, with the name start from 00001.png. Place your corresponding LR images under YOURS/DIV2K_train_LR_bicubic/X2, with the name start from 00001_x2.png.

e.g.:
dataroot_gt: YOURS/DIV2K_train_HR/00001.png
dataroot_lqx2: YOURS/DIV2K_train_LR_bicubic/X2/00001_x2.png
dataroot_lqx3: YOURS/DIV2K_train_LR_bicubic/X3/00001_x3.png
dataroot_lqx4: YOURS/DIV2K_train_LR_bicubic/X4/00001_x4.png
  • The running command is like:
CUDA_VISIBLE_DEVICES=3 python main.py --model {EDSR/ESPCN/VDSRR/SRCNN/RCAN} --scale {scale factor} --patch_size {patch size} --save {name of the trained model} --reset --data_train DIV2K --data_test DIV2K --data_range {train_range}/{test_range} --cafm --dir_data {path of data} --use_cafm --batch_size {batch size} --epoch {epoch} --decay {decay} --segnum {numbers of chunk} --length
  • For example:
e.g. 
CUDA_VISIBLE_DEVICES=3 python main.py --model EDSR --scale 2 --patch_size 48 --save trainm1_n --reset --data_train DIV2K --data_test DIV2K --data_range 1-450/451-495 --cafm --dir_data /home/datasets/VSD4K/game/game_15s_1 --use_cafm --batch_size 64 --epoch 500 --decay 300 --segnum 3 --is15s

Test

For simplicity, we only demonstrate how to run 'game' category of 15s. All pretrain models(15s, 30s, 45s) of game category can be found in this link [https://pan.baidu.com/s/1P18FULL7CIK1FAa2xW56AA] (passward:bjv1) and google drive link [https://drive.google.com/drive/folders/1_N64A75iwgbweDBk7dUUDX0SJffnK5-l?usp=sharing].

  • For M{1-n} model:
CUDA_VISIBLE_DEVICES=3 python main.py --data_test DIV2K --scale {scale factor} --model {EDSR/ESPCN/VDSRR/SRCNN/RCAN} --test_only --pre_train {path to pretrained model} --data_range {train_range} --{is15s/is30s/is45s} --cafm  --dir_data {path of data} --use_cafm --segnum 3
e.g.:
CUDA_VISIBLE_DEVICES=3 python main.py --data_test DIV2K --scale 4 --model EDSR --test_only --pre_train /home/CaFM-pytorch/experiment/edsr_x2_p48_game_15s_1_seg1-3_batch64_k1_g64/model/model_best.pt --data_range 1-150 --is15s --cafm  --dir_data /home/datasets/VSD4K/game/game_15s_1 --use_cafm --segnum 3

Additional

We also demonstrate our method in vimeo dataset and HEVC test sequence. These datasets and all trained models will be released as soon as possible. By the way, we add SEFCNN.py into our backbone list which is suggested by reviewer.The code will be updated regularly.

Acknowledgment

AdaFM proposed a closely related method for continual modulation of restoration levels. While they aimed to handle arbitrary restoration levels between a start and an end level, our goal is to compress the models of different chunks for video delivery. The reader is encouraged to review their work for more details. Please also consider to cite AdaFM if you use the code. [https://github.com/hejingwenhejingwen/AdaFM]

PyMatting: A Python Library for Alpha Matting

Given an input image and a hand-drawn trimap (top row), alpha matting estimates the alpha channel of a foreground object which can then be composed onto a different background (bottom row).

PyMatting 1.4k Dec 30, 2022
Plaything for Autistic Children (demo for PaddlePaddle/Wechaty/Mixlab project)

星星的孩子 - 一款为孤独症孩子设计的聊天机器人游戏 孤独症儿童是目前常常被忽视的一类群体。他们有着类似性格内向的特征,实际却受着广泛性发育障碍的折磨。 项目背景 这类儿童在与人交往时存在着沟通障碍,其特点表现在: 社交交流差,互动障碍明显 认知能力有限,被动认知 兴趣狭窄,重复刻板,缺乏变化和想象

Tianyi Pan 35 Nov 24, 2022
Keras implementation of Deeplab v3+ with pretrained weights

Keras implementation of Deeplabv3+ This repo is not longer maintained. I won't respond to issues but will merge PR DeepLab is a state-of-art deep lear

1.3k Dec 07, 2022
Multimodal commodity image retrieval 多模态商品图像检索

Multimodal commodity image retrieval 多模态商品图像检索 Not finished yet... introduce explain:The specific description of the project and the product image dat

hongjie 8 Nov 25, 2022
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
This repository gives an example on how to preprocess the data of the HECKTOR challenge

HECKTOR 2021 challenge This repository gives an example on how to preprocess the data of the HECKTOR challenge. Any other preprocessing is welcomed an

56 Dec 01, 2022
Python scripts for performing 3D human pose estimation using the Mobile Human Pose model in ONNX.

Python scripts for performing 3D human pose estimation using the Mobile Human Pose model in ONNX.

Ibai Gorordo 99 Dec 31, 2022
School of Artificial Intelligence at the Nanjing University (NJU)School of Artificial Intelligence at the Nanjing University (NJU)

F-Principle This is an exercise problem of the digital signal processing (DSP) course at School of Artificial Intelligence at the Nanjing University (

Thyrix 5 Nov 23, 2022
Multi-layer convolutional LSTM with Pytorch

Convolution_LSTM_pytorch Thanks for your attention. I haven't got time to maintain this repo for a long time. I recommend this repo which provides an

Zijie Zhuang 733 Dec 30, 2022
Official implementation for the paper: Multi-label Classification with Partial Annotations using Class-aware Selective Loss

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

99 Dec 27, 2022
This repository is the official implementation of the Hybrid Self-Attention NEAT algorithm.

This repository is the official implementation of the Hybrid Self-Attention NEAT algorithm. It contains the code to reproduce the results presented in the original paper: https://arxiv.org/abs/2112.0

Saman Khamesian 6 Dec 13, 2022
Hcpy - Interface with Home Connect appliances in Python

Interface with Home Connect appliances in Python This is a very, very beta inter

Trammell Hudson 116 Dec 27, 2022
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

DSEE Codes for [Preprint] DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Ch

VITA 4 Dec 27, 2021
Generative Adversarial Text to Image Synthesis

Text To Image Synthesis This is a tensorflow implementation of synthesizing images. The images are synthesized using the GAN-CLS Algorithm from the pa

Hao 575 Jan 08, 2023
GANimation: Anatomically-aware Facial Animation from a Single Image (ECCV'18 Oral) [PyTorch]

GANimation: Anatomically-aware Facial Animation from a Single Image [Project] [Paper] Official implementation of GANimation. In this work we introduce

Albert Pumarola 1.8k Dec 28, 2022
Dictionary Learning with Uniform Sparse Representations for Anomaly Detection

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection Implementation of the Uniform DL Representation for AD algorithm describ

Paul Irofti 1 Nov 23, 2022
Code for visualizing the loss landscape of neural nets

Visualizing the Loss Landscape of Neural Nets This repository contains the PyTorch code for the paper Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer

Tom Goldstein 2.2k Jan 09, 2023
Convnext-tf - Unofficial tensorflow keras implementation of ConvNeXt

ConvNeXt Tensorflow This is unofficial tensorflow keras implementation of ConvNe

29 Oct 06, 2022
Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite.

TFLite-MobileStereoNet Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite. Stereo depth estimati

Ibai Gorordo 4 Feb 14, 2022
[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Single Image Depth Prediction with Wavelet Decomposition Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambeto

Niantic Labs 205 Jan 02, 2023