Generic Event Boundary Detection: A Benchmark for Event Segmentation

Related tags

Deep LearningGEBD
Overview

Generic Event Boundary Detection: A Benchmark for Event Segmentation

We release our data annotation & baseline codes for detecting generic event boundaries in video.

Links: [Arxiv] [LOVEU Challenge]

Contributors: Mike Zheng Shou, Stan Lei, Deepti Ghadiyaram, Weiyao Wang, Matt Feiszli.

Overview

This repo has the following structure:

./
│   LICENSE
│   README.md
│   INSTRUCTIONS.md
│
├───BdyDet
│   ├───k400
│   │       detect_event_boundary.py
│   │       run_multiprocess_detect_event_boundary.py
│   │
│   └───TAPOS
│           detect_event_boundary.py
│           run_multiprocess_detect_event_boundary.py
│
├───Challenge_eval_Code
│       eval.py
│       README.md
│
├───data
│   ├───export
│   │       prepare_hmdb_release.ipynb
│   │       prepare_k400_release.ipynb
│   │
│   ├───exp_k400
│   │   │   classInd.txt
│   │   │   val_set.csv
│   │   │
│   │   ├───detect_seg
│   │   └───pred_err
│   └───exp_TAPOS
│       │   train_set.csv
│       │   val_set.csv
│       │
│       ├───detect_seg
│       └───pred_err
├───eval
│       eval_GEBD_k400.ipynb
│       eval_GEBD_TAPOS.ipynb
│
├───PA_DPC
│   │   LICENSE
│   │   README.md
│   │
│   ├───asset
│   │       arch.png
│   │
│   ├───backbone
│   │       convrnn.py
│   │       resnet_2d.py
│   │       resnet_2d3d.py
│   │       select_backbone.py
│   │
│   ├───dpc
│   │       dataset_3d_infer_pred_error.py
│   │       main_infer_pred_error.py
│   │       model_3d.py
│   │
│   └───utils
│           augmentation.py
│           utils.py
│
└───PC
    │   PC_test.py
    │   PC_train.py
    │   README.md
    │
    ├───DataAssets
    ├───datasets
    │       augmentation.py
    │       MultiFDataset.py
    │
    ├───modeling
    │       resnetGEBD.py
    │
    ├───run
    │       pc_k400_dist.sh
    │       pc_tapos_dist.sh
    │
    └───utils
            augmentation.py
            checkpoint_saver.py
            augmentation.py
            checkpoint_saver.py
            clip_grad.py
            cuda.py
            getter.py
            helper.py
            log.py
            metric.py
            model_ema.py
            optim_factory.py
            sampler.py
            scheduler.py

Note that we release codes on Github. Annotations are available on GoogleDrive. Run the code by yourself to generate the output files. Refer to INSTRUCTIONS for preparing data and generating submission files.

  • data/:

    • export/ folder stores temporal boundary annotations of our Kinetics-GEBD and HMDB-GEBD datasets; download our raw annotations and put them under this folder.
    • exp_k400/ and exp_TAPOS/ store intermediate experimental data and final results.
  • Challenge_eval_Code/: codes for evaluation in LOVEU Challenge Track 1.

  • BdyDet/: codes for detecting boundary positions based on predictability sequence.

  • eval/: codes for evaluating the performance of boundary detection.

  • PA_DPC/: codes for computing the predictability sequence using various methods.

  • PC/: codes for supervised baseline on GEBD.

data/

  • In data/export/:

    • *_raw_annotation.pkl stores the raw annotations; download raw annotations here.
    • we further filter out videos that receives <3 annotations and conduct pre-processing e.g. merge very close boundaries - we use notebook prepare_*_release.ipynb and the output is stored in *_mr345_min_change_duration0.3.pkl.
  • Some fields in *_raw_annotation.pkl:

    • fps: frames per second.
    • video_duration:video duration in second.
    • f1_consis: a list of consistency scores, each score corresponds to a specific annotator’s score as compared to other annotators.
    • substages_timestamps: a list of annotations from each annotator; each annotator’s annotation is again a list of boundaries. time in second; for 'label', it is of format A: B.
      • If A is “ShotChangeGradualRange”, B could be “Change due to Cut”, “Change from/to slow motion”, “Change from/to fast motion”, or “N/A”.
      • If A is “ShotChangeImmediateTimestamp”, B could be “Change due to Pan”, “Change due to Zoom”, “Change due to Fade/Dissolve/Gradual”, “Multiple”, or “N/A”.
      • If A is “EventChange (Timestamp/Range)”, B could be “Change of Color”, “Change of Actor/Subject”, “Change of Object Being Interacted”, “Change of Action”, “Multiple”, or “N/A”.
  • Some fields in *_mr345_min_change_duration0.3.pkl:

    • substages_myframeidx: the term of substage is following the convention in TAPOS to refer to boundary position -- here we store each boundary’s frame index which starts at 0.
  • In data/exp_k400/ and data/exp_TAPOS/:

    • pred_err/ stores output of PA_DPC/ i.e. the predictability sequence;
    • detect_seg/ stores output of BdyDet/ i.e. detected boundary positions.

Challenge_eval_Code/

  • We use eval.py for evaluation in our competition.

  • Although one can use frame_index and number of total frames to measure the Rel.Dis, as we implemented in eval/, you should represent the detected boundaries with timestamps (in seconds). For example:

    {
    ‘6Tz5xfnFl4c’: [5.9, 9.4], # boundaries detected at 5.9s, 9.4s of this video
    ‘zJki61RMxcg’: [0.6, 1.5, 2.7] # boundaries detected at 0.6s, 1.5s, 2.7s of this video
    ...
    }
  • Refer to this file to generate GT files from raw annotations.

BdyDet/

Change to directory ./BdyDet/k400 or ./BdyDet/TAPOS and run the following command, which will launch multiple processes of detect_event_boundary.py.

(Note to set the number of processes according to your server in detect_event_boundary.py.)

python run_multiprocess_detect_event_boundary.py

PA_DPC/

Our implementation is based on the [DPC] framework. Please refer to their README or website to learn installation and usage. In the below, we only explain how to run our scripts and what are our modifications.

  • Modifications at a glance

    • main_infer_pred_error.py runs in only inference mode to assess predictability over time.
    • dataset_3d_infer_pred_error.py contains loaders for two datasets i.e. videos from Kinetics and truncated instances from TAPOS.
    • model_3d.py adds several classes for feature extraction purposes, e.g. ResNet feature before the pooling layer, to enable computing feature difference directly based on ImageNet pretrained model, in contrast to the predictive model in DPC.
  • How to run?

    change to directory ./PA_DPC/dpc/

    • Kinetics-GEBD:
    python main_infer_pred_error.py --gpu 0 --model resnet50-beforepool --num_seq 2 --pred_step 1 --seq_len 5 --dataset k400 --batch_size 160 --img_dim 224 --mode val --ds 3 --pred_task featdiff_rgb
    • TAPOS:
    python main_infer_pred_error.py --gpu 0 --model resnet50-beforepool --num_seq 2 --pred_step 1 --seq_len 5 --dataset tapos_instances --batch_size 160 --img_dim 224 --mode val --ds 3 --pred_task featdiff_rgb
    • Notes:

      • ds=3 means we downsample the video frames by 3 to reduce computation.

      • seq_len=5 means 5 frames for each step (concept used in DPC).

      • num_seq=2and pred_step=1 so that at a certain time position t, we use 1 step before and 1 step after to assess the predictability at time t; thus the predictability at time t is the feature difference between the average feature of 5 sampled frames before t and the average feature of 5 sampled frames after t.

      • we slide such model over time to obtain predictability at different temporal positions. More details can be found in dataset_3d_infer_pred_error.py. Note that window_lists.pkl stores the index of frames to be sampled for every window - since it takes a long time to compute (should be able to optimize in the future), we store its value and just load this pre-computed value in the future runs on the same dataset during experimental explorations.

PC/

PC is a supervised baseline for GEBD task. In the PC framework, for each frame f in a video, we take T frames preceding f and T frames succeeding f as inputs, and then build a binary classifier to predict if f is boundary or background.

  • Get Started

    • Check PC/datasets/MultiFDataset.py to generate GT files for training. Note that you should prepare k400_mr345_*SPLIT*_min_change_duration0.3.pkl for Kinetics-GEBD and TAPOS_*SPLIT*_anno.pkl (this should be organized as k400_mr345_*SPLIT*_min_change_duration0.3.pkl for convenience) for TAPOS before running our code.
    • You should accordingly change PATH_TO in our codes to your data/frames path as needed.
  • How to run?

    Change directory to ./PC.

    • Train on Kinetics-GEBD:

      CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 PC_train.py \
      --dataset kinetics_multiframes \
      --train-split train \
      --val-split val \
      --num-classes 2 \
      --batch-size 32 \
      --n-sample-classes 2 \
      --n-samples 16 \
      --lr 0.01 \
      --warmup-epochs 0 \
      --epochs 30 \
      --decay-epochs 10 \
      --model multiframes_resnet \
      --pin-memory \
      --balance-batch \
      --sync-bn \
      --amp \
      --native-amp \
      --eval-metric loss \
      --log-interval 50
    • Train on TAPOS:

      CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 PC_train.py \
      --dataset tapos_multiframes \
      --train-split train \
      --val-split val \
      --num-classes 2 \
      --batch-size 32 \
      --n-sample-classes 2 \
      --n-samples 16 \
      --lr 0.01 \
      --warmup-epochs 0 \
      --epochs 30 \
      --decay-epochs 10 \
      --model multiframes_resnet \
      --pin-memory \
      --balance-batch \
      --sync-bn \
      --amp \
      --native-amp \
      --eval-metric loss \
      --log-interval 50 
    • Generate scores sequence on Kinetics-GEBD Validation Set:

      CUDA_VISIBLE_DEVICES=0 python PC_test.py \ 
      --dataset kinetics_multiframes \
      --val-split val \
      --resume path_to/checkpoint
    • Generate scores sequence on TAPOS:

      CUDA_VISIBLE_DEVICES=0 python PC_test.py \ 
      --dataset tapos_multiframes \
      --val-split val \
      --resume path_to/checkpoint
  • Models

Misc

  • Download datasets from Kinetics-400 and TAPOS. (Note that some of the videos can not be downloaded from YouTube for some reason, you can go ahead with those available.)

  • Note that for TAPOS, you need to cut out each action instance by yourself first and then can use our following codes to process each instance's video separately.

  • Extract frames of videos in the dataset.

  • To reproduce PA_DPC, generate your own data/exp_*/val_set.csv, in which path to the folder of video frames and the number of frames in that specific video should be contained.

Q&A

For any questions, welcome to create an issue or email Mike ([email protected]) and Stan ([email protected]). Thank you for helping us improve our data & codes.

MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモ

Tokyo2020-Pictogram-using-MediaPipe MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモです。 Tokyo2020Pictgram02.mp4 Requirement mediapipe 0.8.6 or later O

KazuhitoTakahashi 295 Dec 26, 2022
Background Matting: The World is Your Green Screen

Background Matting: The World is Your Green Screen By Soumyadip Sengupta, Vivek Jayaram, Brian Curless, Steve Seitz, and Ira Kemelmacher-Shlizerman Th

Soumyadip Sengupta 4.6k Jan 04, 2023
Code repository for Semantic Terrain Classification for Off-Road Autonomous Driving

BEVNet Datasets Datasets should be put inside data/. For example, data/semantic_kitti_4class_100x100. Training BEVNet-S Example: cd experiments bash t

(Brian) JoonHo Lee 24 Dec 12, 2022
Implementation of "Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency"

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency (ICCV2021) Paper Link: https://arxiv.org/abs/2107.11355 This implementation bui

32 Nov 17, 2022
Omniverse sample scripts - A guide for developing with Python scripts on NVIDIA Ominverse

Omniverse sample scripts ここでは、NVIDIA Omniverse ( https://www.nvidia.com/ja-jp/om

ft-lab (Yutaka Yoshisaka) 37 Nov 17, 2022
Universal Probability Distributions with Optimal Transport and Convex Optimization

Sylvester normalizing flows for variational inference Pytorch implementation of Sylvester normalizing flows, based on our paper: Sylvester normalizing

Rianne van den Berg 172 Dec 13, 2022
A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks)

A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks) This repository contains a PyTorch implementation for the paper: Deep Pyra

Greg Dongyoon Han 262 Jan 03, 2023
This project provides a stock market environment using OpenGym with Deep Q-learning and Policy Gradient.

Stock Trading Market OpenAI Gym Environment with Deep Reinforcement Learning using Keras Overview This project provides a general environment for stoc

Kim, Ki Hyun 769 Dec 25, 2022
ROS support for Velodyne 3D LIDARs

Overview Velodyne1 is a collection of ROS2 packages supporting Velodyne high definition 3D LIDARs3. Warning: The master branch normally contains code

ROS device drivers 543 Dec 30, 2022
MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieva

Introduction This is the source code of our TCSVT 2021 paper "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval". Ple

7 Aug 24, 2022
A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.

3d-pose-baseline This is the code for the paper Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. A simple yet effective baseline for 3

Julieta Martinez 1.3k Jan 03, 2023
EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation

EFENet EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation Code is a bit messy now. I woud clean up soon. For training the EF

Yaping Zhao 19 Nov 05, 2022
Hierarchical Motion Encoder-Decoder Network for Trajectory Forecasting (HMNet)

Hierarchical Motion Encoder-Decoder Network for Trajectory Forecasting (HMNet) Our paper: https://arxiv.org/abs/2111.13324 We will release the complet

15 Oct 17, 2022
Materials for my scikit-learn tutorial

Scikit-learn Tutorial Jake VanderPlas email: [email protected] twitter: @jakevdp gith

Jake Vanderplas 1.6k Dec 30, 2022
An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020

UnpairedSR An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020 turn RCAN(modified) -- xmodel(xilinx

JiaKui Hu 10 Oct 28, 2022
ruptures: change point detection in Python

Welcome to ruptures ruptures is a Python library for off-line change point detection. This package provides methods for the analysis and segmentation

Charles T. 1.1k Jan 03, 2023
Complete U-net Implementation with keras

U Net Lowered with Keras Complete U-net Implementation with keras Original Paper Link : https://arxiv.org/abs/1505.04597 Special Implementations : The

Sagnik Roy 14 Oct 10, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

NeuralTextures This is repository with inference code for paper "StylePeople: A Generative Model of Fullbody Human Avatars" (CVPR21). This code is for

Visual Understanding Lab @ Samsung AI Center Moscow 18 Oct 06, 2022
NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs.

NAS-HPO-Bench-II API Overview NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs. It helps a fair and low-

yoichi hirose 8 Nov 21, 2022