COIN the currently largest dataset for comprehensive instruction video analysis.

Overview

COIN Dataset

COIN is the currently largest dataset for comprehensive instruction video analysis. It contains 11,827 videos of 180 different tasks (i.e., car polishing, make French fries) related to 12 domains (i.e., vehicle, dish). All videos are collected from YouTube and annotated with an efficient toolbox.

Authors and Contributors

Yansong Tang*, Dajun Ding, Yongming Rao*, Yu Zheng*, Danyang Zhang*, Lili Zhao, Jiwen Lu*, Jie Zhou*, Yongxiang Lian*, Yao Li, Jiali Sun, Chang Liu, Dongge You, Zirun Yang, Jiaojiao Ge, Jiayun Wang*

  • *Tsinghua University
  • Meitu Inc.

Contact: [email protected]

License

You may use the codes and files for research only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Dataset and Annotation

Taxonomy

The COIN is organized in a hierarchical structure, which contains three levels: domain, task and step. The corresponding relationship can be found at taxonomy [link]. We provide the taxonomy file of COIN in csv format. Below, we show a small part of the texonomy stored in taxonomy.xlsx:

domain_target_mapping target_action_mapping
Domains Targets
... ...
Vehicle ChangeCarTire
Vehicle InstallLicensePlateFrame
... ...
Gadgets ReplaceCDDriveWithSSD
Target Id Target Label Action Id Action Label
... ... ... ...
13 ChangeCarTire 259 unscrew the screw
13 ChangeCarTire 260 jack up the car
13 ChangeCarTire 261 remove the tire
13 ChangeCarTire 262 put on the tire
13 ChangeCarTire 263 tighten the screws
... ... ... ...

We store the url of video and their annotation in JSON format, which can be accessed with the link [COIN](Project link page). The json file is similar to that of ActivityNet. Below, we show an example entry from the key field "database":

"LtRSn-ntcLY": {
			"duration": 131.0309,
			"class": "ReplaceCDDriveWithSSD",
			"video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
			"start": 56.640895694775196,
			"annotation": [
				{
					"id": "212",
					"segment": [
						60.0,
						69.0
					],
					"label": "take out the laptop CD drive"
				},
				{
					"id": "216",
					"segment": [
						71.0,
						82.0
					],
					"label": "insert the hard disk tray into the position of the CD drive"
				}
			],
			"subset": "training",
			"end": 85.714362947023,
			"recipe_type": 131
		}

From the entry, we can easily retrieve the Youtube ID, duration, ROI and procedure information of the video. The field "annotation" comprises of a list of all annotated procedures within the video. The field "class" and sub-field "id" correspond to "task" and "step" of the taxonomy respectively.

File Structure

The annotation information is saved in COIN.json.

Field Name Type Example Description
database string - Key filed of the annotation file.
- string LtRSn-ntcLY Youtube ID of the video.
duration float 56.640895694775196 Duration of the video in seconds.
class string ReplaceCDDriveWithSSD Name of the task in the video.
video_url string https://www.youtube.com/embed/LtRSn-ntcLY Url of the video.
start float 56.640895694775196 Start time of the ROI of the video.
end float 85.714362947023 End time of the ROI of the video.
subset string training or validation Subset of the video.
recipe_type int 131 ID number of the task.
annotation string - Annotation information of the video.
annotation:id int 212 ID number of the procedure.
annotation:label string take out the laptop CD drive Name of the procedure.
annotation:segment list of float (len=2) [60.0,69.0] Start and end time of the procedure.
Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection

CP-Cluster Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection, Instance Segme

Yichun Shen 41 Dec 08, 2022
This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.

TransMix: Attend to Mix for Vision Transformers This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transf

Jie-Neng Chen 130 Jan 01, 2023
Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Introduction Pytorch implementation of Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Expert. | paper Song Park1

Clova AI Research 97 Dec 23, 2022
Official repository for the ICCV 2021 paper: UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model.

UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model Official repository for the ICCV 2021 paper: UltraPose: Syn

MomoAILab 92 Dec 21, 2022
Pytorch implementation for ACMMM2021 paper "I2V-GAN: Unpaired Infrared-to-Visible Video Translation".

I2V-GAN This repository is the official Pytorch implementation for ACMMM2021 paper "I2V-GAN: Unpaired Infrared-to-Visible Video Translation". Traffic

69 Dec 31, 2022
Pretrained Cost Model for Distributed Constraint Optimization Problems

Pretrained Cost Model for Distributed Constraint Optimization Problems Requirements PyTorch 1.9.0 PyTorch Geometric 1.7.1 Directory structure baseline

2 Aug 28, 2022
Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

Cheng Zhang 149 Jan 08, 2023
Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation

UniFuse (RAL+ICRA2021) Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation, arXiv, Demo Preparation I

Alibaba 47 Dec 26, 2022
Dark Finix: All in one hacking framework with almost 100 tools

Dark Finix - Hacking Framework. Dark Finix is a all in one hacking framework wit

Md. Nur habib 2 Feb 18, 2022
An example of time series augmentation methods with Keras

Time Series Augmentation This is a collection of time series data augmentation methods and an example use using Keras. News 2020/04/16: Repository Cre

九州大学 ヒューマンインタフェース研究室 229 Jan 02, 2023
Simple-Neural-Network From Scratch in Python

Simple-Neural-Network From Scratch in Python This is a simple Neural Network created without any Machine Learning Libraries. The only dependencies are

Aum Shah 1 Dec 28, 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. Training python train.py --c

Rishikesh (ऋषिकेश) 55 Dec 26, 2022
Official Pytorch implementation of paper "Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images"

Reverse_Engineering_GMs Official Pytorch implementation of paper "Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Gener

100 Dec 18, 2022
​TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

TextWorld A text-based game generator and extensible sandbox learning environment for training and testing reinforcement learning (RL) agents. Also ch

Microsoft 983 Dec 23, 2022
A collection of inference modules for fastai2

fastinference A collection of inference modules for fastai including inference speedup and interpretability Install pip install fastinference There ar

Zachary Mueller 83 Oct 10, 2022
Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows WACV 2022 preprint:https://arxiv.org/abs/2107.1

Denis 156 Dec 28, 2022
Keras Image Embeddings using Contrastive Loss

Keras-Image-Embeddings-using-Contrastive-Loss Image to Embedding projection in vector space. Implementation in keras and tensorflow for custom data. B

Shravan Anand K 5 Mar 21, 2022
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
Finetune SSL models for MOS prediction

Finetune SSL models for MOS prediction This is code for our paper under review for ICASSP 2022: "Generalization Ability of MOS Prediction Networks" Er

Yamagishi and Echizen Laboratories, National Institute of Informatics 32 Nov 22, 2022
Vector Neurons: A General Framework for SO(3)-Equivariant Networks

Vector Neurons: A General Framework for SO(3)-Equivariant Networks Created by Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacc

Congyue Deng 332 Dec 29, 2022