COIN: currently the largest dataset for comprehensive instructional video analysis.

Last update: Dec 28, 2022

Overview

COIN Dataset

COIN is currently the largest dataset for comprehensive instructional video analysis. It contains 11,827 videos of 180 different tasks (e.g., car polishing, making French fries) related to 12 domains (e.g., vehicle, dish). All videos are collected from YouTube and annotated with an efficient toolbox.

Authors and Contributors

Yansong Tang^*, Dajun Ding^†, Yongming Rao^*, Yu Zheng^*, Danyang Zhang^*, Lili Zhao^†, Jiwen Lu^*, Jie Zhou^*, Yongxiang Lian^*, Yao Li^†, Jiali Sun^†, Chang Liu^†, Dongge You^†, Zirun Yang^†, Jiaojiao Ge^†, Jiayun Wang^*

^*Tsinghua University
^†Meitu Inc.

Contact: [email protected]

License

You may use the code and files for research only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Dataset and Annotation

Taxonomy

COIN is organized in a hierarchical structure with three levels: domain, task and step. The correspondence between the levels can be found at taxonomy [link]. We provide the taxonomy file of COIN in csv format. Below, we show a small part of the taxonomy stored in taxonomy.xlsx; in this file, tasks are referred to as targets and steps as actions:

domain_target_mapping:

Domains	Targets
...	...
Vehicle	ChangeCarTire
Vehicle	InstallLicensePlateFrame
...	...
Gadgets	ReplaceCDDriveWithSSD

target_action_mapping:

Target Id	Target Label	Action Id	Action Label
...	...	...	...
13	ChangeCarTire	259	unscrew the screw
13	ChangeCarTire	260	jack up the car
13	ChangeCarTire	261	remove the tire
13	ChangeCarTire	262	put on the tire
13	ChangeCarTire	263	tighten the screws
...	...	...	...
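The two mapping tables can be joined into the three-level domain/task/step hierarchy. Below is a minimal Python sketch that uses only the rows shown in the excerpt above, so it is not the full COIN taxonomy. Reading the real rows from the taxonomy file (for example with the csv module) is left out, and `build_hierarchy` is a hypothetical helper name, not part of the COIN release.

```python
from collections import defaultdict

# Rows transcribed from the taxonomy excerpt above; the "..." rows are omitted,
# so this is NOT the full COIN taxonomy.
DOMAIN_TARGET_ROWS = [
    ("Vehicle", "ChangeCarTire"),
    ("Vehicle", "InstallLicensePlateFrame"),
    ("Gadgets", "ReplaceCDDriveWithSSD"),
]
TARGET_ACTION_ROWS = [
    (13, "ChangeCarTire", 259, "unscrew the screw"),
    (13, "ChangeCarTire", 260, "jack up the car"),
    (13, "ChangeCarTire", 261, "remove the tire"),
    (13, "ChangeCarTire", 262, "put on the tire"),
    (13, "ChangeCarTire", 263, "tighten the screws"),
]


def build_hierarchy(domain_rows, action_rows):
    """Join both tables into {domain: {task: [(step_id, step_label), ...]}}."""
    # Group steps (actions) under their task (target).
    steps_by_task = defaultdict(list)
    for _target_id, target_label, action_id, action_label in action_rows:
        steps_by_task[target_label].append((action_id, action_label))
    # Attach each task, with its steps, to its domain.
    # Tasks with no rows in action_rows get an empty step list.
    hierarchy = defaultdict(dict)
    for domain, target_label in domain_rows:
        hierarchy[domain][target_label] = steps_by_task.get(target_label, [])
    return dict(hierarchy)


taxonomy = build_hierarchy(DOMAIN_TARGET_ROWS, TARGET_ACTION_ROWS)
for domain, tasks in taxonomy.items():
    for task, steps in tasks.items():
        print(domain, task, [label for _, label in steps])
```

Because the excerpt lists no actions for InstallLicensePlateFrame or ReplaceCDDriveWithSSD, those tasks end up with empty step lists here; with the full taxonomy file every task would be populated.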

We store the URLs of the videos and their annotations in JSON format, which can be accessed via the link [COIN](Project link page). The JSON file follows a format similar to that of ActivityNet. Below, we show an example entry under the key field "database":

"LtRSn-ntcLY": {
    "duration": 131.0309,
    "class": "ReplaceCDDriveWithSSD",
    "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
    "start": 56.640895694775196,
    "annotation": [
        {
            "id": "212",
            "segment": [60.0, 69.0],
            "label": "take out the laptop CD drive"
        },
        {
            "id": "216",
            "segment": [71.0, 82.0],
            "label": "insert the hard disk tray into the position of the CD drive"
        }
    ],
    "subset": "training",
    "end": 85.714362947023,
    "recipe_type": 131
}

From each entry, we can retrieve the YouTube ID, duration, region of interest (ROI, delimited by "start" and "end") and procedure information of the video. The field "annotation" is a list of all annotated procedures within the video. The field "class" and the sub-field "id" correspond to the "task" and "step" levels of the taxonomy, respectively.
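The retrieval described above takes only a few lines with Python's `json` module. This sketch assumes COIN.json is a single JSON object whose "database" key maps each YouTube ID to an entry like the one shown. To stay self-contained it embeds only that example entry instead of reading the real file, and `summarize` is a hypothetical helper name.

```python
import json

# Stand-in for COIN.json containing only the example entry; assumes the real
# file keeps its entries under the top-level "database" key.
SAMPLE = """
{"database": {"LtRSn-ntcLY": {
  "duration": 131.0309,
  "class": "ReplaceCDDriveWithSSD",
  "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
  "start": 56.640895694775196,
  "end": 85.714362947023,
  "subset": "training",
  "recipe_type": 131,
  "annotation": [
    {"id": "212", "segment": [60.0, 69.0],
     "label": "take out the laptop CD drive"},
    {"id": "216", "segment": [71.0, 82.0],
     "label": "insert the hard disk tray into the position of the CD drive"}
  ]}}}
"""


def summarize(database):
    """Yield (youtube_id, task, duration, roi, steps) for every video entry."""
    for youtube_id, entry in database.items():
        roi = (entry["start"], entry["end"])  # region of interest, in seconds
        steps = [
            # The example stores "id" as a string, so convert it to int.
            (int(a["id"]), a["label"], tuple(a["segment"]))
            for a in entry["annotation"]
        ]
        yield youtube_id, entry["class"], entry["duration"], roi, steps


database = json.loads(SAMPLE)["database"]
for vid, task, duration, roi, steps in summarize(database):
    print(vid, task, duration, roi)
    for step_id, label, (t0, t1) in steps:
        print(f"  step {step_id}: {label} [{t0}-{t1}s]")
```

With the real file, `json.load(open("COIN.json"))["database"]` would replace the embedded sample, under the same assumption about the top-level key.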

File Structure

The annotation information is saved in COIN.json.

Field Name	Type	Example	Description
`database`	dict	-	Key field of the annotation file; maps each YouTube ID to its entry.
-	string	`LtRSn-ntcLY`	YouTube ID of the video (the key of each entry in `database`).
`duration`	float	131.0309	Duration of the video in seconds.
`class`	string	`ReplaceCDDriveWithSSD`	Name of the task in the video.
`video_url`	string	`https://www.youtube.com/embed/LtRSn-ntcLY`	URL of the video.
`start`	float	56.640895694775196	Start time of the ROI of the video, in seconds.
`end`	float	85.714362947023	End time of the ROI of the video, in seconds.
`subset`	string	`training` or `validation`	Subset the video belongs to.
`recipe_type`	int	131	ID number of the task.
`annotation`	list	-	Annotation information of the video: a list of annotated procedures.
`annotation`:`id`	int	212	ID number of the procedure (step).
`annotation`:`label`	string	`take out the laptop CD drive`	Name of the procedure (step).
`annotation`:`segment`	list of float (len=2)	`[60.0,69.0]`	Start and end time of the procedure, in seconds.
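The field table maps directly onto a small typed record. The sketch below is one way to load COIN.json into such records, assuming, as in the example, that entries sit under the top-level "database" key. The class and function names (`Step`, `Video`, `parse_video`, `load_coin`, `segments_in_bounds`, `subset_counts`) are hypothetical helpers, not part of the COIN release. Because the example entry stores `id` as a string while the table lists it as an int, the sketch coerces it with `int()`.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    step_id: int   # annotation:id
    label: str     # annotation:label
    start: float   # annotation:segment[0], seconds
    end: float     # annotation:segment[1], seconds


@dataclass
class Video:
    youtube_id: str            # key of the entry in "database"
    task: str                  # class
    task_id: int               # recipe_type
    subset: str                # subset
    duration: float            # duration, seconds
    roi: Tuple[float, float]   # (start, end), seconds
    steps: List[Step]          # annotation


def parse_video(youtube_id, entry):
    """Convert one raw "database" entry into a typed Video record."""
    steps = [
        # int() accepts both "212" (as in the example) and 212 (as in the table).
        Step(int(a["id"]), a["label"], float(a["segment"][0]), float(a["segment"][1]))
        for a in entry["annotation"]
    ]
    return Video(
        youtube_id=youtube_id,
        task=entry["class"],
        task_id=int(entry["recipe_type"]),
        subset=entry["subset"],
        duration=float(entry["duration"]),
        roi=(float(entry["start"]), float(entry["end"])),
        steps=steps,
    )


def load_coin(path="COIN.json"):
    """Read every video from the annotation file (entries assumed under "database")."""
    with open(path, encoding="utf-8") as f:
        database = json.load(f)["database"]
    return [parse_video(vid, entry) for vid, entry in database.items()]


def segments_in_bounds(video):
    """Sanity check: every step segment is ordered and lies within [0, duration]."""
    return all(0.0 <= s.start <= s.end <= video.duration for s in video.steps)


def subset_counts(videos):
    """Count videos per value of the "subset" field."""
    return Counter(v.subset for v in videos)
```

For instance, `[v for v in load_coin() if v.subset == "training"]` would select the training videos, assuming COIN.json is in the working directory.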
