Hard cater examples from Hopper ICLR paper

Related tags

Deep Learningcater-h
Overview

CATER-h NEC Laboratories America, Inc.

Honglu Zhou*, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf

(*Contact: [email protected])

CATER-h is the dataset proposed for the Video Reasoning task, specifically, the problem of Object Permanence, investigated in Hopper: Multi-hop Transformer for Spatiotemporal Reasoning accepted to ICLR 2021. Please refer to our full paper for detailed analysis and evaluations.

1. Overview

This repository provides the CATER-h dataset used in the paper "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", as well as instructions/code to create the CATER-h dataset.

If you find the dataset or the code helpful, please cite:

Honglu Zhou, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf. Hopper: Multi-hop Transformer for Spatiotemporal Reasoning. In International Conference on Learning Representations (ICLR), 2021.

@inproceedings{zhou2021caterh,
    title = {{Hopper: Multi-hop Transformer for Spatiotemporal Reasoning}},
    author = {Zhou, Honglu and Kadav, Asim and Lai, Farley and Niculescu-Mizil, Alexandru and Min, Martin Renqiang and Kapadia, Mubbasir and Graf, Hans Peter},
    booktitle = {ICLR},
    year = 2021
}  

2. Dataset

A pre-generated sample of the dataset used in the paper is provided here. If you'd like to generate a version of the dataset, please follow instructions in the following.

3. Requirements

  1. All CLEVR requirements (eg, Blender: the code was used with v2.79b).
  2. This code was used on Linux machines.
  3. GPU: This code was tested with multiple types of GPUs and should be compatible with most GPUs. By default it will use all the GPUs on the machine.
  4. All DETR requirements. You can check the site-packages of our conda environment (Python3.7.6) used.

4. Generating CATER-h

4.1 Generating videos and labels

(We modify code provided by CATER.)

  1. cd generate/

  2. echo $PWD >> blender-2.79b-linux-glibc219-x86_64/2.79/python/lib/python3.5/site-packages/clevr.pth (You can download our blender-2.79b-linux-glibc219-x86_64.)

  3. Run time python launch.py to start generating. Please read through the script to change any settings, paths etc. The command line options should also be easy to follow from the script (e.g., --num_images specifies the number of videos to generate).

  4. time python gen_train_test.py to generate labels for the dataset for each of the tasks. Change the parameters on the top of the file, and run it.

4.2 Obtaining frame and object features

You can find our extracted frame and object features here. The CNN backbone we utilized to obtain the frame features is a pre-trained ResNeXt-101 model. We use DETR trained on the LA-CATER dataset to obtain object features.

4.3 Filtering data by the frame index of the last visible snitch

  1. cd extract/

  2. Download our pretrained object detector from here. Create a folder checkpoints. Put the pretrained object detector into the folder checkpoints.

  3. Change paths etc in extract/configs/CATER-h.yml

  4. time ./run.sh

This will generate an output folder with pickle files that save the frame index of the last visible snitch and the detector's confidence.

  1. Run resample.ipynb which will resample the data to have balanced train/val set in terms of the class label and the frame index of the last visible snitch.

Acknowledgments

The code in this repository is heavily based on the following publically available implementations:

Owner
NECLA ML Group
NEC Labs America, Machine Learning Group
NECLA ML Group
Redash reset for python

redash-reset This will use a default REDASH_SECRET_KEY key of c292a0a3aa32397cdb050e233733900f this allows you to reset the password of the user ID bu

Robert Wiggins 5 Nov 14, 2022
So-ViT: Mind Visual Tokens for Vision Transformer

So-ViT: Mind Visual Tokens for Vision Transformer        Introduction This repository contains the source code under PyTorch framework and models trai

Jiangtao Xie 44 Nov 24, 2022
Multistream CNN for Robust Acoustic Modeling

Multistream Convolutional Neural Network (CNN) A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recogni

ASAPP Research 37 Sep 21, 2022
PyTorch implementation of popular datasets and models in remote sensing

PyTorch Remote Sensing (torchrs) (WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Re

isaac 222 Dec 28, 2022
Official code for "Decoupling Zero-Shot Semantic Segmentation"

Decoupling Zero-Shot Semantic Segmentation This is the official code for the arxiv. ZegFormer is the first framework that decouple the zero-shot seman

Jian Ding 108 Dec 30, 2022
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Pretrained Language Model This repository provides the latest pretrained language models and its related optimization techniques developed by Huawei N

HUAWEI Noah's Ark Lab 2.6k Jan 01, 2023
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 02, 2023
Implements Stacked-RNN in numpy and torch with manual forward and backward functions

Recurrent Neural Networks Implements simple recurrent network and a stacked recurrent network in numpy and torch respectively. Both flavours implement

Vishal R 1 Nov 16, 2021
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch

Memory Efficient Attention This is unofficial implementation of Self-attention Does Not Need O(n^2) Memory for Jax and PyTorch. Implementation is almo

Amin Rezaei 126 Dec 27, 2022
From Perceptron model to Deep Neural Network from scratch in Python.

Neural-Network-Basics Aim of this Repository: From Perceptron model to Deep Neural Network (from scratch) in Python. ** Currently working on a basic N

Aditya Kahol 1 Jan 14, 2022
Implementation of SSMF: Shifting Seasonal Matrix Factorization

SSMF Implementation of SSMF: Shifting Seasonal Matrix Factorization, Koki Kawabata, Siddharth Bhatia, Rui Liu, Mohit Wadhwa, Bryan Hooi. NeurIPS, 2021

Koki Kawabata 9 Jun 10, 2022
[NeurIPS 2021] SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning

SSUL - Official Pytorch Implementation (NeurIPS 2021) SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning Sun

Clova AI Research 44 Dec 27, 2022
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

107 Dec 02, 2022
This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

DeepLab-ResNet-TensorFlow This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. Up

19 Jan 16, 2022
Multi-objective constrained optimization for energy applications via tree ensembles

Multi-objective constrained optimization for energy applications via tree ensembles

C⚙G - Imperial College London 1 Nov 19, 2021
Pytorch Implementation of PointNet and PointNet++++

Pytorch Implementation of PointNet and PointNet++ This repo is implementation for PointNet and PointNet++ in pytorch. Update 2021/03/27: (1) Release p

Luigi Ariano 1 Nov 11, 2021
HAT: Hierarchical Aggregation Transformers for Person Re-identification

HAT: Hierarchical Aggregation Transformers for Person Re-identification

11 Sep 05, 2022
Project ArXiv Citation Network

Project ArXiv Citation Network Overview This project involved the analysis of the ArXiv citation network. Usage The complete code of this project is i

Dennis Núñez-Fernández 5 Oct 20, 2022
Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Pytorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [Paper] [Colab is coming soon] Approach Example Usage To r

170 Jan 03, 2023
Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis Requirements python 3.7 pytorch-gpu 1.7 numpy 1.19.4 pytorch_

12 Oct 29, 2022