Code for Multimodal Neural SLAM for Interactive Instruction Following

Last update: Dec 07, 2022

Code structure

The code is adapted from E.T., and most of the training and data-processing files are currently in the ET/notebooks folder and the et_train folder.

Dependency

Inherited from the E.T. repo, the package depends on:

numpy
pandas
opencv-python
tqdm
vocab
revtok
Pillow
sacred
etaprogress
scikit-video
lmdb
gtimer
filelock
networkx
termcolor
torch==1.7.1
torchvision==0.8.2
tensorboardX==1.8
ai2thor==2.1.0
E.T. (https://github.com/alexpashevich/E.T.)
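Only a few entries in the list above carry exact version pins. As a minimal sketch, assuming Python 3.8+ (for importlib.metadata), those pins can be checked against the active environment before training; PINNED, installed_version and check_pins are hypothetical names, not part of this repo or of E.T.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Hypothetical table: the exact pins copied from the dependency list above.
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "tensorboardX": "1.8",
    "ai2thor": "2.1.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Describe every pinned distribution that is missing or has another version."""
    problems = []
    for dist, wanted in pins.items():
        found = lookup(dist)
        if found is None:
            problems.append(f"{dist}: not installed (want {wanted})")
        elif found.split("+")[0] != wanted:
            # Drop local build tags such as "1.7.1+cu110" before comparing.
            problems.append(f"{dist}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["all pinned versions match"]:
        print(line)
```

The lookup parameter is injectable so the comparison logic can be exercised without the real packages installed.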

MaskRCNN Fine-tuning

To fine-tune the MaskRCNN module used in solving the Alfred challenge, we provide code adapted from the official PyTorch tutorial.

Setup

We assume that the environment and code structure of the E.T. model are already set up, with this repo serving as an extension, although the fine-tuning code should also work as a standalone unit.

Training Data Generation

Given a traj_data.json file (e.g., the 45K set used for E.T. joint training), run python -m alfred.gen.render_trajs as in E.T. to render the training inputs (raw images) and the ground-truth labels (instance segmentation masks) for all frames recorded in the traj_data.json files. Make sure the flag for generating instance-level segmentation masks is set to True.
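Before rendering, it can help to know how many frames the traj_data.json files actually record. The sketch below assumes each file keeps its frames in a top-level "images" list, as in the ALFRED trajectory layout; verify that key against your copy of the data. count_recorded_frames is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path
from typing import Dict


def count_recorded_frames(data_root: str) -> Dict[str, int]:
    """Map every traj_data.json under data_root to its number of recorded frames.

    Assumption: frames are listed under the top-level "images" key; files
    without that key are counted as having zero frames.
    """
    counts = {}
    for path in sorted(Path(data_root).rglob("traj_data.json")):
        with open(path) as f:
            traj = json.load(f)
        counts[str(path)] = len(traj.get("images", []))
    return counts
```

Calling it on the directory that holds the trajectories and summing the returned values gives the number of recorded frames, which is a quick sanity check on the size of the rendered output.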

Pre-processing Instance Segmentation Masks

The rendered instance segmentation masks need to be preprocessed so that their format matches the one used in the official PyTorch tutorial. Specifically, each generated mask paints every instance in a different RGB color, and each color is mapped to a unique instance index within the frame as well as to a label index for its semantic class. The mapping is constructed by looking up traj['scene']['color_to_object_type'] in each of the JSON dictionaries. The code can also restrict training data collection to certain subgoals (such as PickupObject in Alfred).

Note that some bugs in the mask rendering process create artifacts (small regions in the ground-truth labels that correspond to no actual object). These can be filtered out by keeping only instance masks larger than a certain area (e.g., > 10, as in alfred/data/maskrcnn.py).
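The conversion described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code in alfred/data/maskrcnn.py: color_to_label is a hypothetical, already-built dict from RGB tuples to class indices (derive it from traj['scene']['color_to_object_type'] after checking that dictionary's layout in your data), and frames_in_subgoal assumes the ALFRED layout where traj["plan"]["high_pddl"] lists subgoals and traj["images"][j]["high_idx"] ties each frame to one of them.

```python
import numpy as np
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def color_mask_to_instances(color_mask: np.ndarray,
                            color_to_label: Dict[Color, int],
                            min_area: int = 10):
    """Split an H x W x 3 instance-color mask into per-instance binary masks.

    Colors missing from color_to_label (e.g. background) are skipped, and
    instances whose area is not larger than min_area are dropped as rendering
    artifacts. Instance i (1-based) is row i - 1 of the returned stack.
    Returns (masks, labels): N x H x W uint8 and length-N int64.
    """
    colors = np.unique(color_mask.reshape(-1, 3), axis=0)  # one row per color
    masks, labels = [], []
    for color in colors:
        key = tuple(int(c) for c in color)
        if key not in color_to_label:
            continue
        binary = np.all(color_mask == color, axis=-1)
        if binary.sum() <= min_area:
            continue
        masks.append(binary.astype(np.uint8))
        labels.append(color_to_label[key])
    h, w = color_mask.shape[:2]
    if not masks:
        return np.zeros((0, h, w), dtype=np.uint8), np.zeros((0,), dtype=np.int64)
    return np.stack(masks), np.asarray(labels, dtype=np.int64)


def frames_in_subgoal(traj: dict, action: str = "PickupObject") -> List[str]:
    """Image names of frames recorded during subgoals of the given type (assumed layout)."""
    wanted = {step["high_idx"] for step in traj["plan"]["high_pddl"]
              if step["discrete_action"]["action"] == action}
    return [img["image_name"] for img in traj["images"] if img["high_idx"] in wanted]
```

Using np.unique over the flattened pixels visits only the colors that actually occur in a frame, so the cost does not depend on how many entries the per-scene color table holds.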

Training

Run python -m alfred.maskrcnn.train, which first loads the pre-trained model provided by E.T. and then fine-tunes it on the pre-processed data described above.
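In training mode, torchvision's Mask R-CNN models take one target dictionary per image with "boxes" (x1, y1, x2, y2 in pixels), "labels" and "masks"; the official tutorial also adds "image_id", "area" and "iscrowd" for COCO-style evaluation. A minimal NumPy sketch of building such a target from preprocessed binary masks follows; make_target is a hypothetical helper, and each value would still need torch.as_tensor before being passed to the model.

```python
import numpy as np


def make_target(masks: np.ndarray, labels: np.ndarray, image_id: int) -> dict:
    """Build a torchvision-style detection target from N x H x W binary masks."""
    boxes = []
    for mask in masks:
        ys, xs = np.where(mask)
        # Exclusive right/bottom edge (+1) so one-pixel-wide masks still get a
        # box with positive width and height.
        boxes.append([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return {
        "boxes": boxes,
        "labels": np.asarray(labels, dtype=np.int64),
        "masks": masks.astype(np.uint8),
        "image_id": np.asarray([image_id]),
        "area": area,  # box area, as in the tutorial
        "iscrowd": np.zeros((len(boxes),), dtype=np.int64),
    }
```

The +1 on the maximum coordinates is a design choice of this sketch: torchvision rejects degenerate boxes during training, and thin masks that survive the area filter could otherwise produce one.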

Evaluation

We follow the MSCOCO evaluation protocol, which is widely used for object detection and instance segmentation and reports average precision and recall at multiple IoU thresholds and object scales. The call evaluate(model, data_loader_test, device=device) in alfred/maskrcnn/train.py serves as an example.
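COCO-style mask AP rests on matching predicted masks to ground-truth masks by IoU at the thresholds 0.50, 0.55, ..., 0.95. The sketch below shows only that matching step for one image and one class, in simplified form (no crowd regions, no precision-recall integration); it is an illustration, not the pycocotools implementation that a COCO evaluator would normally use, and the function names are hypothetical.

```python
import numpy as np

# IoU thresholds averaged by the COCO "AP" metric: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Pairwise IoU between P x H x W predicted and G x H x W ground-truth masks."""
    h, w = gt.shape[-2:]
    p = (pred > 0).reshape(pred.shape[0], h * w).astype(np.float64)
    g = (gt > 0).reshape(gt.shape[0], h * w).astype(np.float64)
    inter = p @ g.T  # P x G intersection areas
    union = p.sum(1)[:, None] + g.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1.0)


def greedy_match(scores: np.ndarray, ious: np.ndarray, thr: float):
    """Match predictions (highest score first) to unmatched GT masks with IoU >= thr.

    Returns (true positives, false positives, false negatives) for one image.
    """
    matched_gt = set()
    tp = 0
    for i in np.argsort(-scores):
        candidates = [j for j in range(ious.shape[1])
                      if j not in matched_gt and ious[i, j] >= thr]
        if candidates:
            matched_gt.add(max(candidates, key=lambda j: ious[i, j]))
            tp += 1
    return tp, len(scores) - tp, ious.shape[1] - tp
```

Running greedy_match once per threshold in COCO_IOU_THRESHOLDS shows how a prediction can count as correct at a loose threshold and as a false positive at a strict one, which is why COCO AP averages over all ten.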
