Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)

Overview

Discovering Non-monotonic Autoregressive Orderings with Variational Inference

Description

This package contains the source code implementation of the paper "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper).

Inferring good generation orders in natural sequences is challenging. In our main contribution, we propose Variational Order Inference (VOI), which can be efficiently trained to discover autoregressive sequence generation orders in a data driven way without a domain-specific prior.

In VOI, the encoder permutation generator generates non-monotonic autoregressive orders as the latent variable, and the decoder autoregressive (language) model maximizes the joint probability of generating the target sequence under these non-monotonic orders. In conditional text generation tasks, the encoder is implemented as Transformer with non-causal attention, and the decoder is implemented as Transformer-InDIGO (Gu et al., 2019) which generates target sequences through insertion.

Installation

To install this package, first download the package from github, then install it using pip. For CUDA 10.1 (as configured in setup.py), the package versions are Tensorflow 2.3 and PyTorch 1.5, with their corresponding tensorflow_probability and torchvision versions. For CUDA 11.0, you may need to change the package versions in setup.py to be tensorflow==2.4, torch==1.6, tensorflow_probability==0.12.1, and torchvision==0.7.0.

git clone https://github.com/xuanlinli17/autoregressive_inference
cd autoregressive_inference
pip install -e .

Install helper packages for word tokenization and part of speech tagging. Enter the following statements into the python interpreter where you have installed our package.

import nltk
nltk.download('punkt')
nltk.download('brown')
nltk.download('universal_tagset')

Install nlg-eval that contains several helpful metrics for evaluating image captioning. Tasks other than captioning are evaluated through the vizseq package we already installed through setup.py.

pip install git+https://github.com/Maluuba/[email protected]
nlg-eval --setup

Clone wmt16-scripts for machine translation preprocessing.

git clone https://github.com/rsennrich/wmt16-scripts

Configure tensorflow-hungarian

During training, one process of order inference is to obtain permutation matrices from doubly stochastic matrices. This is accomplished through the Hungarian algorithm. Since tf.py_function only allows one gpu to run the function at any time, multi-gpu training is very slow if we use scipy.optimize.linear_sum_assignment (which requires wrapping it with tf.py_function to call). Therefore, we use a pre-written Hungarian-op script and compile it through g++ into dynamic library. During runtime, we can import the dynamic library using tensorflow api. This leads to much faster distributed training.

git clone https://github.com/brandontrabucco/tensorflow-hungarian
cd tensorflow-hungarian
make hungarian_op

If you encounter fatal error: third_party/gpus/cuda/include/cuda_fp16.h: No such file or directory, this could be resolved via link. The generated op could be found in tensorflow-hungarian/tensorflow_hungarian/python/ops/_hungarian_ops.so

Alternatively, we could also generate the op from the repo munkres-tensorflow.

git clone https://github.com/mbaradad/munkres-tensorflow
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared munkres-tensorflow/hungarian.cc -o hungarian.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2

However, this function requires all entries in a matrix to be different (otherwise some strange behaviors will occur), so we also need to uncomment the line sample_permu = sample_permu * 1000 + tf.random.normal(tf.shape(sample_permu)) * 1e-7 in voi/nn/layers/permutation_sinkhorn.py

Setup

Captioning

In this section, we will walk you through how to create a training dataset, using COCO 2017 as an example. In the first step, download COCO 2017 here. Place the extracted .json annotations at ~/annotations and the images at ~/train2017 and ~/val2017 for the training and validation set respectively.

Create a part of speech tagger first. This information is used to visualize the generation orders of captions learnt by our model, and is not used during training.

cd {this_repo}
python scripts/data/create_tagger.py --out_tagger_file tagger.pkl

Extract COCO 2017 into a format compatible with our package. There are several arguments that you can specify to control how the dataset is processed. You may leave all arguments as default except out_caption_folder and annotations_file.

python scripts/data/extract_coco.py --out_caption_folder ~/captions_train2017 --annotations_file ~/annotations/captions_train2017.json
python scripts/data/extract_coco.py --out_caption_folder ~/captions_val2017 --annotations_file ~/annotations/captions_val2017.json

Process the COCO 2017 captions and extract integer features on which to train a non sequential model. There are again several arguments that you can specify to control how the captions are processed. You may leave all arguments as default except out_feature_folder and in_folder, which depend on where you extracted the COCO dataset in the previous step. Note that if vocab_file doesn't exist before, it will be automatically generated. Since we have provided the train2017_vocab.txt we used to train our model, this vocab file will be directly loaded to create integer representations of tokens.

python scripts/data/process_captions.py --out_feature_folder ~/captions_train2017_features --in_folder ~/captions_train2017 \
--tagger_file tagger.pkl --vocab_file train2017_vocab.txt --min_word_frequency 5 --max_length 100
python scripts/data/process_captions.py --out_feature_folder ~/captions_val2017_features --in_folder ~/captions_val2017 \
--tagger_file tagger.pkl --vocab_file train2017_vocab.txt --max_length 100

Process images from the COCO 2017 dataset and extract features using a pretrained Faster RCNN FPN backbone from pytorch checkpoint. Note this script will distribute inference across all visible GPUs on your system. There are several arguments you can specify, which you may leave as default except out_feature_folder and in_folder, which depend on where you extracted the COCO dataset.

python scripts/data/process_images.py --out_feature_folder ~/train2017_features --in_folder ~/train2017 --batch_size 4
python scripts/data/process_images.py --out_feature_folder ~/val2017_features --in_folder ~/val2017 --batch_size 4

Finally, convert the processed features into a TFRecord format for efficient training. Record where you have extracted the COCO dataset in the previous steps and specify out_tfrecord_folder, caption_folder and image_folder at the minimum.

python scripts/data/create_tfrecords_captioning.py --out_tfrecord_folder ~/train2017_tfrecords \
--caption_folder ~/captions_train2017_features --image_folder ~/train2017_features --samples_per_shard 4096
python scripts/data/create_tfrecords_captioning.py --out_tfrecord_folder ~/val2017_tfrecords \
--caption_folder ~/captions_val2017_features --image_folder ~/val2017_features --samples_per_shard 4096

Django

For convenience, we ran the script from NL2code to extract the cleaned dataset from drive and place them in django_data. The vocab file djangovocab.txt is also in that directory. Alternatively, you may download raw data from ase15-django and run python scripts/data/extract_django.py --data_dir {path to all.anno and all.code)

Next, process the Django dataset into TFRecord format for efficient training.

cd {this_repo}

CUDA_VISIBLE_DEVICES=0 python scripts/data/process_django.py --data_folder ./django_data \
--vocab_file ./django_data/djangovocab.txt --dataset_type train/dev/test \
--out_feature_folder ./django_data

CUDA_VISIBLE_DEVICES=0 python scripts/data/create_tfrecords_django.py --out_tfrecord_folder ./django_data \
--dataset_type train/dev/test --feature_folder ./django_data

Gigaword

First, extract the dataset and learn byte-pair encoding.

cd {this_repo}
CUDA_VISIBLE_DEVICES=0 python scripts/data/extract_gigaword.py --data_dir {dataroot}
cd {dataroot}/gigaword
subword-nmt learn-joint-bpe-and-vocab --input src_raw_train.txt tgt_raw_train.txt -s 32000 -o joint_bpe.code --write-vocabulary src_vocab.txt tgt_vocab.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary src_vocab.txt --vocabulary-threshold 50 < src_raw_train.txt > src_train.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary src_vocab.txt --vocabulary-threshold 50 < src_raw_validation.txt > src_validation.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary src_vocab.txt --vocabulary-threshold 50 < src_raw_test.txt > src_test.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary tgt_vocab.txt --vocabulary-threshold 50 < tgt_raw_train.txt > tgt_train.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary tgt_vocab.txt --vocabulary-threshold 50 < tgt_raw_validation.txt > tgt_validation.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary tgt_vocab.txt --vocabulary-threshold 50 < tgt_raw_test.txt > tgt_test.BPE.txt

Then, generate the vocab file, and use this vocab file to convert tokens into integers and store in a feature file. Alternately you may use the gigaword_vocab.txt provided in our repo, which we used to train our model. To do this, set the following --vocab_file argument to be {this_repo}/gigaword_vocab.txt.

cd {this_repo}
CUDA_VISIBLE_DEVICES=0 python scripts/data/process_gigaword.py --out_feature_folder {dataroot}/gigaword \
--data_folder {dataroot}/gigaword --vocab_file {dataroot}/gigaword/gigaword_vocab.txt (or {this_repo}/gigaword_vocab.txt) \
--dataset_type train/validation/test

Finally, generate the train/validation/test tfrecords files.

CUDA_VISIBLE_DEVICES=0 python scripts/data/create_tfrecords_gigaword.py --out_tfrecord_folder {dataroot}/gigaword \
--feature_folder {dataroot}/gigaword --samples_per_shard 4096 --dataset_type train/validation/test

WMT

Here, we use WMT16 Ro-En as an example.

First extract the dataset and learn byte-pair encoding.

cd {this_repo}
CUDA_VISIBLE_DEVICES=0 python scripts/data/extract_wmt.py --language_pair 16 ro en --data_dir {dataroot}
cd {dataroot}/wmt16_translate/ro-en
subword-nmt learn-joint-bpe-and-vocab --input src_raw_train.txt tgt_raw_train.txt -s 32000 -o joint_bpe.code --write-vocabulary src_vocab.txt tgt_vocab.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary src_vocab.txt --vocabulary-threshold 50 < src_raw_train.txt > src_train.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary src_vocab.txt --vocabulary-threshold 50 < src_raw_validation.txt > src_validation.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary src_vocab.txt --vocabulary-threshold 50 < src_raw_test.txt > src_test.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary tgt_vocab.txt --vocabulary-threshold 50 < tgt_raw_train.txt > tgt_train.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary tgt_vocab.txt --vocabulary-threshold 50 < tgt_raw_validation.txt > tgt_validation.BPE.txt
subword-nmt apply-bpe -c joint_bpe.code --vocabulary tgt_vocab.txt --vocabulary-threshold 50 < tgt_raw_test.txt > tgt_test.BPE.txt

Extract corpus with truecase to train the truecaser, which is used for evaluation.

git clone https://github.com/moses-smt/mosesdecoder
cd {this_repo}
CUDA_VISIBLE_DEVICES=0 python scripts/data/extract_wmt.py --language_pair 16 ro en --data_dir {dataroot} --truecase
{path_to_mosesdecoder}/scripts/recaser/train-truecaser.perl -corpus {dataroot}/wmt16_translate/ro-en/src_truecase_train.txt -model {dataroot}/wmt16_translate/ro-en/truecase-model.ro
{path_to_mosesdecoder}/scripts/recaser/train-truecaser.perl -corpus {dataroot}/wmt16_translate/ro-en/tgt_truecase_train.txt -model {dataroot}/wmt16_translate/ro-en/truecase-model.en

Remove the diacritics of Romanian:

git clone https://github.com/rsennrich/wmt16-scripts
cd {dataroot}/wmt16_translate/ro-en/
python {path_to_wmt16-scripts}/preprocess/remove-diacritics.py < src_train.BPE.txt > src_train.BPE.txt
python {path_to_wmt16-scripts}/preprocess/remove-diacritics.py < src_validation.BPE.txt > src_validation.BPE.txt
python {path_to_wmt16-scripts}/preprocess/remove-diacritics.py < src_test.BPE.txt > src_test.BPE.txt

------------Note-------------

In practice, training with the sequence-level distillation dataset (Link) generated using the L2R model with beam size 5 leads to about 2 BLEU improvement on WMT16 Ro-En, intuitively because the target sequences in this new dataset are more consistent. We release the this distilled dataset here. To use this dataset, put src_distillation.BPE.txt and tgt_distillation.BPE.txt in {dataroot}/wmt16_translate/ro-en/. Training on this distilled dataset obtains very similar ordering observations (i.e. the model generates all descriptive tokens before generating the auxillary tokens) compared to training on the original dataset.

-----------------------------

Generate the vocab file (joint vocab for the source and target languages), and use this vocab file to convert tokens into integers and store in a feature file. Since we forgot to remove the diacritics during our initial experiments and we appended all missing vocabs in the diacritics-removed corpus afterwards, the vocab file we used to train our model is slightly different from the one generated through the scripts below, so we have uploaded the vocab file we used to train our model as ro_en_vocab.txt. To use this vocab file, set the following --vocab_file argument to be {this_repo}/ro_en_vocab.txt

cd {this_repo}
CUDA_VISIBLE_DEVICES=0 python scripts/data/process_wmt.py --out_feature_folder {dataroot}/wmt16_translate/ro-en \
--data_folder {dataroot}/wmt16_translate/ro-en --vocab_file {dataroot}/wmt16_translate/ro_en_vocab.txt (or {this_repo}/ro_en_vocab.txt) \
--dataset_type distillation/train/validation/test

Finally, generate the distillation/train/validation/test tfrecords files.

CUDA_VISIBLE_DEVICES=0 python scripts/data/create_tfrecords_wmt.py --out_tfrecord_folder {dataroot}/wmt16_translate/ro-en \
--feature_folder {dataroot}/wmt16_translate/ro-en --samples_per_shard 4096 --dataset_type distillation/train/validation/test

Training

Please see training_scripts.md for details about training a model.

Validation, Test, and Visualization

Please see evaluation_visualization_scripts.md for details about validating / testing a model, along with visualizing the generalization orders of a model.

Pretrained Models

We have provided pretrained models for each task here. You may make a directory ckpt_pretrain under this repo and download them under this directory.

To evaluate the pretrained models and visualize their generalization orders, please see eval_visualize_pretrained_models.md for details.

Citations

@inproceedings{li2021autoregressiveinference,
  title={Discovering Non-monotonic Autoregressive Orderings with Variational Inference},
  author={Li, Xuanlin and Trabucco, Brandon and Park, Dong Huk and Luo, Michael and Shen, Sheng and Darrell, Trevor and Gao, Yang},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=jP1vTH3inC}
}
Owner
Xuanlin (Simon) Li
Researcher in Artificial Intelligence and Machine Learning | PhD student @haosulab at UCSD | Alumni of Berkeley AI
Xuanlin (Simon) Li
시각 장애인을 위한 스마트 지팡이에 활용될 딥러닝 모델 (DL Model Repo)

SmartCane-DL-Model Smart Cane using semantic segmentation 참고한 Github repositoy 🔗 https://github.com/JunHyeok96/Road-Segmentation.git 데이터셋 🔗 https://

반드시 졸업한다 (Team Just Graduate) 4 Dec 03, 2021
Open AI's Python library

OpenAI Python Library The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. It incl

Pavan Ananth Sharma 3 Jul 10, 2022
Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomaly Detection

Why, hello there! This is the supporting notebook for the research paper — Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomal

2 Dec 14, 2021
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps Here is the code for ssbassline model. We also provide OCR results/features/mode

ZephyrZhuQi 51 Nov 18, 2022
A curated list of awesome Deep Learning tutorials, projects and communities.

Awesome Deep Learning Table of Contents Books Courses Videos and Lectures Papers Tutorials Researchers Websites Datasets Conferences Frameworks Tools

Christos 20k Jan 05, 2023
hipCaffe: the HIP port of Caffe

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Cent

ROCm Software Platform 126 Dec 05, 2022
Repositório da disciplina de APC, no segundo semestre de 2021

NOTAS FINAIS: https://github.com/fabiommendes/apc2018/blob/master/nota-final.pdf Algoritmos e Programação de Computadores Este é o Git da disciplina A

16 Dec 16, 2022
Using pytorch to implement unet network for liver image segmentation.

Using pytorch to implement unet network for liver image segmentation.

zxq 1 Dec 17, 2021
Data and extra materials for the food safety publications classifier

Data and extra materials for the food safety publications classifier The subdirectories contain detailed descriptions of their contents in the README.

1 Jan 20, 2022
StyleGAN2-ADA-training-jupyter - Training custom datasets in styleGAN2-ADA by NVIDIA using Jupyter

styleGAN2-ADA-training-jupyter Training custom datasets in styleGAN2-ADA on Jupyter Official StyleGAN2-ADA by NIVIDIA Paper Training Generative Advers

Mang Su Hyun 2 Feb 24, 2022
A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️

hf-hub-lightning A callback for pushing lightning models to the Hugging Face Hub. Note: I made this package for myself, mostly...if folks seem to be i

Nathan Raw 27 Dec 14, 2022
An updated version of virtual model making

Model-Swap-Face v2   这个项目是基于stylegan2 pSp制作的,比v1版本Model-Swap-Face在推理速度和图像质量上有一定提升。主要的功能是将虚拟模特进行环球不同区域的风格转换,目前转换器提供西欧模特、东亚模特和北非模特三种主流的风格样式,可帮我们实现生产资料零成

seeprettyface.com 62 Dec 09, 2022
Mengzi Pretrained Models

中文 | English Mengzi 尽管预训练语言模型在 NLP 的各个领域里得到了广泛的应用,但是其高昂的时间和算力成本依然是一个亟需解决的问题。这要求我们在一定的算力约束下,研发出各项指标更优的模型。 我们的目标不是追求更大的模型规模,而是轻量级但更强大,同时对部署和工业落地更友好的模型。

Langboat 424 Jan 04, 2023
Keras like implementation of Deep Learning architectures from scratch using numpy.

Mini-Keras Keras like implementation of Deep Learning architectures from scratch using numpy. How to contribute? The project contains implementations

MANU S PILLAI 5 Oct 10, 2021
Code for testing various M1 Chip benchmarks with TensorFlow.

M1, M1 Pro, M1 Max Machine Learning Speed Test Comparison This repo contains some sample code to benchmark the new M1 MacBooks (M1 Pro and M1 Max) aga

Daniel Bourke 348 Jan 04, 2023
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,

2 Sep 20, 2022
Pseudo lidar - (CVPR 2019) Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving This paper has been accpeted by Conference o

Yan Wang 881 Dec 27, 2022
Create Data & AI apps in 20 lines of code with Shimoku

Install with: pip install shimoku-api-python Start with: from os import getenv import shimoku_api_python.client as Shimoku

Shimoku 5 Nov 07, 2022
Pretraining on Dynamic Graph Neural Networks

Pretraining on Dynamic Graph Neural Networks Our article is PT-DGNN and the code is modified based on GPT-GNN Requirements python 3.6 Ubuntu 18.04.5 L

7 Dec 17, 2022
Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

MLCR This is the source code for paper Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition. Xuesong Niu, Hu Han, Shiguang

Edson-Niu 60 Nov 29, 2022