Training RNNs as Fast as CNNs

Last update: Dec 12, 2022

Related tags

Text Data & NLP sru

Overview

News

SRU++, a new SRU variant, is released. [tech report] [blog]

The experimental code and SRU++ implementation are available on the dev branch which will be merged into master later.

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy tested on many tasks.

Average processing time of LSTM, conv2d and SRU, tested on GTX 1070

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.

Reference:

Simple Recurrent Units for Highly Parallelizable Recurrence [paper]

@inproceedings{lei2018sru,
  title={Simple Recurrent Units for Highly Parallelizable Recurrence},
  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [paper]

@article{lei2021srupp,
  title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},
  author={Tao Lei},
  journal={arXiv preprint arXiv:2102.12459},
  year={2021}
}

Requirements

PyTorch >=1.6 recommended
ninja

Install requirements via pip install -r requirements.txt.

Installation

From source:

SRU can be installed as a regular package via python setup.py install or pip install ..

From PyPi:

pip install sru

Directly use the source without installation:

Make sure this repo and CUDA library can be found by the system, e.g.

export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacking layers than LSTM. We recommend starting by 2 layers and use more if necessary (see our report for more experimental details).

import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.FloatTensor(20, 32, 128).cuda()

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    bidirectional = False,   # bidirectional RNN
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = -2,        # initial bias of highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)

Contributing

Please read and follow the guidelines.

Other Implementations

@musyoku had a very nice SRU implementaion in chainer.

@adrianbg implemented the first CPU version.

Training RNNs as Fast as CNNs

Related tags

Overview

News

About

Reference:

Requirements

Installation

From source:

From PyPi:

Directly use the source without installation:

Examples

Contributing

Other Implementations

Owner

Tao Lei

T‘rex Park is a Youzan sponsored project. Offering Chinese NLP and image models pretrained from E-commerce datasets

IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

A repo for materials relating to the tutorial of CS-332 NLP

PyWorld3 is a Python implementation of the World3 model

Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch

This is an incredibly powerful calculator that is capable of many useful day-to-day functions.

This is the Alpha of Nutte language, she is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda

ElasticBERT: A pre-trained model with multi-exit transformer architecture.

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

CoSENT 比Sentence-BERT更有效的句向量方案

SpikeX - SpaCy Pipes for Knowledge Extraction

A simple version of DeTR

API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time

This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

Easy to start. Use deep nerual network to predict the sentiment of movie review.

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.