Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch

Last update: Dec 30, 2022

Overview

Rotary Embeddings - Pytorch

A standalone library for adding rotary embeddings to transformers in PyTorch, following their success as a relative positional encoding. Specifically, it makes rotating information into any axis of a tensor easy and efficient, whether the rotations are fixed positional or learned. This library will give you state-of-the-art results for positional embedding, at little cost.

My gut also tells me there is something more to rotations that can be exploited in artificial neural networks.

Install

$ pip install rotary-embedding-torch

Usage

import torch
from rotary_embedding_torch import apply_rotary_emb, RotaryEmbedding

# instantiate the positional embedding in your transformer and pass to all your attention layers

pos_emb = RotaryEmbedding(dim = 32)

# generate the rotations

freqs = pos_emb(torch.arange(1024), cache_key = 1024) # cache with a key that is the sequence length, so that it does not need to recompute

# mock queries and keys

q = torch.randn(1, 1024, 64) # queries - (batch, seq len, dimension of head)
k = torch.randn(1, 1024, 64) # keys

# apply the rotations to your queries and keys after the heads have been split out, but prior to the dot product and subsequent softmax (attention)

freqs = freqs[None, ...] # unsqueeze for batch dimension
q = apply_rotary_emb(freqs, q)
k = apply_rotary_emb(freqs, k)

# then do your attention with your queries (q) and keys (k)

If you do all the steps above correctly, you should see a dramatic improvement during training.
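
To make the "relative" part of the usage above concrete, here is a minimal sketch of the underlying idea in plain Python (standard library only, so it runs without PyTorch). It is not the library's code: the helper names `rotary_angles`, `rotate`, and `dot` are hypothetical, and pairing adjacent features is an assumption about the layout. It rotates each feature pair by an angle proportional to the position and checks that the query-key score depends only on the offset between the two positions.

```python
import math

def rotary_angles(pos, dim, theta=10000.0):
    # One angle per feature pair: pos * theta^(-2i/dim), the RoFormer schedule.
    return [pos * theta ** (-2 * i / dim) for i in range(dim // 2)]

def rotate(vec, pos):
    # Rotate each adjacent pair (vec[2i], vec[2i+1]) by its position-dependent angle.
    out = []
    for i, angle in enumerate(rotary_angles(pos, len(vec))):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(angle), math.sin(angle)
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary mock query and key vectors (8 features each).
q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
k = [0.8, 0.1, -0.5, 1.3, 0.2, -0.7, 0.4, 0.6]

# The same offset (3) at different absolute positions gives the same score.
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 105), rotate(k, 102))
print(score_a, score_b)
```

Because a rotation also preserves vector norms, the embedding changes only how queries and keys align with each other, not their magnitudes.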

Axial Rotary Embeddings

For easy use of 2D axial relative positional embeddings, e.g. in vision transformers.

import torch
from rotary_embedding_torch import apply_rotary_emb, RotaryEmbedding, broadcat

pos_emb = RotaryEmbedding(
    dim = 32,
    freqs_for = 'pixel'
)

# queries and keys for frequencies to be rotated into

q = torch.randn(1, 256, 256, 64)
k = torch.randn(1, 256, 256, 64)

# get frequencies for each axis
# -1 to 1 has been shown to be a good choice for images and audio

freqs_h = pos_emb(torch.linspace(-1, 1, steps = 256), cache_key = 256)
freqs_w = pos_emb(torch.linspace(-1, 1, steps = 256), cache_key = 256)

# concat the frequencies of each axis along the last dimension
# broadcat function makes this easy without a bunch of expands

freqs = broadcat((freqs_h[None, :, None, :], freqs_w[None, None, :, :]), dim = -1)

# rotate in frequencies

q = apply_rotary_emb(freqs, q)
k = apply_rotary_emb(freqs, k)
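
The shapes in the axial example can be hard to follow, so here is a rough sketch of the bookkeeping in plain Python. It is an illustration under assumptions, not the library's implementation: the helper names are hypothetical, the 'pixel' schedule is taken from the formula quoted in the comments further down this page, and `max_freq = 10.0` is a guess at the default. Each axis contributes `dim` angles per position (every frequency repeated for both members of a feature pair), and concatenating the two axes gives `2 * dim` angles per pixel, which is why `dim = 32` pairs with a head dimension of 64 above.

```python
import math

def linspace(start, stop, steps):
    # Evenly spaced values including both endpoints (steps >= 2).
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]

def pixel_freqs(dim, max_freq=10.0):
    # Assumed schedule: linspace(1, max_freq / 2, dim // 2) * pi.
    return [f * math.pi for f in linspace(1.0, max_freq / 2, dim // 2)]

def axis_angles(positions, freqs):
    # Repeat each frequency twice so both members of a feature pair share one angle.
    return [[p * f for f in freqs for _ in range(2)] for p in positions]

def axial_angles(angles_h, angles_w):
    # grid[h][w] holds the height angles followed by the width angles for that pixel.
    return [[ah + aw for aw in angles_w] for ah in angles_h]

freqs = pixel_freqs(dim=32)
positions = linspace(-1.0, 1.0, 8)  # a small 8 x 8 grid instead of 256 x 256
grid = axial_angles(axis_angles(positions, freqs), axis_angles(positions, freqs))

print(len(grid), len(grid[0]), len(grid[0][0]))
```

The first half of each pixel's angles depends only on its row and the second half only on its column, which is the effect the broadcast-and-concatenate step is after.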

Learned Rotations

For injecting learned rotations into a network. Experiments are pending.

Update: doesn't seem to do anything -_-, will keep trying...

import torch
from torch import nn
from rotary_embedding_torch import apply_learned_rotations

x = torch.randn(1, 1024, 512)

# you can only rotate in (dim // 2) values
# ex. for 512, you can only rotate in 256 values

# say you have two sets of learned rotations of 128 values each

rots1 = nn.Linear(512, 128)(x)
rots2 = nn.Linear(512, 128)(x)

# you rotate in 256 (128 x 2) at first

x = apply_learned_rotations(rots1, x, start_index = 0)

# then you start at index 256 and rotate in the last (128 x 2)

x = apply_learned_rotations(rots2, x, start_index = 256)

# you could also concat the rotations together and pass it in all at once

rots = torch.cat((rots1, rots2), dim = -1)

x = apply_learned_rotations(rots, x)
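
As a rough illustration of the `start_index` bookkeeping above, the following plain-Python sketch rotates adjacent feature pairs by given angles, starting at a chosen feature index. The function `apply_rotations` is a hypothetical stand-in written for this example, not the library's `apply_learned_rotations`, and the adjacent-pair layout is an assumption. It shows why `n` angles consume `2 * n` features, and that applying two sets of angles in sequence matches applying their concatenation at once.

```python
import math

def apply_rotations(angles, x, start_index=0):
    # Rotate pair (x[start + 2i], x[start + 2i + 1]) by angles[i];
    # features outside the rotated range pass through unchanged.
    out = list(x)
    for i, angle in enumerate(angles):
        a, b = start_index + 2 * i, start_index + 2 * i + 1
        c, s = math.cos(angle), math.sin(angle)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out

x = [float(i + 1) for i in range(8)]  # 8 features, so at most 4 angles
rots1 = [0.1, 0.2]                    # stand-ins for learned angles
rots2 = [0.3, 0.4]

# Two angles use features 0..3, so the second set starts at index 4.
step = apply_rotations(rots1, x, start_index=0)
step = apply_rotations(rots2, step, start_index=4)

# Concatenating the angles and applying them in one call gives the same result.
joint = apply_rotations(rots1 + rots2, x)
print(step == joint)
```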

Citations

@misc{su2021roformer,
    title   = {RoFormer: Enhanced Transformer with Rotary Position Embedding}, 
    author  = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
    year    = {2021},
    eprint  = {2104.09864},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Comments

Custom position offset when rotating queries or keys

This library seems to assume that queries and keys are left-aligned position-wise, e.g.

q = [p_0, p_1, p_2]
k = [p_0, p_1, p_2, p_3, p_4]

where p_i are the corresponding positions. This is enforced by always starting the sequence of positions from 0 with torch.arange(seq_len). Applications like Perceiver AR, however, require a position-wise right-alignment, e.g.

q =           [p_2, p_3, p_4]
k = [p_0, p_1, p_2, p_3, p_4]

This pull request allows specifying a start position for queries and/or keys, to enable alignments other than left-alignment. For example

import torch
from rotary_embedding_torch.rotary_embedding_torch import RotaryEmbedding

rot = RotaryEmbedding(dim=32)

q = torch.ones(1, 8, 4, 32)
k = torch.ones(1, 8, 6, 32)

q = q / torch.norm(q, dim=-1, keepdim=True)
k = k / torch.norm(k, dim=-1, keepdim=True)

q_rot = rot.rotate_queries_or_keys(q, start_pos=k.shape[2] - q.shape[2])
k_rot = rot.rotate_queries_or_keys(k)

attn = torch.einsum("b h i c, b h j c -> b h i j", q_rot, k_rot)
print(attn[0, 0])

prints the following relative position embedding

tensor([[0.8581, 0.9571, 1.0000, 0.9571, 0.8581, 0.7670],
        [0.7670, 0.8581, 0.9571, 1.0000, 0.9571, 0.8581],
        [0.7288, 0.7670, 0.8581, 0.9571, 1.0000, 0.9571],
        [0.7361, 0.7288, 0.7670, 0.8581, 0.9571, 1.0000]])

(diagonal of 1s right-aligned) whereas the default behavior

...

q_rot = rot.rotate_queries_or_keys(q)
k_rot = rot.rotate_queries_or_keys(k)

attn = torch.einsum("b h i c, b h j c -> b h i j", q_rot, k_rot)
print(attn[0, 0])

would print

tensor([[1.0000, 0.9571, 0.8581, 0.7670, 0.7288, 0.7361],
        [0.9571, 1.0000, 0.9571, 0.8581, 0.7670, 0.7288],
        [0.8581, 0.9571, 1.0000, 0.9571, 0.8581, 0.7670],
        [0.7670, 0.8581, 0.9571, 1.0000, 0.9571, 0.8581]])

(diagonal of 1s left-aligned).
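
The two matrices above can be reproduced without the library, which may help in seeing what the offset does. The sketch below is plain Python under stated assumptions: the function name `relative_scores` and its `q_start` parameter are hypothetical, and the frequencies are assumed to follow the RoFormer schedule with a base of 10000. For constant, unit-norm queries and keys, the rotated dot product at (i, j) reduces to the mean of cos((query position - key position) * frequency), so shifting the query positions simply shifts the diagonal of ones.

```python
import math

def relative_scores(q_len, k_len, dim=32, q_start=0, theta=10000.0):
    # Entry (i, j) is the mean over dim // 2 frequencies of
    # cos((q_start + i - j) * freq), i.e. it depends only on the offset.
    freqs = [theta ** (-2 * n / dim) for n in range(dim // 2)]
    return [
        [sum(math.cos((q_start + i - j) * f) for f in freqs) / len(freqs)
         for j in range(k_len)]
        for i in range(q_len)
    ]

left = relative_scores(4, 6)                  # queries at positions 0..3
right = relative_scores(4, 6, q_start=6 - 4)  # queries at positions 2..5

for row in right:
    print(["%.4f" % value for value in row])
```

With `q_start = k_len - q_len`, the last query lines up with the last key, which is the right-alignment described for Perceiver AR.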

opened by krasserm

about axial rotary embeddings

Hi, thank you for sharing this code with us. However, I was confused by the axial rotary embeddings in the rotary_embedding_torch.py file:

elif freqs_for == 'pixel':
    freqs = torch.linspace(1., max_freq / 2, dim // 2) * pi

Where does this formula come from? What is the max_freq parameter? Why are the freqs not 1/(10000^(2i/d))?

Thank you again.

opened by raindrop313

Releases(0.2.1)

0.2.1 (Dec 22, 2022)
0.2.0 (Dec 22, 2022)
0.1.5 (Mar 28, 2022)
0.1.4 (Mar 3, 2022)
0.1.2 (Nov 24, 2021)
0.1.1 (Oct 5, 2021)
0.1.0 (Aug 16, 2021)
0.0.12 (Aug 6, 2021)
0.0.11 (Aug 6, 2021)
0.0.10 (Jul 15, 2021)
0.0.9 (Jul 15, 2021)
0.0.8 (Jul 15, 2021)
0.0.6 (Jul 15, 2021)
0.0.5 (Jul 8, 2021)
0.0.4 (Jul 8, 2021)
0.0.3 (Jul 8, 2021)
0.0.2 (Jul 8, 2021)
0.0.1 (Jul 8, 2021)

Owner

Phil Wang

Working with Attention
