Implementation of a Transformer, but completely in Triton

Last update: Dec 22, 2022

Overview

Transformer in Triton (wip)

An implementation of a Transformer, written entirely in Triton. I'm new to lower-level neural net code, so this repository is mostly a learning experience. The end goal is a vanilla transformer that is faster and more efficient to train.

Install

$ pip install triton-transformer

Usage

import torch
from triton_transformer import Transformer

model = Transformer(
    num_tokens = 256,
    max_seq_len = 1024,
    dim = 512,
    depth = 6,
    heads = 8,
    dim_head = 64
)

x = torch.randint(0, 256, (1, 1024))
mask = torch.ones(1, 1024).bool()

logits = model(x, mask = mask) # (1, 1024, 256)

Citations

@article{Tillet2019TritonAI,
    title   = {Triton: an intermediate language and compiler for tiled neural network computations},
    author  = {Philippe Tillet and H. Kung and D. Cox},
    journal = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages},
    year    = {2019}
}

@misc{vaswani2017attention,
    title   = {Attention Is All You Need}, 
    author  = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year    = {2017},
    eprint  = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Comments

Question concerning PyTorch build

Hello. I find your project very interesting and I have seen your comparison between PyTorch and Triton implementations.

However, I am curious whether your PyTorch environment is a source build optimized for your machine or a pip/conda install.

Source building has faster runtimes and if a conda install is being used for comparison, the difference in speed may simply be due to Triton optimizing CUDA for the run environment.

Thank you again for your interesting project.
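One way to make such a comparison easier to interpret is to time both implementations with the same harness and report the build details next to the numbers. The sketch below is a minimal, stdlib-only timing helper. The names `benchmark`, `fn`, and `sync` are hypothetical and not part of triton-transformer. For GPU work, one would likely pass something like `torch.cuda.synchronize` as `sync`, so that asynchronously launched kernels are included in the measurement.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return (median, best) wall-clock seconds per call of fn().

    Warmup calls are discarded so that one-time costs (JIT compilation,
    autotuning, cache warm-up) do not skew the result.
    If sync is given, it is called before every clock read.
    """
    def now():
        if sync is not None:
            sync()
        return time.perf_counter()

    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = now()
        fn()
        samples.append(now() - start)
    return statistics.median(samples), min(samples)


if __name__ == "__main__":
    # Stand-in CPU workload; replace with the forward/backward pass under test.
    median_s, best_s = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"median {median_s * 1e3:.3f} ms, best {best_s * 1e3:.3f} ms")
```

Reporting the median alongside the best run makes it easier to see whether a difference between two builds is larger than the run-to-run noise.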

opened by veritas9872 13

_layernorm implementation forward result not equal F.layer_norm

I tried your triton-transformer and tested the layernorm module on its own. Oddly, the forward results differ while the backward results are equal.

code:

from triton_transformer.layernorm import layernorm
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2,5).cuda()
x.requires_grad_(True)
dy = .1*torch.randn_like(x).cuda()
dim = 5
norm = nn.LayerNorm(dim).cuda()

y1 = layernorm(x, norm.weight, norm.bias, use_triton = True)
y2 = layernorm(x, norm.weight, norm.bias, use_triton = False)
print(y1, y2)
print(torch.allclose(y1, y2))

y1.backward(dy, retain_graph=True)
dx_y1 = x.grad.clone()

x.grad = None

y2.backward(dy, retain_graph=True)
dx_y2 = x.grad.clone()
print(dx_y1, dx_y2)
print(torch.allclose(dx_y1, dx_y2))

result:

tensor([[ 0.9492, -0.0021, -0.9797,  0.4449, -0.4123],
        [-0.7624,  0.4399,  0.7299, -0.3091, -0.0983]], device='cuda:0', grad_fn=<_layernormBackward>)
tensor([[ 1.4217, -0.0031, -1.4674,  0.6663, -0.6175],
        [-1.4342,  0.8276,  1.3732, -0.5815, -0.1850]], device='cuda:0', grad_fn=)
False

tensor([[-0.0706,  0.0288, -0.0813,  0.0446,  0.0785],
        [ 0.0218, -0.0152,  0.0141, -0.0522,  0.0315]], device='cuda:0')
tensor([[-0.0706,  0.0288, -0.0813,  0.0446,  0.0785],
        [ 0.0218, -0.0152,  0.0141, -0.0522,  0.0315]], device='cuda:0')
True
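A quick way to tell which of the two forward outputs is off: `nn.LayerNorm` initializes its weight to ones and its bias to zeros, so every output row should have a mean close to 0 and a (biased) variance close to 1. The sketch below applies that check to the first row of each printed forward tensor, using only the values posted in this report. `row_stats` is a hypothetical helper, and the check says nothing about the cause of the mismatch.

```python
def row_stats(row):
    """Return (mean, biased variance) of a list of floats."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n
    return mean, var


# First rows of the two printed forward outputs (copied from the report).
y1_triton = [0.9492, -0.0021, -0.9797, 0.4449, -0.4123]  # use_triton = True
y2_torch = [1.4217, -0.0031, -1.4674, 0.6663, -0.6175]   # use_triton = False

mean1, var1 = row_stats(y1_triton)
mean2, var2 = row_stats(y2_torch)

print(f"use_triton=True : mean={mean1:.4f} var={var1:.4f}")
print(f"use_triton=False: mean={mean2:.4f} var={var2:.4f}")
```

On these numbers, both rows have a mean near 0, but only the `use_triton = False` row has a variance near 1; the `use_triton = True` row comes out well below 1. That suggests the discrepancy lies in the scaling of the Triton forward path for this input shape rather than in the reference output.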

opened by Tengxu-Sun 1

Current state of benchmarking & contributing?

Hey @lucidrains - hope you're doing well! I have some time to hack the next couple weeks, just wanted to get a sense of:

Current state of benchmarking (what Triton kernels provide how much lift, aggregate lift over a "vanilla Transformer implementation")

If there's anything I could help with, especially as I learn Triton!

opened by siddk 0

Official layer norm added

Hi @lucidrains, layer norm was just added to the Triton examples (https://github.com/openai/triton/commit/d4baad426db72b83c5222e1c83c929c1860cae54). I tested it: it's twice as fast as Torch and often faster than Apex.

I'm looking forward to your implementation of attention. So far the Torch implementation is the fastest on my data, at 12.3 / 14.5 (forward / backward), versus 17.3 / 23.0 for the other Triton implementation in DeepSpeed.

opened by olegklimov 2

Releases(0.1.1)

0.1.1 (Apr 5, 2022)
0.1.0 (Apr 4, 2022)
0.0.28 (Mar 23, 2022)
0.0.27 (Nov 6, 2021)
0.0.26 (Nov 6, 2021)
0.0.25 (Oct 6, 2021)
0.0.24 (Oct 4, 2021)
0.0.23 (Oct 4, 2021)
0.0.22 (Oct 4, 2021)
0.0.21 (Oct 4, 2021)
0.0.20 (Sep 29, 2021)
0.0.19 (Sep 29, 2021)
0.0.18 (Sep 29, 2021)
0.0.17 (Sep 28, 2021)
0.0.16 (Sep 28, 2021)
0.0.15 (Sep 27, 2021)
0.0.14 (Sep 23, 2021)
0.0.12 (Sep 23, 2021)
0.0.10 (Sep 23, 2021)
0.0.9 (Sep 22, 2021)
0.0.8 (Sep 22, 2021)
0.0.7 (Sep 22, 2021)
0.0.6 (Sep 22, 2021)
0.0.5 (Sep 22, 2021)
0.0.4 (Sep 22, 2021)
0.0.3 (Sep 15, 2021)
0.0.2 (Sep 15, 2021)

Owner

Phil Wang

Working with Attention. It's all we need
