Mesh Transformer Jax

A Haiku library that uses the new (and newly documented) xmap operator in JAX to implement model parallelism for transformers.
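To illustrate what model parallelism means for a transformer layer, here is a minimal sketch. It assumes a Megatron-style split of the feed-forward block, and it simulates the devices with NumPy on a single host. It is not this library's xmap code. The first weight matrix is split by columns and the second by rows. Each shard then computes a partial output independently, and one sum (an all-reduce on real hardware) recovers the unsharded result. The function names, the ReLU activation, and n_shards=8 (chosen to match the 8 cores of a TPU v3-8) are illustrative assumptions.

```python
import numpy as np

def mlp_reference(x, w_in, w_out):
    # Unsharded feed-forward block: ReLU(x @ w_in) @ w_out (ReLU assumed for simplicity)
    return np.maximum(x @ w_in, 0.0) @ w_out

def mlp_sharded(x, w_in, w_out, n_shards):
    # Hypothetical helper: split w_in by columns and w_out by rows so that each
    # simulated device holds 1/n_shards of the hidden (d_ff) dimension.
    w_in_shards = np.split(w_in, n_shards, axis=1)
    w_out_shards = np.split(w_out, n_shards, axis=0)
    # Each shard computes its partial output with no communication, because
    # the activation is elementwise over the hidden dimension...
    partials = [np.maximum(x @ a, 0.0) @ b for a, b in zip(w_in_shards, w_out_shards)]
    # ...and a single sum (an all-reduce across devices) combines them.
    return sum(partials)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w_in = rng.standard_normal((16, 64))
w_out = rng.standard_normal((64, 16))
print(np.allclose(mlp_reference(x, w_in, w_out), mlp_sharded(x, w_in, w_out, n_shards=8)))
```

This partitioning needs only one collective per block. That is the general reason column/row splits are a common choice for sharding transformer layers across accelerator cores.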

See enwik8_example.py for an example of using this to implement an autoregressive language model.

Benchmarks

On a TPU v3-8 (see tpuv38_example.py):

~2.7B model

Initialized in 121.842s
Total parameters: 2722382080
Compiled in 49.0534s
it: 0, loss: 20.311113357543945
<snip>
it: 90, loss: 3.987450361251831
100 steps in 109.385s
effective flops (not including attn): 2.4466e+14

~4.8B model

Initialized in 101.016s
Total parameters: 4836720896
Compiled in 52.7404s
it: 0, loss: 4.632925987243652
<snip>
it: 40, loss: 3.2406811714172363
50 steps in 102.559s
effective flops (not including attn): 2.31803e+14

10B model

Initialized in 152.762s
Total parameters: 10073579776
Compiled in 92.6539s
it: 0, loss: 5.3125
<snip>
it: 40, loss: 3.65625
50 steps in 100.235s
effective flops (not including attn): 2.46988e+14
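The "effective flops" lines appear to match the common approximation of about 6 FLOPs per parameter per training token (forward plus backward pass, attention excluded), divided by wall-clock time. The sketch below reproduces the three logged figures under that assumption. The tokens-per-step values are not stated in the logs. They were inferred by inverting the formula: 16384 for the ~2.7B and ~4.8B runs and 8192 for the 10B run. Treat them as assumptions, not documented batch settings.

```python
def effective_flops(n_params, tokens_per_step, steps, seconds):
    # Approximate training FLOPs per second: 6 * parameters * tokens processed / time.
    # Attention FLOPs are excluded, matching "(not including attn)" in the logs.
    return 6 * n_params * tokens_per_step * steps / seconds

# (parameters, tokens per step [inferred, assumed], steps, seconds) from the logs above
runs = {
    "~2.7B": (2722382080, 16384, 100, 109.385),
    "~4.8B": (4836720896, 16384, 50, 102.559),
    "10B": (10073579776, 8192, 50, 100.235),
}
for name, args in runs.items():
    print(f"{name}: {effective_flops(*args):.5e}")
```

Each computed value agrees with the corresponding logged figure to within rounding.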

Owner

Ben Wang