Train a GPT-3 model on a V100 (16GB memory) using an improved Transformer.

Last update: Sep 11, 2022

Overview

Pytorch GPT-X

My Own Pytorch GPT-X

1. Abstract

Train a GPT-3 model on a V100 (16GB memory) using an improved Transformer.

2. Model

Transformer

Additional Module

① ReZero

Paper: ReZero is All You Need: Fast Convergence at Large Depth
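
The idea behind ReZero is to scale each residual branch by a learnable scalar that starts at zero, so every layer begins as the identity map. A minimal PyTorch sketch of that idea follows; the wrapper class `ReZeroResidual` is a hypothetical name for illustration and is not code from this repository.

```python
import torch
import torch.nn as nn


class ReZeroResidual(nn.Module):
    """Hypothetical wrapper: y = x + alpha * sublayer(x), with alpha starting at 0."""

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # One learnable scalar per residual branch, initialised to zero.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)


torch.manual_seed(0)
block = ReZeroResidual(nn.Linear(8, 8))
x = torch.randn(2, 4, 8)
y = block(x)  # equals x at initialisation because alpha == 0
```

Because the branch contributes nothing at initialisation, this kind of block is usually used without LayerNorm in the residual path.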

② Explicit Sparse Transformer

Paper: Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
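
Explicit sparse attention keeps only the k largest attention scores per query and masks the rest before the softmax. The sketch below shows that selection step on a raw score tensor; the function name `topk_sparse_softmax` is hypothetical, and causal masking is left out to keep the example short.

```python
import torch


def topk_sparse_softmax(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Softmax over the last dim, keeping only the k largest scores per row."""
    k = min(k, scores.size(-1))
    # The k-th largest value in each row acts as the threshold.
    kth = scores.topk(k, dim=-1).values[..., -1:]
    # Everything below the threshold gets zero probability.
    # Note: exact ties at the threshold keep more than k entries.
    masked = scores.masked_fill(scores < kth, float("-inf"))
    return masked.softmax(dim=-1)


scores = torch.tensor([[1.0, 3.0, 2.0, 0.0]])
attn = topk_sparse_softmax(scores, k=2)  # only the scores 3.0 and 2.0 survive
```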

③ Macaron Architecture

Paper: Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
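
A Macaron layer sandwiches self-attention between two feed-forward sublayers, each scaled by 0.5 (the "layer scale 0.5" item in the TODO list). The following is a sketch under assumptions, not this repository's implementation: the class name `MacaronLayer`, the pre-LayerNorm placement, the GELU activation, and the use of `nn.MultiheadAttention` with a causal mask are all choices made for illustration.

```python
import torch
import torch.nn as nn


class MacaronLayer(nn.Module):
    """Hypothetical Macaron-style layer: half-step FFN, self-attention, half-step FFN."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.ffn1 = self._ffn(d_model, d_ff)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn2 = self._ffn(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    @staticmethod
    def _ffn(d_model: int, d_ff: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Boolean causal mask: True marks positions that may NOT be attended to.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        x = x + 0.5 * self.ffn1(self.norm1(x))  # first half-step FFN
        h = self.norm2(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                         # self-attention
        x = x + 0.5 * self.ffn2(self.norm3(x))  # second half-step FFN
        return x


torch.manual_seed(0)
layer = MacaronLayer(d_model=16, n_heads=4, d_ff=32).eval()
x = torch.randn(2, 5, 16)
y = layer(x)
```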

④ RealFormer, Residual Attention

Paper: RealFormer: Transformer Likes Residual Attention
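
RealFormer adds a residual connection between the attention scores of consecutive layers: each layer adds the previous layer's pre-softmax scores to its own before applying the softmax. A minimal single-head sketch, assuming a hypothetical helper `residual_attention` and omitting the causal mask and multi-head reshaping:

```python
import math

import torch


def residual_attention(q, k, v, prev_scores=None):
    """Scaled dot-product attention that adds the previous layer's raw scores."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if prev_scores is not None:
        scores = scores + prev_scores  # residual connection on the scores
    out = scores.softmax(dim=-1) @ v
    # Return the pre-softmax scores so the next layer can reuse them.
    return out, scores


torch.manual_seed(0)
q1, k1, v1 = (torch.randn(1, 4, 8) for _ in range(3))
q2, k2, v2 = (torch.randn(1, 4, 8) for _ in range(3))

out1, s1 = residual_attention(q1, k1, v1)                  # first layer
out2, s2 = residual_attention(q2, k2, v2, prev_scores=s1)  # second layer
```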

Train

DeepSpeed
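
This page does not show the DeepSpeed settings used, so the following is only a sketch of a config that might suit a 16GB V100: fp16 training plus ZeRO stage 2 with the optimizer state offloaded to CPU memory. The batch sizes, learning rate, and choice of ZeRO stage are assumptions for illustration; the key names follow the DeepSpeed config schema.

```python
import json

# Illustrative values only; none of them are taken from this repository.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},  # V100 supports fp16 but not bf16
    "optimizer": {"type": "AdamW", "params": {"lr": 1.6e-4}},
    "zero_optimization": {
        "stage": 2,  # partition optimizer state and gradients across GPUs
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

A dict like this can be passed as the `config` argument of `deepspeed.initialize`, or saved as a JSON file.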

TODO

~~ReZero~~
RealFormer, Residual Attention
~~Macaron architectures~~
~~Macaron architectures - layer Scale 0.5~~
~~Explicit Sparse Transformer~~
PyTorch Lightning
DeepSpeed training on a single GPU
DeepSpeed parallel training on 2 V100 GPUs with 16GB memory

Parameter For Few-shot

The 175B-parameter model is very large, but few-shot learning needs a large model. This repository therefore tries to use DeepSpeed to train extremely large models.
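
A rough back-of-the-envelope estimate shows why a memory-saving library is needed. The sketch assumes mixed-precision Adam at 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights, momentum, and variance) and ignores activations; the function name is hypothetical.

```python
def training_memory_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Approximate memory for weights, gradients, and Adam state, in GB."""
    return n_params * bytes_per_param / 1e9


V100_MEMORY_GB = 16

for name, n_params in [("GPT-3 2.7B", 2.7e9), ("GPT-3 175B", 175e9)]:
    need = training_memory_gb(n_params)
    print(f"{name}: ~{need:.0f} GB, about {need / V100_MEMORY_GB:.0f}x one V100")
```

Under these assumptions even the 2.7B model exceeds a single 16GB card, which is where partitioning and CPU offloading come in.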

GPT-3 Config

model_name	n_params	n_layer	d_model	n_heads	d_head	batch_size	learning_rate
GPT-3 175B	175B	96	12288	96	128	3.2M	0.6 x 10^-4
GPT-3 13B	13B	40	5140	40	128	2M	1.0 x 10^-4
GPT-3 6.7B	6.7B	32	4096	32	128	2M	1.2 x 10^-4
GPT-3 2.7B	2.7B	32	2560	32	80	1M	1.6 x 10^-4
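
As a sanity check on the table, the parameter count of a decoder-only Transformer can be approximated from n_layer and d_model alone. The sketch assumes a feed-forward width of 4 x d_model, which gives about 12 x d_model^2 weights per layer (4 x d_model^2 for the attention projections, 8 x d_model^2 for the feed-forward block), and it ignores embeddings, biases, and LayerNorm parameters. The function name is hypothetical.

```python
def approx_params(n_layer: int, d_model: int) -> int:
    """Approximate non-embedding parameter count: 12 * n_layer * d_model^2."""
    return 12 * n_layer * d_model ** 2


configs = {
    "GPT-3 175B": (96, 12288),
    "GPT-3 6.7B": (32, 4096),
}

estimates = {name: approx_params(*cfg) for name, cfg in configs.items()}
for name, n in estimates.items():
    print(f"{name}: ~{n / 1e9:.1f}B parameters")
```

The estimates land slightly below the nominal sizes, as expected when the embedding parameters are left out.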

References

Transformer

lucidrains/x-transformers

DeepSpeed

ReZero

/majumderb/rezero

Explicit Sparse Transformer

x-transformer: explicit_sparse_transformer

Macaron Architecture

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

Owner

Seonghwan Kim
