A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Loading the model takes 12 GB of free RAM.

Last update: Dec 25, 2022

Overview

Basic-UI-for-GPT-J-6B-with-low-vram

A repository to run GPT-J-6B on low-VRAM systems by splitting the model across RAM, VRAM, and pinned memory.

The weights in the Drive link seem to have some issues: there is some performance loss, most likely because of a poor 16-bit conversion.
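As a rough sanity check on the 12 GB loading figure, the size of the 16-bit weights can be estimated from the parameter count. This is a minimal sketch, not code from this repository: the parameter count of about 6.05 billion is an assumed round figure for GPT-J-6B, and `fp16_weight_bytes` is a hypothetical helper name.

```python
# Assumption: GPT-J-6B has roughly 6.05 billion parameters (round figure).
GPTJ_6B_PARAMS = 6.05e9
BYTES_PER_FP16 = 2  # a 16-bit float occupies 2 bytes


def fp16_weight_bytes(n_params: float) -> int:
    """Estimate the bytes needed to hold n_params weights in 16-bit floats."""
    return int(n_params * BYTES_PER_FP16)


size = fp16_weight_bytes(GPTJ_6B_PARAMS)
print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

The estimate covers the weights only; activations and the attention cache for a long context need additional memory on top of it.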

How to run :

Use - pip install git+https://github.com/finetuneanon/[email protected]
Use the link - https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing to download the model, which has been saved as described here - https://github.com/arrmansa/saving-and-loading-large-models-pytorch

Timing (2000 token context)

1

system -

16 GB DDR4 RAM, GTX 1070 8 GB GPU.
23 blocks in RAM (ram_blocks = 23), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

timing -

A single run of model(inputs) takes 6.5 seconds.
Generating 25 tokens at a 2000-token context takes 35 seconds (1.4 seconds/token).

2

system -

16 GB DDR4 RAM, GTX 1060 6 GB GPU.
26 blocks in RAM (ram_blocks = 26), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

timing -

Generating 25 tokens at a 2000-token context takes 40 seconds (1.6 seconds/token).
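The two configurations above can be read as a split of the model's transformer blocks across three memory pools. The sketch below only illustrates that arithmetic and is not the repository's loading code: it assumes GPT-J-6B has 28 transformer blocks, that any block not counted in ram_blocks stays in VRAM, and that `plan_placement` and `seconds_per_token` are hypothetical helper names.

```python
from dataclasses import dataclass

# Assumption: GPT-J-6B has 28 transformer blocks.
GPTJ_6B_BLOCKS = 28


@dataclass
class BlockPlacement:
    gpu_blocks: int           # blocks assumed to stay resident in VRAM
    pinned_ram_blocks: int    # blocks in shared/pinned host memory
    pageable_ram_blocks: int  # blocks in ordinary (pageable) host memory


def plan_placement(ram_blocks: int, max_shared_ram_blocks: int,
                   total_blocks: int = GPTJ_6B_BLOCKS) -> BlockPlacement:
    """Split total_blocks into VRAM, pinned RAM, and pageable RAM counts."""
    if not 0 <= ram_blocks <= total_blocks:
        raise ValueError("ram_blocks must be between 0 and total_blocks")
    if max_shared_ram_blocks < 0:
        raise ValueError("max_shared_ram_blocks must be non-negative")
    # At most max_shared_ram_blocks of the RAM blocks are pinned.
    pinned = min(ram_blocks, max_shared_ram_blocks)
    return BlockPlacement(
        gpu_blocks=total_blocks - ram_blocks,
        pinned_ram_blocks=pinned,
        pageable_ram_blocks=ram_blocks - pinned,
    )


def seconds_per_token(total_seconds: float, tokens: int) -> float:
    """Average generation time per token."""
    return total_seconds / tokens


# The two systems listed in the timing section.
for ram_blocks, total_seconds in ((23, 35), (26, 40)):
    placement = plan_placement(ram_blocks, max_shared_ram_blocks=18)
    print(placement, f"{seconds_per_token(total_seconds, 25):.1f} s/token")
```

Under these assumptions, the 8 GB card keeps 5 blocks in VRAM and the 6 GB card keeps 2, with 18 pinned blocks in both cases.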
