DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Last update: Dec 27, 2021

DSEE

Code for the preprint "DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models".

Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah

Overview

TBD

Requirements

We use conda to manage the virtual environment. Create it from the provided environment.yml, then activate it:

conda env create -f environment.yml
conda activate dsee

Command

Unstructured DSEE

Step 0. Install the package in non-GPT-2 in editable mode

cd non-GPT-2
pip install -e .
cd ..

Step 1. Pre-training

Take SST-2 as an example:

OUTPUT_DIR='./sst2_rank16_s1_64'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12345 \
    non-GPT-2/examples/pytorch/text-classification/run_glue.py \
    --save_total_limit 10 \
    --model_name_or_path bert-base-uncased \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --evaluation_strategy steps
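
To give a feel for what --lora_r 16 and --num_sparse 64 imply, the following is a rough parameter-count sketch, not code from this repository. It assumes that each adapted weight matrix receives an update of the form B @ A + S, where B and A are rank-16 factors and S holds 64 trainable non-zero entries; the function name trainable_params is hypothetical, and the flags' exact semantics should be checked against run_glue.py.

```python
# Back-of-the-envelope count of trainable parameters for ONE adapted weight matrix.
# Assumption (not confirmed by this README): the update is delta_W = B @ A + S,
# with B of shape (d_out, r), A of shape (r, d_in), and S a sparse matrix
# holding `num_sparse` trainable non-zero entries.

def trainable_params(d_out: int, d_in: int, lora_r: int, num_sparse: int) -> int:
    """Hypothetical helper: low-rank factors plus sparse entries."""
    low_rank = lora_r * (d_out + d_in)  # entries of B plus entries of A
    return low_rank + num_sparse

d = 768  # hidden size of bert-base-uncased
full = d * d  # parameters in a dense 768 x 768 matrix
dsee = trainable_params(d, d, lora_r=16, num_sparse=64)
ratio = dsee / full

print(f"dense: {full}, low-rank + sparse: {dsee}, fraction: {ratio:.4f}")
```

Under these assumptions, the update for a single 768 x 768 matrix trains only a few percent of the parameters that full fine-tuning of that matrix would.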

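Step 2. Pruning & Fine-tuning

This step starts from the Step 1 output directory (sst2_rank16_s1_64), which is passed as --model_name_or_path: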

OUTPUT_DIR='./sst2_rank16_s1_64_prune_0.5'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12335 \
    non-GPT-2/examples/pytorch/text-classification/run_glue_prune_tune.py \
    --save_total_limit 10 \
    --model_name_or_path sst2_rank16_s1_64 \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --pruning_ratio 0.5 \
    --evaluation_strategy steps
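
As an illustration of what a pruning ratio of 0.5 can mean, here is a minimal sketch of unstructured magnitude pruning on a flat list of weights. It is an assumption that --pruning_ratio denotes the fraction of weights set to zero and that magnitude is the selection criterion; run_glue_prune_tune.py may differ. The function magnitude_prune is hypothetical and not part of this repository.

```python
# Minimal sketch of unstructured magnitude pruning (illustrative only).
# Assumption: `pruning_ratio` is the fraction of weights to zero out,
# chosen by smallest absolute value.

def magnitude_prune(weights, pruning_ratio):
    """Return (pruned_weights, mask); mask is 1 for kept weights, 0 for pruned."""
    k = int(len(weights) * pruning_ratio)  # number of weights to remove
    # Indices ordered by ascending magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:k])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask

weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
pruned, mask = magnitude_prune(weights, 0.5)
print(pruned)
print(mask)
```

In a prune-then-tune setup, a mask like this would typically stay fixed during the subsequent fine-tuning so that the pruned positions remain zero.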

TODO

Code for Unstructured DSEE on GPT-2
Code for Structured DSEE

Acknowledgement

Hugging Face Transformers (https://github.com/huggingface/transformers)

Owner

VITA
