This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Last update: Dec 24, 2022

MoEBERT

Installation

Create the conda environment from environment.yml, then activate it.

conda env create -f environment.yml

Install the bundled Transformers package locally in editable mode.

pip install -e .

Note: The code is adapted from this codebase. Arguments related to LoRA and adapters can be safely ignored.

Instructions

MoEBERT targets task-specific distillation. Before running any distillation code, fine-tune a pre-trained BERT model on the target task, then pass the path to the fine-tuned model to --model_name_or_path.

Importance Score Computation

To compute the importance scores, use bert_base_mnli_example.sh: add the --preprocess_importance argument and remove the --do_train argument.
If multiple GPUs are used to compute the importance scores, an importance_[rank].pkl file is saved for each GPU. Use merge_importance.py to merge these files.
To use pre-computed importance scores, pass the file name to --moebert_load_importance.
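
As a rough illustration of what merging per-GPU files can look like, the sketch below sums the scores from every importance_[rank].pkl file element-wise. It is not the logic of merge_importance.py, whose behavior and file layout are not described here. It assumes each file holds a list of layers, each layer being a list of floats; the function name merge_importance_files and the output name importance_merged.pkl are hypothetical.

```python
import glob
import os
import pickle
import re


def merge_importance_files(directory, output_name="importance_merged.pkl"):
    """Sum per-rank importance scores element-wise and save the result.

    Assumption (not confirmed by the repository): each importance_<rank>.pkl
    stores a list of layers, and each layer is a list of float scores.
    """
    pattern = os.path.join(directory, "importance_*.pkl")
    # Keep only files whose suffix is an integer rank, ordered by rank.
    ranked = []
    for path in glob.glob(pattern):
        match = re.fullmatch(r"importance_(\d+)\.pkl", os.path.basename(path))
        if match:
            ranked.append((int(match.group(1)), path))
    ranked.sort()
    if not ranked:
        raise FileNotFoundError(f"no importance_[rank].pkl files in {directory}")

    merged = None
    for _, path in ranked:
        with open(path, "rb") as f:
            scores = pickle.load(f)
        if merged is None:
            # Copy the first file so the accumulation does not alias it.
            merged = [list(layer) for layer in scores]
            continue
        if len(scores) != len(merged):
            raise ValueError(f"layer count mismatch in {path}")
        for merged_layer, layer in zip(merged, scores):
            for i, value in enumerate(layer):
                merged_layer[i] += value

    output_path = os.path.join(directory, output_name)
    with open(output_path, "wb") as f:
        pickle.dump(merged, f)
    return output_path
```

Whether the merged file can be passed directly to --moebert_load_importance depends on the format the training script expects, so the repository's own merge_importance.py should be preferred in practice.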

Knowledge Distillation

For GLUE tasks, see examples/text-classification/run_glue.py.
For question answering tasks, see examples/question-answering/run_qa.py.
Run bash bert_base_mnli_example.sh as an example.
The codebase supports four routing strategies: gate-token, gate-sentence, hash-random, and hash-balance. Pass the chosen strategy to --moebert_route_method.
- To use hash-balance, pre-compute a balanced hash list with hash_balance.py and pass the path to the saved hash list to --moebert_route_hash_list.
- When using a trainable gating mechanism (gate-token or gate-sentence), add a load balancing loss by setting --moebert_load_balance.
- The sentence-based gating mechanism (gate-sentence) is advantageous for inference: every token in a sentence is routed to the same expert, so it incurs significantly less communication overhead than token-level routing methods.
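
To make the routing options above more concrete, here is a minimal, framework-free sketch of three ideas: a random token-to-expert hash, a frequency-balanced hash list, and an auxiliary load balancing loss. None of this is taken from the repository. The function names are hypothetical, the balanced assignment uses a simple greedy heuristic that may differ from hash_balance.py, and the loss follows one common formulation (number of experts times the sum over experts of routed fraction times mean gate probability), which is not necessarily what --moebert_load_balance computes.

```python
import heapq
import random


def random_hash_list(vocab_size, num_experts, seed=0):
    """Map every token id to a uniformly random expert (hash-random style)."""
    rng = random.Random(seed)
    return [rng.randrange(num_experts) for _ in range(vocab_size)]


def balanced_hash_list(token_counts, num_experts):
    """Assign token ids to experts so total token frequency is roughly equal.

    Greedy heuristic (an assumption, not the repository's method): visit
    tokens from most to least frequent and give each one to the expert
    with the smallest accumulated frequency so far.
    """
    # Min-heap of (accumulated frequency, expert index).
    heap = [(0, expert) for expert in range(num_experts)]
    heapq.heapify(heap)
    assignment = [0] * len(token_counts)
    order = sorted(range(len(token_counts)), key=lambda t: -token_counts[t])
    for token_id in order:
        load, expert = heapq.heappop(heap)
        assignment[token_id] = expert
        heapq.heappush(heap, (load + token_counts[token_id], expert))
    return assignment


def load_balance_loss(gate_probs):
    """Auxiliary loss: num_experts * sum_i(fraction_i * mean_prob_i).

    gate_probs is a list with one entry per token; each entry is a list of
    gate probabilities over experts. The loss is smallest when tokens and
    probability mass are spread evenly across experts.
    """
    num_tokens = len(gate_probs)
    num_experts = len(gate_probs[0])
    fraction = [0.0] * num_experts   # share of tokens whose top expert is i
    mean_prob = [0.0] * num_experts  # average gate probability of expert i
    for probs in gate_probs:
        top = max(range(num_experts), key=lambda e: probs[e])
        fraction[top] += 1.0 / num_tokens
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
    return num_experts * sum(f * p for f, p in zip(fraction, mean_prob))


if __name__ == "__main__":
    counts = [10, 1, 1, 8, 2, 2]  # toy token frequencies
    print(balanced_hash_list(counts, num_experts=2))
    print(load_balance_loss([[0.9, 0.1], [0.1, 0.9]]))
```

Hash-based routes are fixed before training and need no gate parameters, which is why the load balancing loss only applies to the trainable gate-token and gate-sentence options.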

Owner

Simiao Zuo
