LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

Last update: Aug 24, 2022

LV-BERT

Introduction

In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For a detailed description and experimental results, please refer to our paper, LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021).

Requirements

Python 3.6
TensorFlow 1.15
numpy
scikit-learn

Experiments

First, set the data directory (an absolute path) where datasets and models will be placed:

DATA_DIR=/path/to/data/dir
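Because the path must be absolute, a quick check before running the later scripts can catch a mistyped value early. The following is a minimal sketch, not part of the repo; the helper name is_absolute_path is hypothetical, and /path/to/data/dir is the same placeholder as above.

```shell
# Hypothetical helper (not defined in the repo): succeeds only when the
# argument starts with "/", i.e. when it is an absolute path.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Placeholder path; replace it with your own location.
DATA_DIR=/path/to/data/dir

if is_absolute_path "$DATA_DIR"; then
    echo "DATA_DIR is set to $DATA_DIR"
else
    echo "DATA_DIR must be an absolute path, got: $DATA_DIR" >&2
fi
```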

Fine-tuning

The instructions below fine-tune a pre-trained LV-BERT-small (13M parameters) on GLUE. You can refer to this Google Colab notebook for a quick example. All models are provided in this Google Drive folder. The models are pre-trained for 1M steps with sequence length 128 to save compute. Models named *_seq512 are trained for 100K more steps with sequence length 512 and are used for long-sequence tasks such as SQuAD. See our paper for more details on model performance.

Create your data directory.

mkdir -p $DATA_DIR/models && cp vocab.txt $DATA_DIR/

Put the pre-trained model in the corresponding directory

mv lv-bert_small $DATA_DIR/models/

Download the GLUE data by running

python3 download_glue_data.py

Set up the data by running

cd glue_data && mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data && cd ..
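The one-line command above packs several steps together. As a reading aid, the same steps can be sketched as a shell function with one step per line. This is an illustrative rewrite, not a script shipped with the repo: the name setup_glue_dirs and its two arguments are assumptions, and the directory names are taken directly from the command above.

```shell
# Hypothetical function (not defined in the repo) that mirrors the
# one-line setup command, one step at a time.
setup_glue_dirs() {
    glue_dir=$1   # directory created by download_glue_data.py
    data_dir=$2   # your DATA_DIR

    # Rename each task directory to the lowercase name used above.
    for pair in CoLA:cola MNLI:mnli MRPC:mrpc QNLI:qnli QQP:qqp RTE:rte SST-2:sst STS-B:sts; do
        src=${pair%%:*}   # text before the colon, e.g. "SST-2"
        dst=${pair##*:}   # text after the colon, e.g. "sst"
        mv "$glue_dir/$src" "$glue_dir/$dst" || return 1
    done

    # Place the diagnostic set alongside the MNLI data.
    mv "$glue_dir/diagnostic/diagnostic.tsv" "$glue_dir/mnli/" || return 1

    # Move everything into the fine-tuning data directory.
    mkdir -p "$data_dir/finetuning_data" || return 1
    mv "$glue_dir"/* "$data_dir/finetuning_data/"
}
```

Under these assumptions it would be called from the repo root as setup_glue_dirs glue_data "$DATA_DIR". Unlike the original command, it does not change the working directory.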

Fine-tune the model by running

bash finetune.sh $DATA_DIR

PS: (a) You can test different tasks by changing the configs in finetune.sh. (b) Some of the GLUE datasets are small, so results may vary substantially across random seeds. Following ELECTRA, we report the median of 10 fine-tuning runs from the same pre-trained model for each result.
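If you repeat fine-tuning with several seeds and collect the scores yourself, the median can be computed with standard tools. The helper below is a minimal sketch and is not part of the repo; the name median is hypothetical, and how you gather the per-run scores from finetune.sh is left to you.

```shell
# Hypothetical helper (not defined in the repo): print the median of the
# numbers passed as arguments.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }   # store the sorted values by position
        END {
            if (NR % 2) print v[(NR + 1) / 2]                    # odd count: middle value
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2    # even count: mean of the two middle values
        }'
}
```

With 10 runs the count is even, so the result is the mean of the 5th and 6th scores after sorting.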

Pre-training

The instructions below pre-train LV-BERT-small (13M parameters) using the OpenWebText corpus.

First, download the OpenWebText pre-training corpus (12G).
After downloading the corpus, build the pre-training dataset (in TFRecord format) by running

bash build_data.sh $DATA_DIR

Then, pre-train the model by running

bash pretrain.sh $DATA_DIR

BibTeX

@inproceedings{yu2021lv-bert,
        author = {Yu, Weihao and Jiang, Zihang and Chen, Fei and Hou, Qibin and Feng, Jiashi},
        title = {LV-BERT: Exploiting Layer Variety for BERT},
        booktitle = {Findings of ACL},
        month = {August},
        year = {2021}
}

Reference

This repo is based on the ELECTRA repo.

Owner

Weihao Yu
