The official GitHub repository for Towards Continual Knowledge Learning of Language Models

Last update: Jan 07, 2023

Overview

Towards Continual Knowledge Learning of Language Models

This is the official GitHub repository for the paper "Towards Continual Knowledge Learning of Language Models".

To reproduce our results, follow these steps:

1. Create a conda environment and install the requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also make sure to install the PyTorch build that matches your CUDA version and environment; refer to https://pytorch.org/ for details.

# For CUDA 10.x
pip3 install torch torchvision torchaudio
# For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
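
After installation, a quick check can confirm that the installed PyTorch build actually sees the GPU. The sketch below is not part of this repository; the helper name `describe_torch` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_torch():
    """Return a one-line summary of the installed PyTorch build (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.version.cuda is None for CPU-only builds.
    return "torch {} | built for CUDA {} | GPU available: {}".format(
        torch.__version__, torch.version.cuda, torch.cuda.is_available()
    )


if __name__ == "__main__":
    print(describe_torch())
```

If the summary reports that no GPU is available, the installed wheel probably does not match the local CUDA setup.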

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings, such as multiple CKL phases and GPT-2, we do not separately provide the continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all the configs needed to reproduce the zero-shot results of our paper. Model checkpoints are provided only for the main experimental setting (full_setting) and can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   │   ├── invariantLAMA
│   │   │   ├── t5_baseline.json
│   │   │   ├── t5_kadapters.json
│   │   │   ├── ...
│   │   ├── newLAMA
│   │   ├── newLAMA_easy
│   │   ├── updatedLAMA
│   ├── training
│   │   ├── t5_baseline.json
│   │   ├── t5_kadapters.json
│   │   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...
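
To see which config files are available locally for a given setting, the directory tree can be walked with the standard library. This is a minimal sketch, assuming the layout shown above and that it runs from the repository root; the helper `list_configs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_configs(root="configs", setting="full_setting"):
    """Return the JSON config paths under one setting, sorted (hypothetical helper)."""
    base = Path(root) / setting
    # rglob walks evaluation/<dataset>/ and training/ alike.
    return sorted(p.as_posix() for p in base.rglob("*.json"))


if __name__ == "__main__":
    for path in list_configs():
        print(path)
```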

Components of each configuration file

input_length (int) : the input sequence length
output_length (int) : the output sequence length
num_train_epochs (int) : number of training epochs
output_dir (string) : the directory to save the model checkpoints
dataset (string) : the dataset to perform zero-shot evaluation or continual pretraining
dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
train_batch_size (int) : batch size used for training
learning_rate (float) : learning rate used for training
model (string) : model name in huggingface models (https://huggingface.co/models)
method (string) : method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
freeze_level (int) : how much of the model to freeze during training (0 for none, 1 for freezing only the encoder, 2 for freezing all of the parameters)
gradient_accumulation_steps (int) : gradient accumulation used to match the global training batch of each method
ngpu (int) : number of GPUs used for the run
num_workers (int) : number of workers for the DataLoader
resume_from_checkpoint (string) : null by default; directory of the model checkpoint when resuming from a checkpoint
accelerator (string) : 'ddp' by default; the PyTorch Lightning accelerator to be used
use_deepspeed (bool) : false by default; currently not extensively tested
CUDA_VISIBLE_DEVICES (string) : GPU devices that are made available for this run (e.g. "0,1,2,3", "0")
wandb_log (bool) : whether to log experiment through wandb
wandb_project (string) : project name of wandb
wandb_run_name (string) : the name of this training run
mode (string) : 'pretrain' for all configs
use_lr_scheduling (bool) : true if using learning rate scheduling
check_validation (bool) : true for evaluation (no training)
checkpoint_path (string) : path to the model checkpoint that is used for evaluation
output_log (string) : directory to log evaluation results to
split_num (int) : 1 by default; more than 1 if there are multiple CKL phases
split (int) : which CKL phase it is
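
Before launching a long run, it can help to sanity-check a config against the allowed values listed above and to compute the global batch size it implies. The sketch below is illustrative only: `check_config` and `global_batch_size` are hypothetical helpers, not the validation done by run.py, and the batch-size arithmetic assumes that train_batch_size is the per-GPU batch size, as is usual under DDP.

```python
import json

# Keys and allowed values copied from the list above; the checks themselves
# are an illustration, not the validation performed by run.py.
ALLOWED = {
    "dataset_version": {"full", "small", "debug"},
    "method": {"baseline", "kadapter", "lora", "mixreview", "modular_small", "recadam"},
    "freeze_level": {0, 1, 2},
}


def check_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(
                "{}={!r} is not one of {}".format(key, cfg[key], sorted(allowed))
            )
    return problems


def global_batch_size(cfg):
    """Per-GPU batch x accumulation steps x GPUs (assumed data-parallel arithmetic)."""
    return cfg["train_batch_size"] * cfg["gradient_accumulation_steps"] * cfg["ngpu"]


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the shipped configs.
    example = json.loads("""
    {"method": "kadapter", "dataset_version": "full", "freeze_level": 0,
     "train_batch_size": 8, "gradient_accumulation_steps": 4, "ngpu": 2}
    """)
    print(check_config(example))
    print(global_batch_size(example))
```

Under that assumption, two methods with different per-GPU batch sizes can be compared fairly by adjusting gradient_accumulation_steps until `global_batch_size` returns the same number for both.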

For example, to run the invariantLAMA zero-shot evaluation of the continually pretrained t5_kadapters model:

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json

For example, to perform continual pretraining on CC-RecentNews (the main experiment) with t5_kadapters:

python run.py --config configs/full_setting/training/t5_kadapters.json
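
To run several configs back to back, the same command line can be driven from a small script. This is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical helpers, the script is assumed to run from the repository root, and the only interface used is the `python run.py --config <path>` form shown above.

```python
import subprocess
import sys


def build_command(config_path, script="run.py"):
    """Build the argv list for one run, mirroring `python run.py --config <path>`."""
    return [sys.executable, script, "--config", str(config_path)]


def run_all(config_paths, dry_run=True):
    """Run the configs one after another; with dry_run=True only return the commands."""
    commands = [build_command(p) for p in config_paths]
    if not dry_run:
        for cmd in commands:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    configs = [
        "configs/full_setting/evaluation/invariantLAMA/t5_baseline.json",
        "configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json",
    ]
    for cmd in run_all(configs):
        print(" ".join(cmd))
```

The default dry run only prints the commands, so the list can be inspected before passing `dry_run=False` to start the runs.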

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}


Owner

Joel Jang | 장요엘
