Fine-tuning scripts for evaluating transformer-based models on the KLEJ benchmark.

Last update: Oct 18, 2022

The KLEJ Benchmark Baselines

The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for Polish language understanding.

This repository contains example scripts to easily fine-tune models from the transformers library on the KLEJ benchmark.

Installation

Install the Python package using the following commands:

$ git clone https://github.com/allegro/klejbenchmark-baselines
$ pip install klejbenchmark-baselines/
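
To confirm that the installation succeeded, one option is to check that the package is visible to the active Python environment. This is a minimal sketch, assuming the package is importable as `klejbenchmark_baselines` (inferred from the repository layout, not confirmed here); `is_installed` is a hypothetical helper, not part of this repository.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package installs under this import name.
    name = "klejbenchmark_baselines"
    print(f"{name} installed: {is_installed(name)}")
```

If the check reports `False`, the `pip install` step most likely ran in a different environment from the one executing the script.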

Quick Start

To fine-tune your model on KLEJ tasks using the default settings, you can use the provided example scripts.

First, download the KLEJ benchmark datasets:

$ bash scripts/download_klej.sh

After downloading KLEJ, customize training parameters inside the scripts/run_training.sh script and train the models using:

$ bash scripts/run_training.sh

The script creates:

TensorBoard logs with training and validation metrics,
checkpoints of the best models,
a zip file with predictions for the test sets, which is a valid submission for the KLEJ benchmark.

The zip file can be submitted on the klejbenchmark.com website for evaluation on the test sets.
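
Before uploading, it may help to inspect what the archive contains. The sketch below lists the files in a zip archive using Python's standard `zipfile` module. The helper name `list_submission_files` is hypothetical, and the archive's actual file name and internal layout are not specified here, so treat this as a generic inspection aid rather than a validator for the submission format.

```python
import zipfile
from typing import List


def list_submission_files(zip_path: str) -> List[str]:
    """Return the sorted names of regular files stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries; only regular files are reported.
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )
```

Calling `list_submission_files` with the path of the generated archive returns its file names in sorted order, which makes it easy to spot a missing or unexpectedly named predictions file.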

Custom Training

It's also possible to train each model separately and customize the training parameters using the klejbenchmark_baselines/main.py script.

License

Apache License 2.0

Citation

If you use this code, please cite the following paper:

@inproceedings{rybak-etal-2020-klej,
    title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
    author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.111",
    pages = "1191--1201",
}

Authors

This code was created by the Allegro Machine Learning Research team.

You can contact us at: [email protected]
