In this workshop we explore state-of-the-art NLP transformers, including models such as T5 and BERT, and then build a model with the Hugging Face Transformers framework.

Last update: Apr 13, 2022

Overview

Transformers are all you need

Table of Contents

The workshop is divided into four parts:

1. Introduction to Transformers and the hype around them
2. Sneak peek at the theory behind Transformers
3. Quick tour (Hugging Face framework)
4. Lab
   - fine-tune a translation model

Note that you can always open the notebooks on Google Colab (no need to install anything); you just need a stable internet connection:

- fine-tune a translation model

2. How to get started

Fork this repository
Create a branch named after yourself
Go through the notebook and complete all tasks
Submit a pull request

Homework exercise

Your task is to fine-tune a classification model:

- Use Hugging Face Transformers and Datasets.
- Fine-tune the model on one of the classification tasks of the GLUE benchmark (CoLA, to be specific).
- Use a checkpoint from the Hub ("distilbert-base-uncased", for example).
- Once finished, submit a pull request to this repo, and make sure to place your .ipynb file in the submissions folder (YOUR_NAME.ipynb).
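
As a starting point, the homework steps above could be sketched roughly as follows. This is a minimal sketch, not the workshop's reference solution: the function names `matthews_corrcoef` and `fine_tune_cola`, the output directory, and all hyperparameters are assumptions chosen for illustration. It relies on the fact that CoLA is scored with the Matthews correlation coefficient, which is computed here in plain Python so the metric can be checked without the heavy dependencies installed.

```python
import math


def matthews_corrcoef(labels, preds):
    """Matthews correlation coefficient for binary (0/1) labels."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fine_tune_cola(checkpoint="distilbert-base-uncased", output_dir="cola-distilbert"):
    """Hypothetical helper: fine-tune `checkpoint` on GLUE/CoLA and evaluate it."""
    # Imported lazily so the metric helper above works without these packages.
    import numpy as np
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    raw = load_dataset("glue", "cola")  # columns: sentence, label, idx
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def tokenize(batch):
        return tokenizer(batch["sentence"], truncation=True)

    tokenized = raw.map(tokenize, batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        preds = np.argmax(logits, axis=-1)
        return {"matthews_correlation": matthews_corrcoef(labels.tolist(), preds.tolist())}

    # Assumed hyperparameters; tune them for your own run.
    args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch dynamically
        compute_metrics=compute_metrics,
    )
    trainer.train()
    return trainer.evaluate()
```

In a Colab notebook, calling `fine_tune_cola()` after installing `transformers` and `datasets` would download the data and checkpoint, train, and return the validation metrics. A GPU runtime is advisable.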

Useful resources: text_classification

Owner

Aymen Berriche
