CoNLL-English NER Task

Motivation

Course project
Review the PyTorch framework and the sequence-labeling task
Practice using Hugging Face's transformers library

Dataset Introduction

The data folder contains a training set, a validation set and a test set. Each line holds one token with its four columns:

-DOCSTART- -X- O O
-token- -pos- -chunk- -entity-
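
As a rough illustration of the four-column layout, the sketch below groups such lines into sentences. It is not the code in util/dataTool.py; the function name `read_conll` and the sample rows are made up for this example, and it assumes that blank lines separate sentences and that every token line has exactly four whitespace-separated fields.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str, str]  # (word, pos, chunk, entity)


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group CoNLL-style lines into sentences of (word, pos, chunk, entity)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a -DOCSTART- marker closes the sentence in progress.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk, entity = line.split()
        current.append((word, pos, chunk, entity))
    if current:  # flush the last sentence if the input has no trailing blank line
        sentences.append(current)
    return sentences


# Illustrative rows only, not copied from the project's data files.
sample = """-DOCSTART- -X- O O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""

parsed = read_conll(sample.splitlines())
for sentence in parsed:
    print([(word, entity) for word, _, _, entity in sentence])
```

Only the first and last columns are needed for NER; the POS and chunk columns are kept here so the tuples mirror the file layout.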

Project Structure

-data  # source data
-emb # BERT model files

-util
    -dataTool.py  # data interface
    -model.py
    -trainer.py  # train and evaluate

config.py  # parameters in the project
run.py
requirement.txt

EDA.ipynb # exploratory data analysis,
          # used to choose the hyperparameters for the experiments

Coding Pattern

To keep experiments simple and convenient,
the model is decoupled into two units: an encoder and a tagger.

model ==> encoder + tagger

This way, the encoder extracts contextual and linguistic features,
which the tagger receives and turns into BIO tags.
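
The split can be pictured with a toy, dependency-free sketch. The classes below are hypothetical stand-ins, not the ones in util/model.py: in the project the encoder is BERT-based and the tagger is a CRF or softmax layer, whereas here the encoder emits two hand-made features per token and the tagger is a fixed linear scorer with hand-picked weights. The tag `B-ENT` is also invented for the example.

```python
from typing import List, Sequence

Features = List[List[float]]  # one feature vector per token


class ToyEncoder:
    """Stand-in for the encoder: tokens -> one feature vector per token."""

    def __call__(self, tokens: Sequence[str]) -> Features:
        # Two hand-made features: "starts with a capital" and "is first token".
        return [[float(tok[:1].isupper()), float(i == 0)]
                for i, tok in enumerate(tokens)]


class ArgmaxTagger:
    """Stand-in for the tagger: scores every tag per token, keeps the best."""

    def __init__(self, tags: Sequence[str], weights: Sequence[Sequence[float]]):
        self.tags = list(tags)
        self.weights = [list(row) for row in weights]  # one weight row per tag

    def __call__(self, features: Features) -> List[str]:
        predicted = []
        for vec in features:
            scores = [sum(w * x for w, x in zip(row, vec)) for row in self.weights]
            # On ties, list.index returns the first tag in self.tags.
            predicted.append(self.tags[scores.index(max(scores))])
        return predicted


class Model:
    """model ==> encoder + tagger; either part can be swapped on its own."""

    def __init__(self, encoder, tagger):
        self.encoder = encoder
        self.tagger = tagger

    def __call__(self, tokens: Sequence[str]) -> List[str]:
        return self.tagger(self.encoder(tokens))


# Hand-picked weights for illustration: capitalized tokens score as entities.
tagger = ArgmaxTagger(tags=["O", "B-ENT"], weights=[[0.0, 0.5], [1.0, 0.0]])
model = Model(ToyEncoder(), tagger)
print(model(["EU", "rejects", "German", "call"]))
```

Because `Model` only assumes that the tagger accepts whatever the encoder returns, comparing variants such as a CRF head against a softmax head comes down to passing a different tagger object while the encoder stays untouched.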

Usage

chmod 755 deploy
./deploy

./gpu n  # monitor the GPU (refresh every n seconds)
./run  # start

Baseline Performance (1 epoch | macro average)

Model	Precision	Recall	F1
Bert-CRF	0.71	0.68	0.69
Bert-softmax	-	-	-
Bert-BiLSTM-CRF	-	-	-
Bert-BiLSTM-softmax	-	-	-
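
For reference, macro averaging computes each metric per label and then takes the unweighted mean, so rare labels count as much as frequent ones. The sketch below shows one common token-level variant; whether trainer.py scores at token level or entity level, and which labels it includes, is not stated here, so treat the function `macro_prf` and the sample tags as assumptions.

```python
from typing import List, Sequence, Tuple


def macro_prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Token-level macro precision, recall and F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0, 0.0, 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Undefined ratios (no predictions or no gold items) are counted as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Made-up tags for illustration only.
gold = ["B-PER", "I-PER", "O", "O", "B-ORG", "O"]
pred = ["B-PER", "O", "O", "O", "B-ORG", "B-ORG"]
p, r, f1 = macro_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Note that the macro F1 here is the mean of the per-label F1 scores, which in general differs from the harmonic mean of the macro precision and macro recall.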

Optimization

Cost-sensitive learning, or dropping the rare classes
Dropout to improve generalization
Different backbone structures
DDP training --> more GPU memory for a larger batch_size
More epochs --> schedule the learning rate dynamically during training
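
As one possible starting point for the cost-sensitive idea above, the sketch below derives inverse-frequency class weights from the tag counts, so that rare tags contribute more to the loss than the dominant O tag. The formula total / (num_classes * count) is a common heuristic, not something taken from this project, and the function name and tag counts are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def inverse_frequency_weights(tags: Iterable[str]) -> Dict[str, float]:
    """Weight each tag by total / (num_classes * count): rarer tags weigh more."""
    counts = Counter(tags)
    total = sum(counts.values())
    num_classes = len(counts)
    return {tag: total / (num_classes * n) for tag, n in counts.items()}


# Made-up tag distribution: O dominates, entity tags are rare.
tags = ["O"] * 8 + ["B-PER"] * 1 + ["I-PER"] * 1
weights = inverse_frequency_weights(tags)
for tag, weight in sorted(weights.items()):
    print(f"{tag}: {weight:.3f}")
```

With a softmax tagger, such weights could be arranged in label-index order and passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`; a CRF tagger has no direct equivalent and would need a different treatment.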

Owner

Kevin
