GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Last update: Dec 26, 2022

Overview

This repository is the code release for the paper GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition, which was accepted at EACL 2021.

This work aims to improve weakly supervised named entity recognition systems by automatically finding new rules that help identify entities in data. The idea, as shown in the following figure, is that if we know rule1: associated with->Disease is an accurate rule and it is semantically related to rule2: cause of->Disease, we should be able to use rule2 as another accurate rule for identifying Disease entities.

The overall workflow is illustrated below. For a specific type of rule, we first extract a large set of possible rule candidates from unlabeled data. The rule candidates are then organized into a graph in which each node represents a candidate and edges are built based on the semantic similarity of node pairs. Next, after manually identifying a small set of nodes as seed rules, we use a graph-based neural network to find new rules by propagating the labeling confidence from the seed rules to the other candidates. With the newly learned rules, we follow the weak supervision approach: we create a labeling matrix on unlabeled data and train a generative model to produce a weakly labeled dataset. Finally, we train the final NER system, a discriminative model, on that dataset.
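The graph and propagation steps can be pictured with a toy sketch. This is not the repository's implementation: plain weighted label propagation stands in for the graph-based neural network, and the embeddings, the similarity threshold, and the function names (cosine, build_graph, propagate) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(embeddings, threshold=0.8):
    """Connect two candidate rules when their embeddings are similar enough.

    The threshold is a hypothetical value chosen for this toy example.
    """
    names = list(embeddings)
    graph = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def propagate(graph, seeds, steps=10):
    """Spread seed confidence to neighbors by similarity-weighted averaging.

    Seed nodes are clamped to their given confidence at every step.
    """
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(steps):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = seeds[node]
            elif neighbors:
                total = sum(neighbors.values())
                updated[node] = sum(w * scores[n] for n, w in neighbors.items()) / total
            else:
                updated[node] = 0.0
        scores = updated
    return scores


# Made-up 3-dimensional embeddings for three candidate rules.
candidates = {
    "associated with": [0.9, 0.1, 0.2],
    "cause of": [0.8, 0.2, 0.3],
    "located in": [0.1, 0.9, 0.1],
}
graph = build_graph(candidates)
scores = propagate(graph, seeds={"associated with": 1.0})
for rule, score in scores.items():
    print(f"{rule}: {score:.2f}")
```

In this toy setup, "cause of" is linked to the seed rule "associated with" and inherits its confidence, while "located in" stays unconnected and receives none.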

Installation

Install required libraries

Install LinkedHMM [1] by running pip install -r requirements.txt on the command line, or from the official repo: https://github.com/BatsResearch/safranchik-aaai20-code.
Install PyTorch from https://pytorch.org/
Install Transformers by following https://huggingface.co/transformers/installation.html
Install PyTorch Geometric (pytorch-geometric) by following https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html

Download dataset
- Once LinkedHMM is successfully installed, move all the files in the "data" folder under the LinkedHMM directory to the "datasets" folder in the current directory.
- Download the pretrained SciBERT model from https://huggingface.co/allenai/scibert_scivocab_uncased and move it to the folder pretrained-model.

To save time when reading the data, we cache all datasets as pickled objects: python cache_datasets.py
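The caching idea can be sketched as follows. This is only an illustration of pickle-based caching under assumed names (load_or_build, parse_toy_dataset, toy.pkl); it does not reproduce what cache_datasets.py actually does.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Return the cached object if it exists, otherwise build it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def parse_toy_dataset():
    """Stand-in for slow dataset parsing; records how often it runs."""
    calls.append(1)
    return [["Tumor", "cells"], ["BRCA1", "mutation"]]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    first = load_or_build(path, parse_toy_dataset)   # parses and writes the cache
    second = load_or_build(path, parse_toy_dataset)  # reads the cache instead

print(len(calls))
```

The second call returns the same data without running the parser again, which is the time saving the caching step is after.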

Run experiments

The experiments on the three datasets are conducted independently. To run the experiments for one task (e.g., NCBI), go to the folder code-NCBI. For the experiments on the other datasets, BC5CDR and LaptopReview, go to the folders code-BC5CDR and code-LaptopReview, respectively, and run the same commands.

Extract candidate rules for each type and cache embeddings, edges, seeds, etc.

Run python prepare_candidates_and_embeddings.py --dataset NCBI --rule_type SurfaceForm to cache candidate rules, embeddings, edges, etc., for the SurfaceForm rule type.
The other rule types are Suffix, Prefix, InclusivePreNgram, ExclusivePreNgram, InclusivePostNgram, ExclusivePostNgram, and Dependency.
All cached data will be saved into the folder cached_seeds_and_embeddings.

Train propagation and find new rules

Run python propagate.py --dataset NCBI --rule_type SurfaceForm to learn SurfaceForm rules.
The other rule types are Suffix, Prefix, InclusivePreNgram, ExclusivePreNgram, InclusivePostNgram, ExclusivePostNgram, and Dependency.
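To cover every rule type without retyping the two commands above, a small driver can generate them. The sketch below only builds and prints the command lines using the script names and flags quoted in this README; the helper name pipeline_commands is an assumption, and the commands are meant to be run from the dataset's folder (for example code-NCBI).

```python
RULE_TYPES = [
    "SurfaceForm", "Suffix", "Prefix",
    "InclusivePreNgram", "ExclusivePreNgram",
    "InclusivePostNgram", "ExclusivePostNgram",
    "Dependency",
]


def pipeline_commands(dataset):
    """Build the candidate-extraction and propagation commands per rule type."""
    commands = []
    for rule_type in RULE_TYPES:
        # Candidates must be cached before propagation can use them.
        for script in ("prepare_candidates_and_embeddings.py", "propagate.py"):
            commands.append(
                ["python", script, "--dataset", dataset, "--rule_type", rule_type]
            )
    return commands


commands = pipeline_commands("NCBI")
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or each list can be passed to subprocess.run(cmd, check=True) to execute the steps in order.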

Train LinkedHMM generative model

Run python train_generative_model.py --dataset NCBI --use_SurfaceForm --use_Suffix --use_Prefix --use_InclusivePostNgram --use_Dependency
Each --use_[TYPE] argument activates one type of rule; omit the flag to leave that rule type out.
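When trying different rule combinations, the flag list can be assembled programmatically. This is a sketch with a hypothetical helper, generative_command; it assumes a --use_[TYPE] flag exists for each of the eight rule types named above, which this README implies but does not list flag by flag.

```python
KNOWN_RULE_TYPES = {
    "SurfaceForm", "Suffix", "Prefix",
    "InclusivePreNgram", "ExclusivePreNgram",
    "InclusivePostNgram", "ExclusivePostNgram",
    "Dependency",
}


def generative_command(dataset, rule_types):
    """Build the train_generative_model.py command for the chosen rule types."""
    unknown = [t for t in rule_types if t not in KNOWN_RULE_TYPES]
    if unknown:
        raise ValueError(f"Unknown rule types: {unknown}")
    flags = [f"--use_{t}" for t in rule_types]
    return ["python", "train_generative_model.py", "--dataset", dataset] + flags


cmd = generative_command(
    "NCBI",
    ["SurfaceForm", "Suffix", "Prefix", "InclusivePostNgram", "Dependency"],
)
print(" ".join(cmd))
```

With the five rule types shown, the printed command matches the one given above.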

Train discriminative model

Run python create_dataset_for_bert_tagger.py to prepare the dataset for training the tagging model. (Make sure to change the dataset and data_name variables in the file first.)
Run python train_discriminative_model.py

References

[1] Esteban Safranchik, Shiying Luo, Stephen H. Bach. Weakly Supervised Sequence Tagging from Noisy Rules. AAAI 2020.


Owner

Xinyan Zhao
