GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Related tags

Deep LearningGLaRA
Overview

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

This paper is the code release of the paper GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition, which is accepted at EACL-2021.

This work aims at improving weakly supervised named entity reconigtion systems by automatically finding new rules that are helpful at identifying entities from data. The idea is, as shown in the following figure, if we know rule1: associated with->Disease is an accurate rule and it is semantically related to rule2: cause of->Disease, we should be able use rule2 as another accurate rule for identifying Disease entities.

The overall workflow is illustrated as below, for a specific type of rules, we frist extract a large set of possible rule candidates from unlabeled data. Then the rule candidates are constructed into a graph where each node represents a candidate and edges are built based on the semantic similarties of the node pairs. Next, by manually identifying a small set of nodes as seeding rules, we use a graph-based neural network to find new rules by propaging the labeling confidence from seeding rules to other candidates. Finally, with the newly learned rules, we follow weak supervision to create weakly labeled dataset by creating a labeling matrix on unlabeled data and training a generative model. Finally, we train our final NER system with a discriminative model.

Installation

  1. Install required libraries
  1. Download dataset
    • Once LinkedHMM is successfully installed, move all the files in "data" fold under LinkedHMM directory to the "datasets" folder in the currect directory.
    • Download pretrained sciBERT embeddings here: https://huggingface.co/allenai/scibert_scivocab_uncased, and move it to the folder pretrained-model.
  • For saving the time of reading data, we cache all datasets into picked objects: python cache_datasets.py

Run experiments

The experiments on the three data sets are independently conducted. To run experiments for one task, (i.e NCBI), please go to folder code-NCBI. For the experiments on other datasets, namely BC5CDR and LaptopReview, please go to folder code-BC5CDR and code-LaptopReview and run the same commands.

  1. Extract candidate rules for each type and cache embeddings, edges, seeds, etc.
  • run python prepare_candidates_and_embeddings.py --dataset NCBI --rule_type SurfaceForm to cache candidate rules, embeddings, edges, etc., for SurfaceForm rule.
  • other rule types are Suffix, Prefix, InclusivePreNgram, ExclusivePreNgram, InclusivePostNgram, ExclusivePostNgram, and Dependency.
  • all cached data will be save into the folder cached_seeds_and_embeddings.
  1. Train propogation and find new rules.
  • run python propagate.py --dataset NCBI --rule_type SurfaceForm to learn SurfaceForm rules.
  • other rules are Suffix, Prefix, InclusivePreNgram, ExclusivePreNgram, InclusivePostNgram, ExclusivePostNgram, and Dependency.
  1. Train LinkedHMM generative model
  • run python train_generative_model.py --dataset NCBI --use_SurfaceForm --use_Suffix --use_Prefix --use_InclusivePostNgram --use_Dependency.
  • The argument --use_[TYPE] is used to activate a specific type of rules.
  1. Train discriminative model
  • run create_dataset_for_bert_tagger.py to prepare dataset for training the tagging model. (make sure to change the dataset and data_name variables in the file first.)
  • run train_discriminative_model.py

References

[1] Esteban Safranchik, Shiying Luo, Stephen H. Bach. Weakly Supervised Sequence Tagging from Noisy Rules.

Owner
Xinyan Zhao
I am a Ph.D. Student in School of Information University of Michigan.
Xinyan Zhao
How to Leverage Multimodal EHR Data for Better Medical Predictions?

How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat

13 Dec 13, 2022
Pseudo lidar - (CVPR 2019) Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving This paper has been accpeted by Conference o

Yan Wang 881 Dec 27, 2022
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding Official Pytorch implementation of Negative Sample Matter

Multimedia Computing Group, Nanjing University 69 Dec 26, 2022
Code for 2021 NeurIPS --- Towards Multi-Grained Explainability for Graph Neural Networks

ReFine: Multi-Grained Explainability for GNNs We are trying hard to update the code, but it may take a while to complete due to our tight schedule rec

Shirley (Ying-Xin) Wu 47 Dec 16, 2022
以孤立语假设和宽度优先搜索为基础,构建了一种多通道堆叠注意力Transformer结构的斗地主ai

ddz-ai 介绍 斗地主是一种扑克游戏。游戏最少由3个玩家进行,用一副54张牌(连鬼牌),其中一方为地主,其余两家为另一方,双方对战,先出完牌的一方获胜。 ddz-ai以孤立语假设和宽度优先搜索为基础,构建了一种多通道堆叠注意力Transformer结构的系统,使其经过大量训练后,能在实际游戏中获

freefuiiismyname 88 May 15, 2022
Internship Assessment Task for BaggageAI.

BaggageAI Internship Task Problem Statement: You are given two sets of images:- background and threat objects. Background images are the background x-

Arya Shah 10 Nov 14, 2022
SwinIR: Image Restoration Using Swin Transformer

SwinIR: Image Restoration Using Swin Transformer This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Shifted Win

Jingyun Liang 2.4k Jan 08, 2023
[CVPRW 2022] Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Attention Helps CNN See Better: Hybrid Image Quality Assessment Network [CVPRW 2022] Code for Hybrid Image Quality Assessment Network [paper] [code] T

IIGROUP 49 Dec 11, 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers Authors: Jaemin Cho, Abhay Zala, and Mohit Bansal (

Jaemin Cho 98 Dec 15, 2022
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

Introduction QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and

Yu 1.4k Dec 30, 2022
A pytorch implementation of the CVPR2021 paper "VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild"

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild A pytorch implementation of the CVPR2021 paper "VSPW: A Large-scale Dataset for Video

45 Nov 29, 2022
Code for the paper "Functional Regularization for Reinforcement Learning via Learned Fourier Features"

Reinforcement Learning with Learned Fourier Features State-space Soft Actor-Critic Experiments Move to the state-SAC-LFF repository. cd state-SAC-LFF

Alex Li 10 Nov 11, 2022
Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

DRRN-pytorch This is an unofficial implementation of "Deep Recursive Residual Network for Super Resolution (DRRN)", CVPR 2017 in Pytorch. [Paper] You

yun_yang 192 Dec 12, 2022
Public Implementation of ChIRo from "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations"

Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations This directory contains the model architectures and experimental

35 Dec 05, 2022
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Pytorch当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和

Bubbliiiing 102 Dec 30, 2022
Pytorch port of Google Research's LEAF Audio paper

leaf-audio-pytorch Pytorch port of Google Research's LEAF Audio paper published at ICLR 2021. This port is not completely finished, but the Leaf() fro

Dennis Fedorishin 80 Oct 31, 2022
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

AlphaZero-Gomoku This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) f

Junxiao Song 2.8k Dec 26, 2022
Pretrained models for Jax/Haiku; MobileNet, ResNet, VGG, Xception.

Pre-trained image classification models for Jax/Haiku Jax/Haiku Applications are deep learning models that are made available alongside pre-trained we

Alper Baris CELIK 14 Dec 20, 2022
DTCN IJCAI - Sequential prediction learning framework and algorithm

DTCN This is the implementation of our paper "Sequential Prediction of Social Me

Bobby 2 Jan 24, 2022
Zalo AI challenge 2021 task hum to song

Zalo AI challenge 2021 task Hum to Song pipeline: Chuẩn bị dữ liệu cho quá trình train: Sửa các file đường dẫn trong config/preprocess.yaml raw_path:

Vo Van Phuc 105 Dec 16, 2022