LUKE -- Language Understanding with Knowledge-based Embeddings

Related tags

Deep Learningluke
Overview

LUKE

CircleCI


LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transformer. It was proposed in our paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. It achieves state-of-the-art results on important NLP benchmarks including SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing).

This repository contains the source code to pre-train the model and fine-tune it to solve downstream tasks.

News

November 24, 2021: Entity disambiguation example is available

The example code of entity disambiguation based on LUKE has been added to this repository. This model was originally proposed in our paper, and achieved state-of-the-art results on five standard entity disambiguation datasets: AIDA-CoNLL, MSNBC, AQUAINT, ACE2004, and WNED-WIKI.

For further details, please refer to the example directory.

August 3, 2021: New example code based on Hugging Face Transformers and AllenNLP is available

New fine-tuning examples of three downstream tasks, i.e., NER, relation classification, and entity typing, have been added to LUKE. These examples are developed based on Hugging Face Transformers and AllenNLP. The fine-tuning models are defined using simple AllenNLP's Jsonnet config files!

The example code is available in the examples_allennlp directory.

May 5, 2021: LUKE is added to Hugging Face Transformers

LUKE has been added to the master branch of the Hugging Face Transformers library. You can now solve entity-related tasks (e.g., named entity recognition, relation classification, entity typing) easily using this library.

For example, the LUKE-large model fine-tuned on the TACRED dataset can be used as follows:

>>> from transformers import LukeTokenizer, LukeForEntityPairClassification
>>> model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> text = "Beyoncé lives in Los Angeles."
>>> entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_class_idx = int(logits[0].argmax())
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
Predicted class: per:cities_of_residence

We also provide the following three Colab notebooks that show how to reproduce our experimental results on CoNLL-2003, TACRED, and Open Entity datasets using the library:

Please refer to the official documentation for further details.

November 5, 2021: LUKE-500K (base) model

We released LUKE-500K (base), a new pretrained LUKE model which is smaller than existing LUKE-500K (large). The experimental results of the LUKE-500K (base) and LUKE-500K (large) on SQuAD v1 and CoNLL-2003 are shown as follows:

Task Dataset Metric LUKE-500K (base) LUKE-500K (large)
Extractive Question Answering SQuAD v1.1 EM/F1 86.1/92.3 90.2/95.4
Named Entity Recognition CoNLL-2003 F1 93.3 94.3

We tuned only the batch size and learning rate in the experiments based on LUKE-500K (base).

Comparison with State-of-the-Art

LUKE outperforms the previous state-of-the-art methods on five important NLP tasks:

Task Dataset Metric LUKE-500K (large) Previous SOTA
Extractive Question Answering SQuAD v1.1 EM/F1 90.2/95.4 89.9/95.1 (Yang et al., 2019)
Named Entity Recognition CoNLL-2003 F1 94.3 93.5 (Baevski et al., 2019)
Cloze-style Question Answering ReCoRD EM/F1 90.6/91.2 83.1/83.7 (Li et al., 2019)
Relation Classification TACRED F1 72.7 72.0 (Wang et al. , 2020)
Fine-grained Entity Typing Open Entity F1 78.2 77.6 (Wang et al. , 2020)

These numbers are reported in our EMNLP 2020 paper.

Installation

LUKE can be installed using Poetry:

$ poetry install

The virtual environment automatically created by Poetry can be activated by poetry shell.

Released Models

We initially release the pre-trained model with 500K entity vocabulary based on the roberta.large model.

Name Base Model Entity Vocab Size Params Download
LUKE-500K (base) roberta.base 500K 253 M Link
LUKE-500K (large) roberta.large 500K 483 M Link

Reproducing Experimental Results

The experiments were conducted using Python3.6 and PyTorch 1.2.0 installed on a server with a single or eight NVidia V100 GPUs. We used NVidia's PyTorch Docker container 19.02. For computational efficiency, we used mixed precision training based on APEX library which can be installed as follows:

$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ git checkout c3fad1ad120b23055f6630da0b029c8b626db78f
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

The APEX library is not needed if you do not use --fp16 option or reproduce the results based on the trained checkpoint files.

The commands that reproduce the experimental results are provided as follows:

Entity Typing on Open Entity Dataset

Dataset: Link
Checkpoint file (compressed): Link

Using the checkpoint file:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    entity-typing run \
    --data-dir=<DATA_DIR> \
    --checkpoint-file=<CHECKPOINT_FILE> \
    --no-train

Fine-tuning the model:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    entity-typing run \
    --data-dir=<DATA_DIR> \
    --train-batch-size=2 \
    --gradient-accumulation-steps=2 \
    --learning-rate=1e-5 \
    --num-train-epochs=3 \
    --fp16

Relation Classification on TACRED Dataset

Dataset: Link
Checkpoint file (compressed): Link

Using the checkpoint file:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    relation-classification run \
    --data-dir=<DATA_DIR> \
    --checkpoint-file=<CHECKPOINT_FILE> \
    --no-train

Fine-tuning the model:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    relation-classification run \
    --data-dir=<DATA_DIR> \
    --train-batch-size=4 \
    --gradient-accumulation-steps=8 \
    --learning-rate=1e-5 \
    --num-train-epochs=5 \
    --fp16

Named Entity Recognition on CoNLL-2003 Dataset

Dataset: Link
Checkpoint file (compressed): Link

Using the checkpoint file:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    ner run \
    --data-dir=<DATA_DIR> \
    --checkpoint-file=<CHECKPOINT_FILE> \
    --no-train

Fine-tuning the model:

$ python -m examples.cli\
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    ner run \
    --data-dir=<DATA_DIR> \
    --train-batch-size=2 \
    --gradient-accumulation-steps=4 \
    --learning-rate=1e-5 \
    --num-train-epochs=5 \
    --fp16

Cloze-style Question Answering on ReCoRD Dataset

Dataset: Link
Checkpoint file (compressed): Link

Using the checkpoint file:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    entity-span-qa run \
    --data-dir=<DATA_DIR> \
    --checkpoint-file=<CHECKPOINT_FILE> \
    --no-train

Fine-tuning the model:

$ python -m examples.cli \
    --num-gpus=8 \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    entity-span-qa run \
    --data-dir=<DATA_DIR> \
    --train-batch-size=1 \
    --gradient-accumulation-steps=4 \
    --learning-rate=1e-5 \
    --num-train-epochs=2 \
    --fp16

Extractive Question Answering on SQuAD 1.1 Dataset

Dataset: Link
Checkpoint file (compressed): Link
Wikipedia data files (compressed): Link

Using the checkpoint file:

$ python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    reading-comprehension run \
    --data-dir=<DATA_DIR> \
    --checkpoint-file=<CHECKPOINT_FILE> \
    --no-negative \
    --wiki-link-db-file=enwiki_20160305.pkl \
    --model-redirects-file=enwiki_20181220_redirects.pkl \
    --link-redirects-file=enwiki_20160305_redirects.pkl \
    --no-train

Fine-tuning the model:

$ python -m examples.cli \
    --num-gpus=8 \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=<OUTPUT_DIR> \
    reading-comprehension run \
    --data-dir=<DATA_DIR> \
    --no-negative \
    --wiki-link-db-file=enwiki_20160305.pkl \
    --model-redirects-file=enwiki_20181220_redirects.pkl \
    --link-redirects-file=enwiki_20160305_redirects.pkl \
    --train-batch-size=2 \
    --gradient-accumulation-steps=3 \
    --learning-rate=15e-6 \
    --num-train-epochs=2 \
    --fp16

Citation

If you use LUKE in your work, please cite the original paper:

@inproceedings{yamada2020luke,
  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
  booktitle={EMNLP},
  year={2020}
}

Contact Info

Please submit a GitHub issue or send an e-mail to Ikuya Yamada ([email protected]) for help or issues using LUKE.

Owner
Studio Ousia
Studio Ousia
It's a implement of this paper:Relation extraction via Multi-Level attention CNNs

Relation Classification via Multi-Level Attention CNNs It's a implement of this paper:Relation Classification via Multi-Level Attention CNNs. Training

Aybss 2 Nov 04, 2022
Post-training Quantization for Neural Networks with Provable Guarantees

Post-training Quantization for Neural Networks with Provable Guarantees Authors: Jinjie Zhang ( Yixuan Zhou 2 Nov 29, 2022

PyTorch implementation of "Optimization Planning for 3D ConvNets"

Optimization-Planning-for-3D-ConvNets Code for the ICML 2021 paper: Optimization Planning for 3D ConvNets. Authors: Zhaofan Qiu, Ting Yao, Chong-Wah N

Zhaofan Qiu 2 Jan 12, 2022
CARL provides highly configurable contextual extensions to several well-known RL environments.

CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments.

AutoML-Freiburg-Hannover 51 Dec 28, 2022
Explore extreme compression for pre-trained language models

Code for paper "Exploring extreme parameter compression for pre-trained language models ICLR2022"

twinkle 16 Nov 14, 2022
Code release for ICCV 2021 paper "Anticipative Video Transformer"

Anticipative Video Transformer Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT) [project page

Facebook Research 123 Dec 13, 2022
The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021) Arash Vahdat*   ·   Karsten Kreis*   ·  

NVIDIA Research Projects 238 Jan 02, 2023
Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks

OnsagerNet Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks This is the original pyTorch implemenati

Haijun.Yu 3 Aug 24, 2022
Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

SPLASH: Semantic Parsing with Language Assistance from Humans SPLASH is dataset for the task of semantic parse correction with natural language feedba

Microsoft Research - Language and Information Technologies (MSR LIT) 35 Oct 31, 2022
Using LSTM write Tang poetry

本教程将通过一个示例对LSTM进行介绍。通过搭建训练LSTM网络,我们将训练一个模型来生成唐诗。本文将对该实现进行详尽的解释,并阐明此模型的工作方式和原因。并不需要过多专业知识,但是可能需要新手花一些时间来理解的模型训练的实际情况。为了节省时间,请尽量选择GPU进行训练。

56 Dec 15, 2022
Official implementation of Self-supervised Graph Attention Networks (SuperGAT), ICLR 2021.

SuperGAT Official implementation of Self-supervised Graph Attention Networks (SuperGAT). This model is presented at How to Find Your Friendly Neighbor

Dongkwan Kim 127 Dec 28, 2022
Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Keyword Spotting Transformer This is the unofficial TensorFlow implementation of the Keyword Spotting Transformer model. This model is used to train o

Intelligent Machines Limited 8 May 11, 2022
This repo provides a demo for the CVPR 2021 paper "A Fourier-based Framework for Domain Generalization" on the PACS dataset.

FACT This repo provides a demo for the CVPR 2021 paper "A Fourier-based Framework for Domain Generalization" on the PACS dataset. To cite, please use:

105 Dec 17, 2022
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Tens

Lutz Roeder 21k Jan 06, 2023
VOGUE: Try-On by StyleGAN Interpolation Optimization

VOGUE is a StyleGAN interpolation optimization algorithm for photo-realistic try-on. Top: shirt try-on automatically synthesized by our method in two different examples.

Wei ZHANG 66 Dec 09, 2022
A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis Project Page | Paper A Shading-Guided Generative Implicit Model

Xingang Pan 115 Dec 18, 2022
A python library for self-supervised learning on images.

Lightly is a computer vision framework for self-supervised learning. We, at Lightly, are passionate engineers who want to make deep learning more effi

Lightly 2k Jan 08, 2023
[NeurIPS2021] Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks Code for NeurIPS 2021 Paper "Exploring Architectural Ingredients of A

Hanxun Huang 26 Dec 01, 2022
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

90 Dec 29, 2022
Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Self-Supervised Multi-Frame Monocular Scene Flow 3D visualization of estimated depth and scene flow (overlayed with input image) from temporally conse

Visual Inference Lab @TU Darmstadt 85 Dec 22, 2022