Re-TACRED: Addressing Shortcomings of the TACRED Dataset

Last update: Dec 10, 2022

Re-TACRED

Re-TACRED: Addressing Shortcomings of the TACRED Dataset
George Stoica, Emmanouil Antonios Platanios, and Barnabás Póczos
In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Primary Contact: George Stoica. As of Jan 2021, I am no longer at CMU, and the cs.cmu.edu email may no longer work. Please contact me instead at: [email protected].

Changelog

1.0 - Initial dataset release: Data consisted of 105,206 total instances spread across 40 relations.
1.1 - Updated dataset release: After extensive discussion, we have elected to prune Re-TACRED by ~14K instances. The new dataset has 91,467 instances, spread across 40 relations. The pruned data consisted of instances with messily segmented entities (and corresponding types) and sentences whose relations were ambiguous. While this version is smaller, it is cleaner and better defined.

This repository contains all relevant resources for using Re-TACRED, a new relation extraction dataset.

For details on this work please check out our:

arXiv: Paper
AAAI 2021: Paper & Poster
NeurIPS 2020 KR2ML Workshop: Paper & Poster

Below we describe the contents of the four repository directories by name.

Re-TACRED

This directory contains version 1.1 of our revised TACRED dataset, provided as a patch for each split. Due to licensing restrictions, we cannot provide the complete dataset. Instead, following Alt, Gabryszak, and Hennig (2020), our patch consists of JSON files that map TACRED instances, by their id, to our revised labels.

The original TACRED dataset is available for download from the LDC here. It is free for members, or $25 for non-members.

Applying the patch is simple and only requires replacing each TACRED instance (where applicable) with our revised relation. For convenience, we provide a script for this named apply_patch.py in the Re-TACRED directory. In the script, you only need to replace

tacred_dir = None
save_dir = None

with the path to your TACRED dataset directory and the path to the directory where you wish to save the patched data, respectively.
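The patching step described above can be sketched as follows. This is a minimal illustration, not the provided apply_patch.py: it assumes each TACRED split is a JSON list of instances with "id" and "relation" fields, and that each patch file is a JSON object mapping an instance id directly to its revised relation. The function names and file arguments are hypothetical. Instances without a patch entry are left unchanged here; whether the provided script keeps or drops them is not covered by this sketch.

```python
import json
from pathlib import Path


def apply_patch(instances, patch):
    """Return copies of the instances, with the relation replaced wherever
    the patch has an entry for the instance id (assumed patch format:
    {instance_id: revised_relation})."""
    patched = []
    for instance in instances:
        updated = dict(instance)  # shallow copy, so the input is not mutated
        if updated["id"] in patch:
            updated["relation"] = patch[updated["id"]]
        patched.append(updated)
    return patched


def patch_split(tacred_file, patch_file, out_file):
    """Load one TACRED split and its patch, then write the patched split.
    All three paths are supplied by the caller (hypothetical layout)."""
    instances = json.loads(Path(tacred_file).read_text(encoding="utf-8"))
    patch = json.loads(Path(patch_file).read_text(encoding="utf-8"))
    out_path = Path(out_file)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(apply_patch(instances, patch)), encoding="utf-8")
```

A call such as patch_split("tacred/train.json", "patch/train.json", "out/train.json") would then produce one patched split; the paths shown are placeholders for your own TACRED and save directories.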

PA-LSTM, C-GCN & SpanBERT

We base our experiments on the open-source model repositories of:

PA-LSTM: Zhang et al. (2017)
C-GCN: Zhang et al. (2018)
SpanBERT: Joshi et al. (2019)

However, it is not possible to simply pass Re-TACRED to each model repository, because each is hardcoded for TACRED. Thus, certain files must be modified to make each model compatible with Re-TACRED. To make this as easy as possible, we provide all of our altered files in a directory named after each model (e.g., the provided PA-LSTM directory). All that needs to be done is to replace each corresponding file in the original model repository with the file from our provided directory. For instance, you may replace SpanBERT's "run_tacred.py" file with our "run_tacred.py" file. Experiments are then run exactly as in the original model repositories.
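The file replacement described above could be scripted roughly as below. This is only a sketch under one assumption: that files in the provided model directory sit at the same relative paths as the files they replace in the original repository. If the layouts differ, copy the files by hand. The function name and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def overlay_files(provided_dir, model_repo_dir):
    """Copy every file under provided_dir over the file at the same
    relative path in model_repo_dir, and return the destination paths."""
    provided_dir = Path(provided_dir)
    model_repo_dir = Path(model_repo_dir)
    copied = []
    for src in sorted(provided_dir.rglob("*")):
        if src.is_file():
            dest = model_repo_dir / src.relative_to(provided_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # overwrites the original file if present
            copied.append(dest)
    return copied
```

Because the copy overwrites files in place, it is safest to run it on a fresh clone of the original model repository.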

Note that our files also contain certain "quality of life" changes that make running each model more convenient for us. Examples include adding and tracking the test split while training (as opposed to only the dev set).
