Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Last update: Dec 21, 2022

Overview

Evaluating the Factual Consistency of Abstractive Text Summarization

Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher

Introduction

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly for three tasks:

identify whether sentences remain factually consistent after transformation,
extract a span in the source documents to support the consistency prediction,
extract a span in the summary sentence that is inconsistent, if one exists.

Transferring this model to summaries generated by several state-of-the-art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.

Paper link: https://arxiv.org/abs/1910.12840

Updates
Citation
License
Usage
Get Involved

Updates

1/27/2020

Updated manually annotated data files - fixed filepaths in misaligned examples.

Updated model checkpoint files - recomputed evaluation metrics for fixed examples.

Citation

@article{kryscinskiFactCC2019,
  author    = {Wojciech Kry{\'s}ci{\'n}ski and Bryan McCann and Caiming Xiong and Richard Socher},
  title     = {Evaluating the Factual Consistency of Abstractive Text Summarization},
  journal   = {arXiv preprint arXiv:1910.12840},
  year      = {2019},
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from violence, hate, and division, environmental destruction, abuse of human rights, or the destruction of people's physical and mental health.

Usage

The code repository uses Python 3. Before running any scripts, install the required Python packages listed in the requirements.txt file.

Example call: pip3 install -r requirements.txt

Training and Evaluation Datasets

Generated training data can be found here.

Manually annotated validation and test data can be found here.

Both generated and manually annotated datasets require pairing with the original CNN/DailyMail articles.

To recreate the datasets, follow these instructions:

Download CNN Stories and Daily Mail Stories from https://cs.nyu.edu/~kcho/DMQA/
Create a cnndm directory and unpack downloaded files into the directory
Download and unpack FactCC data (do not rename directory)
Run the pair_data.py script to pair the data with original articles

Example call:

python3 data_pairing/pair_data.py <dir-with-factcc-data> <dir-with-stories>

Generating Data

Synthetic training data can be generated using code available in the data_generation directory.

The data generation script expects the source documents as a single jsonl file, where each source document is stored in a separate json object. Each json object must contain an id key that stores an example id (uniqueness is not required) and a text key that stores the text of the source document.

Certain transformations rely on NER tagging, thus for best results use source documents with original (proper) casing.
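As a rough illustration of the input format described above, the following sketch writes a few source documents to a jsonl file, one json object per line with id and text keys. The helper name write_source_documents, the file name source_docs.jsonl, and the sample texts are placeholders made up for this example; they are not part of the repository.

```python
import json
from pathlib import Path


def write_source_documents(documents, output_path):
    """Write (id, text) pairs to a jsonl file, one json object per line."""
    output_path = Path(output_path)
    with output_path.open("w", encoding="utf-8") as f:
        for doc_id, text in documents:
            # Text is written as given, so original (proper) casing is preserved.
            record = {"id": doc_id, "text": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return output_path


if __name__ == "__main__":
    # Hypothetical sample documents; ids do not have to be unique.
    sample_documents = [
        ("doc-1", "Alice Smith joined the company in March 2015."),
        ("doc-2", "The museum in Berlin welcomed 300 visitors on Friday."),
    ]
    write_source_documents(sample_documents, "source_docs.jsonl")
```

The resulting file could then be passed as the source data file argument of the data generation script.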

The following claim augmentations (transformations) are available:

backtranslation - Paraphrasing claim via backtranslation (requires Google Translate API key; costs apply)
pronoun_swap - Swapping a random pronoun in the claim
date_swap - Swapping random date/time found in the claim with one present in the source article
number_swap - Swapping random number found in the claim with one present in the source article
entity_swap - Swapping random entity name found in the claim with one present in the source article
negation - Negating meaning of the claim
noise - Injecting noise into the claim sentence

For a detailed description of available transformations please refer to Section 3.1 in the paper.
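To make the idea of a swap transformation more concrete, here is a toy sketch of a number swap: it replaces one number in the claim with a different number taken from the source text. This is a simplified illustration only, not the code in the data_generation directory; the function name toy_number_swap, the regular expression, and the returned tuple are assumptions made for this example.

```python
import random
import re

# Assumption for this sketch: a "number" is a run of digits with an optional decimal part.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def toy_number_swap(claim, source_text, rng=None):
    """Return (new_claim, changed).

    Replaces one randomly chosen number in the claim with a different
    number found in the source text. If no swap is possible, the claim
    is returned unchanged and changed is False.
    """
    if rng is None:
        rng = random.Random()
    claim_matches = list(NUMBER_PATTERN.finditer(claim))
    if not claim_matches:
        return claim, False
    target = rng.choice(claim_matches)
    # Candidate replacements: numbers in the source that differ from the target.
    candidates = sorted(
        {n for n in NUMBER_PATTERN.findall(source_text) if n != target.group()}
    )
    if not candidates:
        return claim, False
    replacement = rng.choice(candidates)
    new_claim = claim[:target.start()] + replacement + claim[target.end():]
    return new_claim, True


if __name__ == "__main__":
    source = "The company hired 120 people in 2019 and opened 3 offices."
    claim = "The company hired 120 people."
    print(toy_number_swap(claim, source, random.Random(0)))
```

Because the replacement always differs from the original number, the altered claim no longer agrees with the source on that detail.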

To authenticate with the Google Cloud API follow these instructions.

Example call:

python3 data_generation/create_data.py <source-data-file> [--augmentations list-of-augmentations]

Model Code

FactCC and FactCCX models can be trained or initialized from a checkpoint using code available in the modeling directory.

Quickstart training, fine-tuning, and evaluation scripts are shared in the scripts directory. Before use, make sure to update the *_PATH variables with appropriate absolute paths.

To customize training or evaluation settings please refer to the flags in the run.py file.

To use Weights & Biases dashboards, log in to the service using the following command: wandb login <API KEY>.

Trained FactCC model checkpoint can be found here.

Trained FactCCX model checkpoint can be found here.

IMPORTANT: Due to data pre-processing, the first run of training or evaluation code on a large dataset can take up to a few hours before the actual procedure starts.

Running on other data

To run pretrained FactCC or FactCCX models on your data, follow these instructions:

Download pre-trained model checkpoint, linked above
Prepare your data in jsonl format. Each example should be a separate json object with id, text, and claim keys representing the example id, source document, and claim sentence, respectively. Name the file data-dev.jsonl
Update corresponding *-eval.sh script
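The data preparation step above can be sketched as follows. This is a minimal example under stated assumptions: the helper name write_eval_file, the output directory, and the sample example are hypothetical, and the check covers only the three keys named above.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("id", "text", "claim")


def write_eval_file(examples, output_dir):
    """Write examples to <output_dir>/data-dev.jsonl, one json object per line."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / "data-dev.jsonl"
    with output_path.open("w", encoding="utf-8") as f:
        for example in examples:
            # Fail early if one of the keys named in the instructions is absent.
            missing = set(REQUIRED_KEYS) - set(example)
            if missing:
                raise ValueError(f"example is missing keys: {sorted(missing)}")
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
    return output_path


if __name__ == "__main__":
    # Hypothetical example: a source document and one claim sentence about it.
    examples = [
        {
            "id": "example-1",
            "text": "The city council approved the new budget on Monday.",
            "claim": "The city council rejected the new budget on Monday.",
        }
    ]
    print(write_eval_file(examples, "my_eval_data"))
```

The directory holding data-dev.jsonl is then the kind of path that would be set in the corresponding *-eval.sh script.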

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests, or bug reports. We welcome PRs!

Owner

Salesforce
