Resources for our AAAI 2022 paper: "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

Last update: Dec 27, 2022

Overview

LOREN

DEMO System

Check out our demo system! Note that the results will be slightly different from the paper, since we use an up-to-date Wikipedia as the evidence source whereas FEVER uses Wikipedia dated 2017.

Dependencies

CUDA > 11
Prepare requirements: pip3 install -r requirements.txt.
- Also works for allennlp==2.3.0, transformers==4.5.1, torch==1.8.1.
Set environment variable $PJ_HOME: export PJ_HOME=/YOUR_PATH/LOREN/.
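A minimal sketch for failing early when $PJ_HOME has not been exported. The helper name get_project_home is hypothetical and not part of the LOREN code base; it only reads the variable named above.

```python
import os
from pathlib import Path


def get_project_home(env=os.environ):
    """Return the LOREN root directory named by $PJ_HOME.

    `get_project_home` is an illustrative helper, not a function from the repo.
    Raises RuntimeError with a hint if the variable is unset or empty.
    """
    value = env.get("PJ_HOME", "").strip()
    if not value:
        raise RuntimeError(
            "PJ_HOME is not set; run `export PJ_HOME=/YOUR_PATH/LOREN/` first"
        )
    return Path(value).expanduser()
```

Calling get_project_home() at the top of a script gives a readable error instead of a failure deep inside a path join.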

Download Pre-processed Data and Checkpoints

Pre-processed data at Google Drive. Unzip it and put the contents under LOREN/data/.
- Data for training a Seq2Seq MRC is at data/mrc_seq2seq_v5/.
- Data for training veracity prediction is at data/fact_checking/v5/*.json.
  - Note: dev.json uses ground-truth evidence for validation, whereas eval.json uses predicted evidence. This is consistent with the settings in KGAT.
- Evidence retrieval models are not required for training LOREN, since we directly adopt the retrieved evidence from KGAT, which is at data/fever/baked_data/ (used only during pre-processing).
- Original data is at data/fever/ (used only during pre-processing).
Pre-trained checkpoints at Huggingface Models. Unzip them and put them under LOREN/models/.
- Checkpoints for veracity prediction are at models/fact_checking/.
- Checkpoint for generative MRC is at models/mrc_seq2seq/.
- Checkpoints for KGAT evidence retrieval models are at models/evidence_retrieval/ (not used in training, displayed only for the sake of completeness).
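After unzipping, a quick layout check can catch a misplaced folder before training starts. The sketch below only tests for the directories listed in this section; the names EXPECTED_DIRS and missing_dirs are hypothetical, and the check says nothing about the files inside each directory.

```python
from pathlib import Path

# Sub-directories named in this README, relative to the LOREN root.
EXPECTED_DIRS = [
    "data/mrc_seq2seq_v5",
    "data/fact_checking/v5",
    "data/fever/baked_data",
    "models/fact_checking",
    "models/mrc_seq2seq",
    "models/evidence_retrieval",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

An empty list from missing_dirs("/YOUR_PATH/LOREN") means every listed directory is in place.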

Training LOREN from Scratch

For quick training and inference with pre-processed data & pre-trained models, please go to Veracity Prediction.

First, go to LOREN/src/.

1 Building Local Premises from Scratch

1) Extract claim phrases and generate questions

You'll need to download three external models in this step, i.e., two models from AllenNLP in parsing_client/sentence_parser.py and a T5-based question generation model in qg_client/question_generator.py. Don't worry, they'll be automatically downloaded.

Run python3 pproc_client/pproc_questions.py --roles eval train val test
This generates cached JSON files:
- AG_PREFIX/answer.{role}.cache: extracted phrases are stored in the field answers.
- QG_PREFIX/question.{role}.cache: generated questions are stored in the fields cloze_qs, generate_qs, and questions (the two question types concatenated).

2) Train Seq2Seq MRC

Prepare self-supervised MRC data (only for SUPPORTED claims)

Run python3 pproc_client/pproc_mrc.py -o LOREN/data/mrc_seq2seq_v5.
This generates files for Seq2Seq training in a HuggingFace style:
- data/mrc_seq2seq_v5/{role}.source: concatenated question and evidence text.
- data/mrc_seq2seq_v5/{role}.target: answer (claim phrase).
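To inspect the generated MRC data, the two files can be read side by side. This sketch assumes one example per line with the two files line-aligned, which is the usual convention for .source/.target pairs but is not confirmed here for this repo. The function name read_seq2seq_pairs is hypothetical.

```python
from pathlib import Path


def read_seq2seq_pairs(data_dir, role):
    """Return a list of (source, target) pairs for one role.

    Assumption: {role}.source and {role}.target hold one example per line
    and are line-aligned. Raises ValueError if the line counts differ.
    """
    data_dir = Path(data_dir)
    with (data_dir / f"{role}.source").open(encoding="utf-8") as src:
        sources = [line.rstrip("\n") for line in src]
    with (data_dir / f"{role}.target").open(encoding="utf-8") as tgt:
        targets = [line.rstrip("\n") for line in tgt]
    if len(sources) != len(targets):
        raise ValueError(
            f"{role}: {len(sources)} sources but {len(targets)} targets"
        )
    return list(zip(sources, targets))
```

For example, read_seq2seq_pairs("data/mrc_seq2seq_v5", "train")[:3] shows the first three question-plus-evidence inputs with their answer phrases.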

Training Seq2Seq

Go to mrc_client/seq2seq/, which is modified based on HuggingFace's examples.
Follow script/train.sh.
The best checkpoint will be saved in $output_dir (e.g., models/mrc_seq2seq/).
- Best checkpoints are decided by ROUGE score on dev set.

3) Run MRC for all questions and assemble local premises

Run python3 pproc_client/pproc_evidential.py --roles val train eval test -m PATH_TO_MRC_MODEL/.
This generates files:
- {role}.json: files for veracity prediction. Assembled local premises are stored in the field evidential_assembled.

4) Building NLI prior

Before training veracity prediction, we'll need an NLI prior from pre-trained NLI models, such as DeBERTa.

Run python3 pproc_client/pproc_nli_labels.py -i PATH_TO/{role}.json -m microsoft/deberta-large-mnli.
Mind the order! The predicted classes [Contradict, Neutral, Entailment] correspond to [REF, NEI, SUP], respectively.
This adds a new field nli_labels to {role}.json.
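The class-order note can be made concrete with a small mapping. The sketch below converts a three-way NLI probability vector in the order [Contradict, Neutral, Entailment] into a LOREN label. The names NLI_INDEX_TO_LABEL and nli_probs_to_label are hypothetical, and how nli_labels is actually stored (hard labels or probabilities) is not specified in this README.

```python
# Index order taken from the README note: [Contradict, Neutral, Entailment].
NLI_INDEX_TO_LABEL = ("REF", "NEI", "SUP")


def nli_probs_to_label(probs):
    """Map a 3-way NLI probability vector to REF, NEI, or SUP.

    Illustrative only: pproc_nli_labels.py may store the prior differently.
    """
    if len(probs) != len(NLI_INDEX_TO_LABEL):
        raise ValueError("expected exactly three class probabilities")
    # Pick the index of the most probable NLI class.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return NLI_INDEX_TO_LABEL[best]
```

If a different NLI model is swapped in, its class order should be checked against this tuple first, since a permuted order would silently flip SUP and REF.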

2 Veracity Prediction

This part is rather easy (less pipelined :P). A good place to start if you want to skip the above pre-processing.

1) Training

Go to folder check_client/.
See what scripts/train_*.sh does.

2) Testing

Stay in folder check_client/
Run python3 fact_checker.py --params PARAMS_IN_THE_CODE
This generates files:
- results/*.predictions.jsonl

3) Evaluation

Go to folder eval_client/
For Label Accuracy and FEVER score: fever_scorer.py
For CulpA (turn on --verbose in testing): culpa.py
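For readers unfamiliar with the two metrics, here is a rough sketch of how Label Accuracy and FEVER score are commonly defined for the FEVER shared task. It is not the code in fever_scorer.py, and the field names (label, predicted_label, evidence, predicted_evidence) are assumptions made for illustration. A prediction counts toward the FEVER score only if the label is right and, for verifiable claims, at least one complete gold evidence group appears among the first five predicted evidence items.

```python
def label_accuracy(instances):
    """Fraction of instances whose predicted label matches the gold label."""
    if not instances:
        return 0.0
    correct = sum(1 for x in instances if x["predicted_label"] == x["label"])
    return correct / len(instances)


def is_strictly_correct(instance, max_evidence=5):
    """Label must match and, unless NEI, one gold evidence group must be covered."""
    if instance["predicted_label"] != instance["label"]:
        return False
    if instance["label"] == "NOT ENOUGH INFO":
        return True  # no evidence requirement for unverifiable claims
    predicted = {tuple(e) for e in instance["predicted_evidence"][:max_evidence]}
    return any(
        all(tuple(e) in predicted for e in group)
        for group in instance["evidence"]
    )


def fever_score(instances, max_evidence=5):
    """Fraction of instances that are strictly correct (label and evidence)."""
    if not instances:
        return 0.0
    hits = sum(1 for x in instances if is_strictly_correct(x, max_evidence))
    return hits / len(instances)
```

Each evidence item is assumed to be a (page, sentence_id) pair, and each gold group is one alternative set of sentences that suffices to verify the claim.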

Citation

If you find our paper or resources useful to your research, please kindly cite our paper (pre-print, official published paper coming soon).

@misc{chen2021loren,
      title={LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification}, 
      author={Jiangjie Chen and Qiaoben Bao and Changzhi Sun and Xinbo Zhang and Jiaze Chen and Hao Zhou and Yanghua Xiao and Lei Li},
      year={2021},
      eprint={2012.13577},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Owner

Jiangjie Chen