Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model uses the HuggingFace Transformers library, beginning with the pretrained DeBERTa model. Further MLM pre-training is performed on a large corpus of unannotated discharge summaries. Finally, fine-tuning is peformed on a corpus of annotated discharge summaries (annotated using Prodigy). The model performs NER, but final performance is measured at the document level using the maximum token-level score.

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder containing discharge summaries stored in CSV folders, tokenizes and continues MLM training on deberta-base.

Fine-tuning can then be performed with the finetune script using CLI commands. This script assumes the data is either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl), or a saved HuggingFace Datasets. If you run this script once on a JSONL file of annotations, you can choose to save the Dataset into a folder (--save_data_dir "save_to_here") and use this for subsequent training runs (--datafile "save_to_here").

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'

Note: you might find that your exported annotations (JSONL file) is not encoded using UTF-8, which will prevent this code from working. There are various methods to change the encoding and these can all be found with a quick Google search. On a windows machine, for example, modify the following in powershell:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl

Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Related tags

Overview

Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Owner

Medicines Optimisation Service - Austin Health

DiY Oxygen Concentrator based on the OxiKit

Multilingual word vectors in 78 languages

Index different CKAN entities in Solr, not just datasets

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Calibre recipe to convert latest issue of Analyse & Kritik into an ebook

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

Conversational text Analysis using various NLP techniques

The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

ASCEND Chinese-English code-switching dataset

Question answering app is used to answer for a user given question from user given text.

Watson Natural Language Understanding and Knowledge Studio

Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

Material for GW4SHM workshop, 16/03/2022.

मराठी भाषा वाचविण्याचा एक प्रयास. इंग्रजी ते मराठीचा शब्दकोश. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Text-to-Speech for Belarusian language