Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Overview

Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model uses the HuggingFace Transformers library, beginning with the pretrained DeBERTa model. Further MLM pre-training is performed on a large corpus of unannotated discharge summaries. Finally, fine-tuning is peformed on a corpus of annotated discharge summaries (annotated using Prodigy). The model performs NER, but final performance is measured at the document level using the maximum token-level score.

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder containing discharge summaries stored in CSV folders, tokenizes and continues MLM training on deberta-base.

Fine-tuning can then be performed with the finetune script using CLI commands. This script assumes the data is either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl), or a saved HuggingFace Datasets. If you run this script once on a JSONL file of annotations, you can choose to save the Dataset into a folder (--save_data_dir "save_to_here") and use this for subsequent training runs (--datafile "save_to_here").

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'

Note: you might find that your exported annotations (JSONL file) is not encoded using UTF-8, which will prevent this code from working. There are various methods to change the encoding and these can all be found with a quick Google search. On a windows machine, for example, modify the following in powershell:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl
Owner
Medicines Optimisation Service - Austin Health
Medicines Optimisation Service - Austin Health
DiY Oxygen Concentrator based on the OxiKit

M19O2 DiY Oxygen Concentrator based on / inspired by the OxiKit, OpenOx, Marut, RepRap and Project Apollo platforms. About Read about the project on H

Maker's Asylum 62 Dec 22, 2022
Multilingual word vectors in 78 languages

Aligning the fastText vectors of 78 languages Facebook recently open-sourced word vectors in 89 languages. However these vectors are monolingual; mean

Babylon Health 1.2k Dec 17, 2022
Index different CKAN entities in Solr, not just datasets

ckanext-sitesearch Index different CKAN entities in Solr, not just datasets Requirements This extension requires CKAN 2.9 or higher and Python 3 Featu

Open Knowledge Foundation 3 Dec 02, 2022
ЁЯТе Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

Hugging Face 6.2k Dec 31, 2022
Calibre recipe to convert latest issue of Analyse & Kritik into an ebook

Calibre Recipe f├╝r "Analyse & Kritik" Dies ist ein "Recipe" f├╝r die Konvertierung der aktuellen Ausgabe der Zeitung Analyse & Kritik in ein Ebook. Es

Henning 3 Jan 04, 2022
Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

ITTR - Pytorch Implementation of the Hybrid Perception Block (HPB) and Dual-Pruned Self-Attention (DPSA) block from the ITTR paper for Image to Image

Phil Wang 17 Dec 23, 2022
Conversational text Analysis using various NLP techniques

Conversational text Analysis using various NLP techniques

Rita Anjana 159 Jan 06, 2023
The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

Neural Machine Translation communication system The model is basically direct to convert one source language to another targeted language using encode

Nishant Banjade 7 Sep 22, 2022
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

UIS-RNN Overview This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm. UIS-RNN solves the problem of s

Google 1.4k Dec 28, 2022
ASCEND Chinese-English code-switching dataset

ASCEND (A Spontaneous Chinese-English Dataset) introduces a high-quality resource of spontaneous multi-turn conversational dialogue Chinese-English code-switching corpus collected in Hong Kong.

CAiRE 11 Dec 09, 2022
Question answering app is used to answer for a user given question from user given text.

Question answering app is used to answer for a user given question from user given text.It is created using HuggingFace's transformer pipeline and streamlit python packages.

Siva Prakash 3 Apr 05, 2022
Watson Natural Language Understanding and Knowledge Studio

Material de demonstra├з├гo dos servi├зos: Watson Natural Language Understanding e Knowledge Studio Vis├гo Geral: https://www.ibm.com/br-pt/cloud/watson-na

Vanderlei Munhoz 4 Oct 24, 2021
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

Amazon Web Services - Labs 1.1k Dec 27, 2022
Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

Proquabet Turn your prose into a constant stream of encrypted and meaningless-so

Milo Fultz 2 Oct 10, 2022
Material for GW4SHM workshop, 16/03/2022.

GW4SHM Workshop Wednesday, 16th March 2022 (13:00 тАУ 15:15 GMT): Presented by: Dr. Rhodri Nelson, Imperial College London Project website: https://www.

Devito Codes 1 Mar 16, 2022
рдорд░рд╛рдареА рднрд╛рд╖рд╛ рд╡рд╛рдЪрд╡рд┐рдгреНрдпрд╛рдЪрд╛ рдПрдХ рдкреНрд░рдпрд╛рд╕. рдЗрдВрдЧреНрд░рдЬреА рддреЗ рдорд░рд╛рдареАрдЪрд╛ рд╢рдмреНрджрдХреЛрд╢. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

For English, scroll down рдорд░рд╛рдареА рд╢рдмреНрдж рдорд░рд╛рдареА рднрд╛рд╖рд╛ рд╡рд╛рдЪрд╡рдгреНрдпрд╛рд╕рд╛рдареА рдореА рд╣рд╛ рдУрдкрди рд╕реЛрд░реНрд╕ рдкреНрд░реЛрдЬреЗрдХреНрдЯ рд╕реБрд░реВ рдХреЗрд▓рд╛ рдЖрд╣реЗ. рдорд╛рдЭреНрдпрд╛ рдорддреЗ, рдЖрдкрд▓реА рднрд╛рд╖рд╛ рд╣рд│реВрд╣рд│реВ рдЖрдгрд┐ рдХреЛрдгрд╛рдЪрд╛рд╣реА рд▓рдХреНрд╖рд╛рдд

рдореБрдХреНрдд рд╕реНрддреНрд░реЛрдд 20 Oct 11, 2022
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The

Simran Farrukh 0 Mar 28, 2022
Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

TestRank in Pytorch Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks by Yu Li, Min Li, Qiuxia Lai, Ya

3 May 19, 2022
Text-to-Speech for Belarusian language

title emoji colorFrom colorTo sdk app_file pinned Belarusian TTS ЁЯР╕ green green gradio app.py false Belarusian TTS ЁЯУв ЁЯдЦ Belarusian TTS (text-to-speec

Yurii Paniv 1 Nov 27, 2021