Mednlp - Medical natural language parsing and utility library

Last update: Aug 24, 2022

Overview

Medical natural language parsing and utility library

A natural language medical domain parsing library. This library:

Provides an interface to the UTS (UMLS Terminology Services) RESTful service with data caching (NIH login needed).
Wraps the MedCAT library by parsing medical and clinical text into first-class Python objects that reflect the structure of the natural language, complete with UMLS entity linking to CUIs and other domain-specific features.
Combines non-medical (such as POS and NER tags) and medical features (such as CUIs) in one API and resulting data structure and/or as a Pandas data frame.
Provides cui2vec as a word embedding model, either for fast indexing and access or for direct use as features in a Zensols Deep NLP embedding layer model.
Provides access to cTAKES through a dictionary-like Stash abstraction.
Includes a command line program to access all of these features without having to write any code.
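As an illustration of how a concept embedding such as cui2vec supports fast lookup of related concepts, the following minimal sketch ranks CUIs by cosine similarity. It does not use the mednlp API: the vectors are made-up three-dimensional toys (real cui2vec vectors have hundreds of dimensions), and `most_similar` is a hypothetical helper.

```python
import math

# Toy, made-up vectors keyed by UMLS CUI (assumption: not real cui2vec values).
EMBEDDINGS: dict[str, list[float]] = {
    "C0011849": [0.9, 0.1, 0.0],  # Diabetes Mellitus
    "C0020538": [0.7, 0.3, 0.1],  # Hypertensive disease
    "C0027051": [0.2, 0.9, 0.1],  # Myocardial Infarction
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(cui: str, k: int = 2) -> list[tuple[str, float]]:
    """Hypothetical helper: the k CUIs closest to ``cui``, most similar first."""
    query = EMBEDDINGS[cui]
    scores = [(other, cosine(query, vec))
              for other, vec in EMBEDDINGS.items() if other != cui]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    print(most_similar("C0011849"))
```

A linear scan like this is fine for a toy table; a full concept vocabulary would normally sit behind a vector index instead.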
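The dictionary-like Stash idea can be pictured as a read-only mapping whose values are computed on first access and then cached. The sketch below is not the Zensols `Stash` API or the mednlp cTAKES integration: `CachingStash` and the `parse` callable are hypothetical stand-ins, and a real back end would run cTAKES where the sketch calls a plain function.

```python
from collections.abc import Callable, Iterator, Mapping
from typing import Any


class CachingStash(Mapping):
    """Hypothetical dict-like store: values are produced by ``parse`` on
    first access and then served from an in-memory cache."""

    def __init__(self, sources: dict[str, str], parse: Callable[[str], Any]):
        self._sources = sources   # key -> raw clinical text
        self._parse = parse       # stand-in for an expensive cTAKES run
        self._cache: dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._cache:
            # Unknown keys raise KeyError from the sources dict, as a dict would.
            self._cache[key] = self._parse(self._sources[key])
        return self._cache[key]

    def __contains__(self, key: object) -> bool:
        # Membership checks should not trigger a (costly) parse.
        return key in self._sources

    def __iter__(self) -> Iterator[str]:
        return iter(self._sources)

    def __len__(self) -> int:
        return len(self._sources)


if __name__ == "__main__":
    calls = []

    def fake_parse(text: str) -> list[str]:
        calls.append(text)
        return text.lower().split()

    stash = CachingStash({"note1": "Patient denies chest pain"}, fake_parse)
    print(stash["note1"], stash["note1"], len(calls))
```

Subclassing `collections.abc.Mapping` supplies `get`, `keys`, `items` and the other dictionary methods, which is what makes such a store feel like a plain dictionary to callers.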

Documentation

See the full documentation. The API reference is also available.

Obtaining

The easiest way to install the command line program is via the pip installer:

pip3 install zensols.mednlp

Binaries are also available on PyPI.

If the cui2vec functionality is used, the Zensols Deep NLP library is also needed, which is installed with:

pip3 install zensols.deepnlp

Attribution

This API utilizes the following frameworks:

MedCAT: used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS.
cTAKES: a natural language processing system for extraction of information from electronic medical record clinical free-text.
cui2vec: a set of embeddings for medical concepts (analogous to word embeddings) learned from an extremely large collection of multimodal medical data.
Zensols Deep NLP library: a deep learning utility library for natural language processing that aids in feature engineering and embedding layers.
ctakes-parser: parses cTAKES output into a Pandas data frame.

Citation

If you use this project in your research please use the following BibTeX entry:

@article{Landes_DiEugenio_Caragea_2021,
  title={DeepZensols: Deep Natural Language Processing Framework},
  url={http://arxiv.org/abs/2109.03383},
  note={arXiv: 2109.03383},
  journal={arXiv:2109.03383 [cs]},
  author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},
  year={2021},
  month={Sep}
}

Community

Please star the project and let me know how and where you use this API. Contributions as pull requests, feedback, and any other input are welcome.

Changelog

An extensive changelog is available here.

License

MIT License

Owner

Paul Landes