PubMed Mapper: A Python library that map PubMed XML to Python object

Overview

pubmed-mapper: A Python Library that map PubMed XML to Python object

中文文档

1. Philosophy

view UML

Programmatically access PubMed article is a common task for me. Luckily, with the help of eutils, we can access full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.

2. Installation

pip install pubmed-mapper

3. Usage

3.1 use as library

3.1.1 parse a PubMed ID

from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [hadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Lotfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01

3.1.2 parse a downloaded XML file

from lxml import etree
from pubmed_mapper import Article


infile = 'xxx.xml'
with open(infile) as fp:
    root = etree.parse(fp)


articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article =  Article.parse_element(pubmed_article_element)
    articles.append(article)

3.2 use as command line software

3.2.1 parse PubMed ID

pubmed-mapper pmid -p 32329900

3.2.2 parse single PubMed XML file

pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

3.2.3 parse a directory who contains multiple PubMed XML files

pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl

4. FAQs

4.1 There many types of PubMed article publication date, how do you convert it to datetime.date object?

Parse publication date is a hard work, until now pubmed-mapper can't parse all types of them. The types pubmed-mapper can be parsed and the parsed value are:

type value
2021-03-13 2021-03-13
2021-03 2021-03-01
2021 Spring 2021-04-01
2021 2021-01-01
2021 Jan-Feb 2021-01-01
2021 Mar 13-15 2021-03-13
2021 Mar-2022 Jan 2021-03-01
2021-2022 2021-01-01
2021 Mar 13-Dec 15 2021-03-13
1976-1977 Winter 1976-01-01
1977-1978 Fall-Winter 1977-10-01

4.2 What is pubmed-mapper.log generated by pubmed-mapper?

pubmed-mapper.log is the default log file generate by pubmed-mapper, you can change the file by using --log-file options:

pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

You can go to this log file to find out more parsing details.

4.3 I want log detail message in my log file?

Using --log-level can log more detail message:

pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
Owner
灵魂工具人
大家好,我是灵魂工具人,我会分享一些由我做的生物信息工具,希望大家喜欢。
灵魂工具人
Create a database, insert data and easily select it with Sqlite

sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo

Mariya 27 Dec 27, 2022
This repository is for active development of the Azure SDK for Python.

Azure SDK for Python This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public

Microsoft Azure 3.4k Jan 02, 2023
Python PostgreSQL database performance insights. Locks, index usage, buffer cache hit ratios, vacuum stats and more.

Python PG Extras Python port of Heroku PG Extras with several additions and improvements. The goal of this project is to provide powerful insights int

Paweł Urbanek 35 Nov 01, 2022
Micro ODM for MongoDB

Beanie - is an asynchronous ODM for MongoDB, based on Motor and Pydantic. It uses an abstraction over Pydantic models and Motor collections to work wi

Roman 993 Jan 03, 2023
asyncio compatible driver for elasticsearch

asyncio client library for elasticsearch aioes is a asyncio compatible library for working with Elasticsearch The project is abandoned aioes is not su

97 Sep 05, 2022
Dinamopy is a python helper library for dynamodb

Dinamopy is a python helper library for dynamodb. You can define your access patterns in a json file and can use dynamic method names to make operations.

Rasim Andıran 2 Jul 18, 2022
A Python Object-Document-Mapper for working with MongoDB

MongoEngine Info: MongoEngine is an ORM-like layer on top of PyMongo. Repository: https://github.com/MongoEngine/mongoengine Author: Harry Marr (http:

MongoEngine 3.9k Jan 08, 2023
GINO Is Not ORM - a Python asyncio ORM on SQLAlchemy core.

GINO - GINO Is Not ORM - is a lightweight asynchronous ORM built on top of SQLAlchemy core for Python asyncio. GINO 1.0 supports only PostgreSQL with

GINO Community 2.5k Dec 29, 2022
Neo4j Bolt driver for Python

Neo4j Bolt Driver for Python This repository contains the official Neo4j driver for Python. Each driver release (from 4.0 upwards) is built specifical

Neo4j 762 Dec 30, 2022
Python ODBC bridge

pyodbc pyodbc is an open source Python module that makes accessing ODBC databases simple. It implements the DB API 2.0 specification but is packed wit

Michael Kleehammer 2.6k Dec 27, 2022
Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.

cnpj-mysql Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL. Dados públicos de cnpj no site da Receit

17 Dec 25, 2022
Databank is an easy-to-use Python library for making raw SQL queries in a multi-threaded environment.

Databank Databank is an easy-to-use Python library for making raw SQL queries in a multi-threaded environment. No ORM, no frills. Thread-safe. Only ra

snapADDY GmbH 4 Apr 04, 2022
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Redash is designed to enable anyone, regardless of the level of technical sophistication, to harness the power of data big and small. SQL users levera

Redash 22.4k Dec 30, 2022
SAP HANA Connector in pure Python

SAP HANA Database Client for Python Important Notice This public repository is read-only and no longer maintained. The active maintained alternative i

SAP Archive 299 Nov 20, 2022
Async ODM (Object Document Mapper) for MongoDB based on python type hints

ODMantic Documentation: https://art049.github.io/odmantic/ Asynchronous ODM(Object Document Mapper) for MongoDB based on standard python type hints. I

Arthur Pastel 732 Dec 31, 2022
Implementing basic MongoDB CRUD (Create, Read, Update, Delete) queries, using Python.

MongoDB with Python Implementing basic MongoDB CRUD (Create, Read, Update, Delete) queries, using Python. We can connect to a MongoDB database hosted

MousamSingh 4 Dec 01, 2021
An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets

datasets_sql A 🤗 Datasets extension package that provides support for executing arbitrary SQL queries on HF datasets. It uses DuckDB as a SQL engine

Mario Šaško 19 Dec 15, 2022
Baserow is an open source no-code database tool and Airtable alternative

Baserow is an open source no-code database tool and Airtable alternative

1.3k Jan 01, 2023
A SQL linter and auto-formatter for Humans

The SQL Linter for Humans SQLFluff is a dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with

SQLFluff 5.5k Jan 08, 2023
A framework based on tornado for easier development, scaling up and maintenance

turbo 中文文档 Turbo is a framework for fast building web site and RESTFul api, based on tornado. Easily scale up and maintain Rapid development for RESTF

133 Dec 06, 2022