PubMed Mapper: A Python library that maps PubMed XML to Python objects

Last update: Dec 08, 2022

Overview

pubmed-mapper: A Python library that maps PubMed XML to Python objects

Chinese documentation

1. Philosophy

view UML

Programmatically accessing PubMed articles is a common task for me. Luckily, with the help of eutils, we can fetch full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.

2. Installation

pip install pubmed-mapper

3. Usage

3.1 use as library

3.1.1 parse a PubMed ID

from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [Shadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Lotfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01
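
The attributes listed above can be combined into derived values. The following is a minimal sketch, not part of pubmed-mapper: `format_citation` is a hypothetical helper that relies only on the attribute names shown in the example, and it assumes `pubdate` is a `datetime.date`. A stand-in object built with `SimpleNamespace` (with a placeholder title) replaces a real `Article` so the sketch runs without network access.

```python
from datetime import date
from types import SimpleNamespace


def format_citation(article, max_authors=3):
    """Return a one-line citation built from an Article-like object."""
    names = [f"{a.last_name} {a.initials}" for a in article.authors]
    if len(names) > max_authors:
        # Keep the first few authors and abbreviate the rest.
        names = names[:max_authors] + ["et al"]
    title = str(article.title).rstrip(".")
    return (
        f"{', '.join(names)}. {title}. "
        f"{article.journal.abbr}. {article.pubdate.year};"
        f"{article.volume}({article.issue}). PMID: {article.pmid}"
    )


# Stand-in object (hypothetical values for the title) so the sketch runs
# offline; with the library installed, pass the result of
# Article.parse_pmid('32329900') instead.
demo = SimpleNamespace(
    pmid="32329900",
    title="Example title",
    authors=[
        SimpleNamespace(last_name="Shadyab", initials="AH"),
        SimpleNamespace(last_name="Manson", initials="JE"),
    ],
    journal=SimpleNamespace(abbr="J Am Geriatr Soc"),
    volume="68",
    issue="9",
    pubdate=date(2020, 9, 1),
)
print(format_citation(demo))
```

Because the helper only reads attributes, it should work with any object that exposes the same names.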

3.1.2 parse a downloaded XML file

from lxml import etree
from pubmed_mapper import Article


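infile = 'xxx.xml'
# Open in binary mode so lxml can honor the encoding declared in the XML header
with open(infile, 'rb') as fp:
    root = etree.parse(fp)


# Each PubmedArticle element under the PubmedArticleSet root maps to one Article
articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article = Article.parse_element(pubmed_article_element)
    articles.append(article)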

3.2 use as command line software

3.2.1 parse a PubMed ID

pubmed-mapper pmid -p 32329900

3.2.2 parse a single PubMed XML file

pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

3.2.3 parse a directory that contains multiple PubMed XML files

pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl
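
The `.jl` extension of the output files suggests the JSON Lines format, with one JSON object per line. That is an assumption, not something this README states, and the fields inside each record are not documented here either. Under that assumption, a minimal sketch for reading the output back could look like this (`read_jsonlines` is a hypothetical helper, not part of pubmed-mapper):

```python
import json


def read_jsonlines(path):
    """Yield one decoded JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as fp:
        for line_number, line in enumerate(fp, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line is malformed instead of failing silently.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

Iterating over `read_jsonlines('output/pubmed-mapper.jl')` then yields one record at a time, so a large output file never has to be loaded into memory as a whole.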

4. FAQs

4.1 There are many types of PubMed article publication dates. How do you convert them to datetime.date objects?

Parsing publication dates is hard work, and pubmed-mapper cannot yet parse every type. The types it can parse, and the resulting values, are:

type	parsed value
2021-03-13	2021-03-13
2021-03	2021-03-01
2021 Spring	2021-04-01
2021	2021-01-01
2021 Jan-Feb	2021-01-01
2021 Mar 13-15	2021-03-13
2021 Mar-2022 Jan	2021-03-01
2021-2022	2021-01-01
2021 Mar 13-Dec 15	2021-03-13
1976-1977 Winter	1976-01-01
1977-1978 Fall-Winter	1977-10-01
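
Every row of the table resolves to the start of the stated period: the first year, the first month or season, and the first day, with missing parts defaulting to 1. The sketch below reproduces the table with that rule. It is an illustration only and not pubmed-mapper's actual implementation. `parse_pubdate` is a hypothetical name, and the month used for Summer is an assumption because the table does not list it.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}
# Winter, Spring and Fall follow the table above; Summer is an assumption.
SEASONS = {"Winter": 1, "Spring": 4, "Summer": 7, "Fall": 10}


def parse_pubdate(text):
    """Return the start date of a date string, or None if unrecognized."""
    text = text.strip()

    # Numeric forms: YYYY-MM-DD and YYYY-MM.
    numeric = re.fullmatch(r"(\d{4})-(\d{2})(?:-(\d{2}))?", text)
    if numeric:
        year, month, day = numeric.groups()
        return date(int(year), int(month), int(day or 1))

    leading_year = re.match(r"\d{4}", text)
    if leading_year is None:
        return None
    year = int(leading_year.group())

    # Drop the end of a year range, such as "-2022" in "2021-2022".
    rest = re.sub(r"^-\d{4}", "", text[leading_year.end():])

    # First month or season name, optionally followed by a day number.
    start = re.match(r"\s+([A-Za-z]+)(?:\s+(\d{1,2}))?", rest)
    if start is None:
        # A bare year (or year range) maps to January 1 of the first year.
        return date(year, 1, 1) if not rest.strip() else None
    word, day = start.groups()
    word = word.title()
    month = SEASONS.get(word) or MONTHS.get(word[:3])
    if month is None:
        return None
    return date(year, month, int(day or 1))
```

For example, `parse_pubdate('2021 Mar 13-Dec 15')` returns `datetime.date(2021, 3, 13)`, matching the table.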

4.2 What is pubmed-mapper.log generated by pubmed-mapper?

pubmed-mapper.log is the default log file generated by pubmed-mapper. You can change it with the --log-file option:

pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

You can go to this log file to find out more parsing details.

4.3 How do I get more detailed messages in my log file?

Use the --log-level option to log more detailed messages:

pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

Owner

灵魂工具人
