CBKH: The Cornell Biomedical Knowledge Hub

Overview

Cornell Biomedical Knowledge Hub (CBKH)

CBKH integrates data from 18 publicly available biomedical databases. The current version contains a total of 2,932,164 entities of 10 types: 22,963 anatomy entities, 18,774 disease entities, 36,522 drug entities, 87,942 gene entities, 2,065,015 molecule entities, 1,361 symptom entities, 4,101 dietary supplement ingredient (DSI) entities, 137,568 dietary supplement product (DSP) entities, 605 therapeutic class (TC) entities and 2,970 pathway entities.

The relations in CBKH (Table 3) span 100 relation types across 17 kinds of entity pairs: Anatomy-Gene, Drug-Disease, Drug-Drug, Drug-Gene, Disease-Disease, Disease-Gene, Disease-Symptom, Gene-Gene, DSI-Disease, DSI-Symptom, DSI-Drug, DSI-Anatomy, DSI-DSP, DSI-TC, Disease-Pathway, Drug-Pathway and Gene-Pathway. In total, CBKH contains 49,541,938 relations.
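To make the entity-pair structure concrete, here is a minimal Python sketch of how a relation could be checked against the 17 entity pairs listed above. This is not the project's own data model: the `Triple` class, the `is_valid_pair` helper, the relation name `treats` and the identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# The 17 entity-type pairs named in the overview, stored without regard
# to direction (an assumption made for this sketch).
ENTITY_PAIRS = {
    frozenset(pair)
    for pair in [
        ("Anatomy", "Gene"), ("Drug", "Disease"), ("Drug", "Drug"),
        ("Drug", "Gene"), ("Disease", "Disease"), ("Disease", "Gene"),
        ("Disease", "Symptom"), ("Gene", "Gene"), ("DSI", "Disease"),
        ("DSI", "Symptom"), ("DSI", "Drug"), ("DSI", "Anatomy"),
        ("DSI", "DSP"), ("DSI", "TC"), ("Disease", "Pathway"),
        ("Drug", "Pathway"), ("Gene", "Pathway"),
    ]
}


@dataclass(frozen=True)
class Triple:
    """One relation between two typed entities (hypothetical layout)."""
    head_type: str
    head_id: str
    relation: str
    tail_type: str
    tail_id: str


def is_valid_pair(triple: Triple) -> bool:
    """Return True if the triple connects one of the 17 listed type pairs."""
    return frozenset((triple.head_type, triple.tail_type)) in ENTITY_PAIRS


# Placeholder identifiers, not real database IDs.
example = Triple("Drug", "drug:0001", "treats", "Disease", "disease:0001")
print(is_valid_pair(example))
```

Storing each pair as a `frozenset` makes the check independent of direction, so Disease-Drug is accepted wherever Drug-Disease is. Whether CBKH itself treats relations as directed is not stated here, so that choice is an assumption of the sketch.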

Schema

Materials and Methods

Our ultimate goal was to build a biomedical knowledge graph that incorporates as much biomedical knowledge as possible. To this end, we collected and integrated 18 publicly available data sources into a single comprehensive graph. Details of the data resources used are listed in the table.

Statistics of CBKH

| Entity Type | Number | Included Identifiers |
|---|---|---|
| Anatomy | 22,963 | Uberon ID, BTO ID, MeSH ID, Cell Ontology ID |
| Disease | 18,774 | Disease Ontology ID, KEGG ID, PharmGKB ID, MeSH ID, OMIM ID |
| Drug | 36,759 | DrugBank ID, KEGG ID, PharmGKB ID, MeSH ID |
| Gene | 87,942 | HGNC ID, NCBI ID, PharmGKB ID |
| Molecule | 2,065,015 | CHEMBL ID, CHEBI ID |
| Symptom | 1,361 | MeSH ID |
| Dietary Supplement Ingredient | 4,101 | iDISK ID |
| Dietary Supplement Product | 137,568 | iDISK ID |
| Therapeutic Class | 605 | iDISK ID, UMLS CUI |
| Pathway | 2,970 | Reactome ID, KEGG ID |
| Total Entities | 2,382,309 | - |

| Relation Type | Number |
|---|---|
| Anatomy-Gene | 12,825,270 |
| Drug-Disease | 2,711,848 |
| Drug-Drug | 2,684,682 |
| Drug-Gene | 1,295,088 |
| Disease-Disease | 11,072 |
| Disease-Gene | 27,541,618 |
| Disease-Symptom | 3,357 |
| Gene-Gene | 1,605,716 |
| DSI-Symptom | 2,093 |
| DSI-Disease | 5,134 |
| DSI-Anatomy | 4,334 |
| DSP-DSI | 689,297 |
| DSI-TC | 5,430 |
| Disease-Pathway | 1,942 |
| Drug-Pathway | 3,231 |
| Gene-Pathway | 153,236 |
| Drug-Side Effect | 163,206 |
| Total Relations | 49,706,554 |
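
As a quick sanity check on the relation statistics above, the per-type counts can be tallied and compared with the listed total. The short Python sketch below copies the numbers from the relation table. The variable names are illustrative only and nothing here reads from the actual CBKH files.

```python
# Relation counts copied from the statistics table above.
RELATION_COUNTS = {
    "Anatomy-Gene": 12_825_270,
    "Drug-Disease": 2_711_848,
    "Drug-Drug": 2_684_682,
    "Drug-Gene": 1_295_088,
    "Disease-Disease": 11_072,
    "Disease-Gene": 27_541_618,
    "Disease-Symptom": 3_357,
    "Gene-Gene": 1_605_716,
    "DSI-Symptom": 2_093,
    "DSI-Disease": 5_134,
    "DSI-Anatomy": 4_334,
    "DSP-DSI": 689_297,
    "DSI-TC": 5_430,
    "Disease-Pathway": 1_942,
    "Drug-Pathway": 3_231,
    "Gene-Pathway": 153_236,
    "Drug-Side Effect": 163_206,
}

LISTED_TOTAL = 49_706_554  # "Total Relations" row of the table

total = sum(RELATION_COUNTS.values())
largest = max(RELATION_COUNTS, key=RELATION_COUNTS.get)
share = RELATION_COUNTS[largest] / total

print(f"sum of rows: {total:,}")
print(f"matches listed total: {total == LISTED_TOTAL}")
print(f"largest relation type: {largest} ({share:.1%} of all relations)")
```

The row sum agrees with the listed total of 49,706,554, and Disease-Gene is by far the largest relation type, accounting for more than half of all relations.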

Licence

The CBKH data is licensed under the MIT License. Because CBKH integrates data from many resources, users should also consider the license of each source (see the details in the table).

Cite

@article{su2021cbkh,
  title={CBKH: The Cornell Biomedical Knowledge Hub},
  author={Su, Chang and Hou, Yu and Guo, Winston and Chaudhry, Fayzan and Ghahramani, Gregory and Zhang, Haotan and Wang, Fei},
  journal={medRxiv},
  year={2021},
  publisher={Cold Spring Harbor Laboratory Press},
  url = {https://www.medrxiv.org/content/10.1101/2021.03.12.21253461v1}
}