MHtyper is an end-to-end pipeline for recognized the Forensic microhaplotypes in Nanopore sequencing data.

Overview

Overview

    MHtyper is an end-to-end pipeline for recognized the Forensic microhaplotypes in Nanopore sequencing data. It is implemented using Python.

MHtyper workflow

Step1:

 Sequencing data was filtered by NanoFilt and aligment with minimap2

Step2:

 Files with both BED and pileup format are generated

Step3:

 Phasing with margin ; Correction with isONcorrect; Haplotype analysis by MHtyper ; Integrate the analysis results to get the final micro haplotype results

Python environment construction and required software installation

   conda create -n MHtyper 
   conda activate MHtyper
   conda config --add channels bioconda 
   conda config --add channels
   conda-forge conda install -y NanoFilt minimap2 samtools bedtools

isONcorrect & Margin installation

  isONcorrect: https://github.com/ksahlin/isONcorrect [1] Margin: https://github.com/UCSC-nanopore-cgl/margin

# isONcorrect installation

   git clone https://github.com/ksahlin/isONcorrect.git
   cd isONcorrect 
   ./isONcorrect

# Marin installation
# step1:
   sudo apt-get install git make gcc g++ autoconf zlib1g-dev libcurl4-openssl-dev libbz2-dev libhdf5-dev
   wget https://github.com/Kitware/CMake/releases/download/v3.14.4/cmake-3.14.4-Linux-x86_64.sh && sudo mkdir /opt/cmake &&
   sudo sh cmake-3.14.4-Linux-x86_64.sh --prefix=/opt/cmake --skip-license && sudo ln -s /opt/cmake/bin/cmake
   /usr/local/bin/cmake cmake --version

# step2: Check out the repository and submodules:

   git clone https://github.com/UCSC-nanopore-cgl/margin.git
   cd margin git submodule update --init

# step3: Make build directory:

   mkdir build cd build

# step4: Generate Makefile and run:

   cmake .. 
   make ./margin

MHtyper installation

   git clone https://github.com/willow2333/MHtyper.git
   cd MHtyper 
   python run.py --h
   
   usage: run.py [-h] [--fastqfiles FASTQFILES] [--reference REFERENCE] [--prefix PREFIX] [--truthvcf TRUTHVCF] [--marginpath MARGINPATH]

    optional arguments:
      -h, --help            show this help message and exit
      --fastqfiles FASTQFILES
                            The input *.fq.gz files.
      --reference REFERENCE
                            The path of your ref.
      --prefix PREFIX       The name of your Sample, default is "Test".
      --truthvcf TRUTHVCF   The truth variant files in your research.
      --marginpath MARGINPATH
                            The setup path of "Margin".

Illustration

1.Test

   cd ./Test
   python ../run.py --fastqfiles test.fq.gz --reference path/hg19.fa --prefix Test --truthvcf truthvcf.txt  --marginpath path/margin

2. The sites vcf files needed

 The snp-sites.txt that contained the information of samples must needed

3. Output

 The analysis results of microhaplotypes is in finalphase.txt

Citation

  1.Sahlin, K., Medvedev, P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 12, 2 (2021). https://doi.org/10.1038/s41467-020-20340-8 Link.

Email: Yiping Hou ([email protected]), Zheng Wang ([email protected]), Liu Qin ([email protected])

Owner
willow
willow
Source code for the paper "TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations"

TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations Created by Jiahao Pang, Duanshun Li, and Dong Tian from InterDigital In

InterDigital 21 Dec 29, 2022
DaCy: The State of the Art Danish NLP pipeline using SpaCy

DaCy: A SpaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing it has achieved State-of-the-Ar

Kenneth Enevoldsen 71 Jan 06, 2023
Yomichad - a Japanese pop-up dictionary that can display readings and English definitions of Japanese words

Yomichad is a Japanese pop-up dictionary that can display readings and English definitions of Japanese words, kanji, and optionally named entities. It is similar to yomichan, 10ten, and rikaikun in s

Jonas Belouadi 7 Nov 07, 2022
Analyse japanese ebooks using MeCab to determine the difficulty level for japanese learners

japanese-ebook-analysis This aim of this project is to make analysing the contents of a japanese ebook easy and streamline the process for non-technic

Christoffer Aakre 14 Jul 23, 2022
A library for end-to-end learning of embedding index and retrieval model

Poeem Poeem is a library for efficient approximate nearest neighbor (ANN) search, which has been widely adopted in industrial recommendation, advertis

54 Dec 21, 2022
Voilà turns Jupyter notebooks into standalone web applications

Rendering of live Jupyter notebooks with interactive widgets. Introduction Voilà turns Jupyter notebooks into standalone web applications. Unlike the

Voilà Dashboards 4.5k Jan 03, 2023
Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

Ravn Tech, Inc. 166 Jan 07, 2023
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

Soohwan Kim 40 Sep 19, 2022
Training code for Korean multi-class sentiment analysis

KoSentimentAnalysis Bert implementation for the Korean multi-class sentiment analysis 왜 한국어 감정 다중분류 모델은 거의 없는 것일까?에서 시작된 프로젝트 Environment: Pytorch, Da

Donghoon Shin 3 Dec 02, 2022
Deduplication is the task to combine different representations of the same real world entity.

Deduplication is the task to combine different representations of the same real world entity. This package implements deduplication using active learning. Active learning allows for rapid training wi

63 Nov 17, 2022
CMeEE 数据集医学实体抽取

医学实体抽取_GlobalPointer_torch 介绍 思想来自于苏神 GlobalPointer,原始版本是基于keras实现的,模型结构实现参考现有 pytorch 复现代码【感谢!】,基于torch百分百复现苏神原始效果。 数据集 中文医学命名实体数据集 点这里申请,很简单,共包含九类医学

85 Dec 28, 2022
NLP Text Classification

多标签文本分类任务 近年来随着深度学习的发展,模型参数的数量飞速增长。为了训练这些参数,需要更大的数据集来避免过拟合。然而,对于大部分NLP任务来说,构建大规模的标注数据集非常困难(成本过高),特别是对于句法和语义相关的任务。相比之下,大规模的未标注语料库的构建则相对容易。为了利用这些数据,我们可以

Jason 1 Nov 11, 2021
The official implementation of "BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?, ACL 2021 main conference"

BERT is to NLP what AlexNet is to CV This is the official implementation of BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Iden

Asahi Ushio 20 Nov 03, 2022
An official repository for tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a University of Edinburgh master's course.

PMR computer tutorials on HMMs (2021-2022) This is a repository for computer tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a Univer

Vaidotas Šimkus 10 Dec 06, 2022
Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...) Generate as many exercises as you want using the incredible power of SPACY!

Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...) Generate as many exercises as you want using the incredible power of SPACY!

Hans Alemão 4 Jul 20, 2022
MPNet: Masked and Permuted Pre-training for Language Understanding

MPNet MPNet: Masked and Permuted Pre-training for Language Understanding, by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, is a novel pre-tr

Microsoft 228 Nov 21, 2022
Leon is an open-source personal assistant who can live on your server.

Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal

Leon AI 11.7k Dec 30, 2022
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)

Bunkai Bunkai is a sentence boundary (SB) disambiguation tool for Japanese texts. Quick Start $ pip install bunkai $ echo -e '宿を予約しました♪!まだ2ヶ月も先だけど。早すぎ

Megagon Labs 160 Dec 23, 2022
Search with BERT vectors in Solr and Elasticsearch

Search with BERT vectors in Solr and Elasticsearch

Dmitry Kan 123 Dec 29, 2022
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

CrossNER is a fully-labeled collected of named entity recognition (NER) data spanning over five diverse domains (Politics, Natural Science, Music, Literature, and Artificial Intelligence) with specia

Zihan Liu 89 Nov 10, 2022