Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.

Overview

Nanopore-Workflow

Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human genome tumor/normal pairs, but can also run individual samples. Reports and plots are generated for de novo genome assembly, differentially methylated regions, copy number variants, and structural variants. Filtering heuristics typically reduce the reported translocations to the break points. It is suggested to have at least 15x - 20x of coverage, and a median read length of at least 5kbp - 6kbp.

nanopore_workflow

Installation instructions

Download the latest code from GitHub:

git clone https://github.com/mike-molnar/nanopore-workflow.git

Before running the workflow, you will need to download the reference genome. I have not included the download as part of the workflow because it is designed to run a cluster that may not have internet access. You can use a local copy of GRCh38 if you have one, but the chromosomes must be named chr1, chr2, ... , and the reference can only contain the autosomes and sex chromosomes. To download the reference genome and index it, change to the reference directory of the workflow and run the script:

cd /path/to/nanopore-workflow/reference
chmod u+x download_reference.sh
./download_reference.sh

To run the workflow copy the Snakefile and config.yaml files to the directory that you want to run the workflow:

cp /path/to/nanopore-workflow/Snakefile /path/to/nanopore-workflow/config.yaml /path/to/samples

Modify the config.yaml file to represent the information for the necessary files and directories of your sample(s). The workflow is currently designed to have a single FASTQ, and a single sequencing summary file in a folder named fastq that is in a folder named after the sample. The config.yaml file provides an example of how to format the initial files and directories before running the workflow.

To run on a grid engine

There are a few different grid engines, so the exact format may be different for your particular grid engine. To run everything except the de novo assembly on a Univa grid engine:

snakemake --jobs 500 --rerun-incomplete --keep-going --latency-wait 30 --cluster "qsub -cwd -V -o snakemake.output.log -e snakemake.error.log -q queue_name -P project_name -pe smp {threads} -l h_vmem={params.memory_per_thread} -l h_rt={params.run_time} -b y" all_but_assembly

You will have to replace queue_name and project_name with the necessary values to run on your grid.

Dependencies

There are many dependencies, so it is best to create a new Conda environment using the YAML files in the env directory. There is a YAML file for the workflow, and another for Medaka. You will need to install a separate environment for QUAST if you are going to run the de novo assembly portion of the workflow. Change to the env directory and create the environments with Conda:

cd /path/to/nanopore-worflow/env
conda env create -n nanopore-workflow -f nanopore-workflow_env.yml
conda env create -n medaka -f medaka_env.yml
conda env create -n quast -f quast_env.yml
conda env create -n R_env -f R_env.yml
conda activate nanopore-workflow

Before running the workflow you will need to export the paths of the four environments to your PATH variable:

export PATH="/path/to/conda/envs/nanopore-workflow/bin:$PATH"
export PATH="/path/to/conda/envs/medaka/bin:$PATH"
export PATH="/path/to/conda/envs/quast/bin:$PATH"
export PATH="/path/to/conda/envs/R_env/bin:$PATH"

nanopore-workflow dependencies:

  • bcftools
  • bedtools
  • cutesv
  • flye
  • longshot
  • nanofilt v2.8.0
  • nanoplot v1.20.0
  • nanopolish
  • seaborn v0.10.0
  • snakemake
  • sniffles
  • survivor
  • svim
  • whatshap
  • winnowmap

R_env dependencies:

  • bioconductor-karyoploter
  • bioconductor-txdb.hsapiens.ucsc.hg38.knowngene
  • bioconductor-org.hs.eg.db
  • bioconductor-dss
  • r-tidyverse
You might also like...
 A simple way to read and write LAPS passwords from linux.
A simple way to read and write LAPS passwords from linux.

A simple way to read and write LAPS passwords from linux. This script is a python setter/getter for property ms-Mcs-AdmPwd used by LAPS inspired by @s

 ⚙️ Compile, Read and update your .conf file in python
⚙️ Compile, Read and update your .conf file in python

⚙️ Compile, Read and update your .conf file in python

Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Users can read others' travel journeys in addition to being able to upload and delete posts detailing their own experiences

Users can read others' travel journeys in addition to being able to upload and delete posts detailing their own experiences! Posts are organized by country and destination within that country.

To lazy to read your homework ? Get it done with LOL

LOL To lazy to read your homework ? Get it done with LOL Needs python 3.x L:::::::::L OO:::::::::OO L:::::::::L L:::::::

Pequenos programas variados que estou praticando e implementando, leia o Read.me!

my-small-programs Pequenos programas variados que estou praticando e implementando! Arquivo: automacao Automacao de processos de rotina com código Pyt

Show my read on kindle this year

Show my kindle status on GitHub

Incident Response Process and Playbooks | Goal: Playbooks to be Mapped to MITRE Attack Techniques
Incident Response Process and Playbooks | Goal: Playbooks to be Mapped to MITRE Attack Techniques

PURPOSE OF PROJECT That this project will be created by the SOC/Incident Response Community Develop a Catalog of Incident Response Playbook for every

These are After Effects and Python files that were made in the process of creating the video for the contest.

spirograph These are After Effects and Python files that were made in the process of creating the video for the contest. In the python file you can qu

Releases(v0.1.0)
A bash-like intrepreted language

A Bash-like interpreted scripting language.

AshVXmc 1 Oct 28, 2021
py-js: python3 objects for max

Simple (and extensible) python3 externals for MaxMSP

Shakeeb Alireza 39 Nov 20, 2022
An ongoing curated list of frameworks, libraries, learning tutorials, software and resources in Python Language.

Python Development Welcome to the world of Python. An ongoing curated list of frameworks, libraries, learning tutorials, software and resources in Pyt

Paul Veillard 2 Dec 24, 2021
Spooky Castle Project

Spooky Castle Project Here is a repository where I have placed a few workflow scripts that could be used to automate the blender to godot sprite pipel

3 Jan 17, 2022
Wordless - the #1 app for helping you cheat at Wordle, which is sure to make you popular at parties

Wordless Wordless is the #1 app for helping you cheat at Wordle, which is sure t

James Kirk 7 Feb 04, 2022
A script to add issues to a project in Github based on label or status.

Add Github Issues to Project (Beta) A python script to move Github issues to a next-gen (beta) Github Project Getting Started These instructions will

Kate Donaldson 3 Jan 16, 2022
:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

pre-commit-dbt List of pre-commit hooks to ensure the quality of your dbt projects. BETA NOTICE: This tool is still BETA and may have some bugs, so pl

Offbi 262 Nov 25, 2022
SysCFG R/W Utility written in Swift

MagicCFG SysCFG R/W Utility written in Swift MagicCFG is one of our first, successful applications that we launched last year. The app makes it possib

Jan Fabel 82 Aug 08, 2022
A prototype COG-based tile server for sparse Mars datasets

Mars tiler Mars Tiler is a prototype web application that serves tiles from cloud-optimized GeoTIFFs, with an emphasis on supporting planetary dataset

Daven Quinn 3 Mar 23, 2022
A Company Management System For Python

campany-management Getting started To make it easy for you to get started with GitLab, here's a list of recommended next steps. Already a pro? Just ed

hatice akpınar 3 Aug 29, 2022
Modelling and Implementation of Cable Driven Parallel Manipulator System with Tension Control

Cable Driven Parallel Robots (CDPR) is also known as Cable-Suspended Robots are the emerging and flexible end effector manipulation system. Cable-driven parallel robots (CDPRs) are categorized as a t

Siddharth U 0 Jul 19, 2022
Kellogg bad | Union good | Support strike funds

KelloggBot Credit to SeanDaBlack for the basis of the script. req.py is selenium python bot. sc.js is a the base of the ios shortcut [COMING SOON] Set

407 Nov 17, 2022
Show Public IP Information In Linux Taskbar

IP Information In Linux Taskbar 📍 How Use IP Script? 🤔 Download ip.py script and save somewhere in your system. Add command applet in your taskbar a

HOP 2 Jan 25, 2022
The code for 2021 MGTV AI Challenge Anti Stealing Link, and the online result ranks 10th.

赛题介绍 芒果TV-第二届“马栏山杯”国际音视频算法大赛-防盗链 随着业务的发展,芒果的视频内容也深受网友的喜欢,不少视频网站和应用开始盗播芒果的视频内容,盗链网站不经过芒果TV的前端系统,跳过广告播放,且消耗大量的服务器、带宽资源,直接给公司带来了巨大的经济损失,因此防盗链在日常运营中显得尤为重要

tongji40 16 Jun 17, 2022
Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Feature engineering library that helps you keep track of feature dependencies, documentation and schema

28 May 31, 2022
Install Firefox from Mozilla.org easily, complete with .desktop file creation.

firefox-installer Install Firefox from Mozilla.org easily, complete with .desktop file creation. Dependencies Python 3 Python LXML Debian/Ubuntu: sudo

rany 7 Nov 04, 2022
En este repositorio pondré archivos graciositos de python que hago de vez en cuando

🐍 Apuntes de python 🐍 ¿Quién soy? 👽 Saludos,mi nombre es Carlos Lara. Pero mi nickname en internet es Hercules Kan. Soy un programador autodidacta

Carlos E. Lara 3 Nov 16, 2021
KeyBrowser: A program launches a browser and a keylogger at the same time, is used to retrieve a person's personal information

KeyBrowser: A program launches a browser and a keylogger at the same time, is used to retrieve a person's personal information

3 Oct 16, 2022
A tool for light-duty persistent memoization of API calls

JSON Memoize What is this? json_memoize is a straightforward tool for light-duty persistent memoization, created with API calls in mind. It stores the

1 Dec 11, 2021
Removes all archived super productivity tasks. Just run the python script.

delete-archived-sp-tasks.py Removes all archived super productivity tasks. Just run the python script. This is helpful to do a cleanup every 3-6 month

Ben Herbst 1 Jan 09, 2022