Toolchest provides APIs for scientific and bioinformatic data analysis.

Last update: Jun 30, 2022

Overview

Toolchest Python Client

Toolchest provides APIs for scientific and bioinformatic data analysis. It lets you avoid the cost of running tools on your own resources by running the same jobs on secure, powerful remote servers.

This package contains the Python client for using Toolchest. For the R client, see here.

Installation

The Toolchest client is available on PyPI:

pip install toolchest-client

Usage

Using a tool in Toolchest is as simple as:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY")
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)

For a list of available tools, see the documentation.

Configuration

To use Toolchest, you must have an authentication key stored in the TOOLCHEST_KEY environment variable. The set_key function accepts either the key itself or a path to a file containing the key:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY")  # or a path to a file containing the key
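
As noted above, set_key takes either the key or a file path containing the key. The following is a minimal sketch of how that resolution could work. The helpers resolve_key and store_key are hypothetical names used only for illustration and are not part of the Toolchest client; the sketch also assumes that a key file holds just the key, and that the key ends up in the TOOLCHEST_KEY environment variable.

```python
import os
import tempfile


def resolve_key(key_or_path: str) -> str:
    """Return the key itself, or the contents of the file it points to.

    Hypothetical helper for illustration; not the Toolchest client's code.
    """
    if os.path.isfile(key_or_path):
        with open(key_or_path) as key_file:
            # Assumption: the file contains only the key, possibly with a newline.
            return key_file.read().strip()
    return key_or_path


def store_key(key_or_path: str) -> None:
    """Store the resolved key in the TOOLCHEST_KEY environment variable."""
    os.environ["TOOLCHEST_KEY"] = resolve_key(key_or_path)


if __name__ == "__main__":
    # A literal key is stored unchanged.
    store_key("YOUR_TOOLCHEST_KEY")
    print(os.environ["TOOLCHEST_KEY"])

    # A path to a key file is read and stripped of surrounding whitespace.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("key-from-file\n")
    store_key(tmp.name)
    print(os.environ["TOOLCHEST_KEY"])
    os.remove(tmp.name)
```

Checking for an existing file first means a literal key is never mistaken for a path unless a file with that exact name exists.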

Contact Toolchest if:

you need a key
you’ve forgotten your key
the key is producing authentication errors.

Documentation & User Guide available at Read the Docs

Comments

Enable paired reads for `kraken2`

Adds the option to use paired-read inputs for kraken2, via the read_one and read_two arguments (or a list of two paths via inputs).

Adds --paired to tool_args, or removes it, as necessary.
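
One way to keep the --paired flag in step with the inputs is sketched below. The function sync_paired_flag is a hypothetical name, and the rule it applies (exactly two input paths means paired reads) is an assumption based on the description above, not the client's actual implementation.

```python
import shlex
from typing import List


def sync_paired_flag(tool_args: str, input_paths: List[str]) -> str:
    """Add or remove --paired so it matches the number of input files.

    Hypothetical helper; the client's real logic may differ.
    """
    # Drop any existing --paired flag, then re-add it only for two inputs.
    args = [arg for arg in shlex.split(tool_args) if arg != "--paired"]
    if len(input_paths) == 2:
        args.append("--paired")
    return shlex.join(args)


if __name__ == "__main__":
    print(sync_paired_flag("--quick", ["r1.fastq", "r2.fastq"]))  # --quick --paired
    print(sync_paired_flag("--paired --quick", ["single.fastq"]))  # --quick
```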

opened by bcai2 3
v0.4.0
Add Poetry, remove Twine

Add CircleCI automatic deploy to PyPI (untested for prod PyPI)

Note: CircleCI will fail because v0.4.0 already exists on test PyPI. This is expected, because I already bumped the version to v0.4.0 when testing.
opened by lebovic 3
S3 chaining
Adds:

Output class returned by all toolchest.tool() calls, which contains s3_uri, presigned_s3_url, and (local) output_path variables

S3 chaining, via supplying output.s3_uri from a previous tool as the inputs parameter for a following tool

The ability to skip downloading any tool's output by setting output_path=None (the default)
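
The chaining pattern described in this list can be sketched as follows. Only the three field names (s3_uri, presigned_s3_url, output_path) come from the description; run_tool, CALL_LOG, the tool names, and the URIs are made-up stand-ins, and nothing here calls the real Toolchest client.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Output:
    """Sketch of the Output object described above (field names from the text)."""
    s3_uri: str
    presigned_s3_url: str
    output_path: Optional[str] = None  # None means the download was skipped


# Records (tool name, inputs) for each stand-in call, purely for illustration.
CALL_LOG: List[Tuple[str, str]] = []


def run_tool(name: str, inputs: str, output_path: Optional[str] = None) -> Output:
    """Stand-in for a toolchest.tool() call; runs nothing and only builds an Output."""
    CALL_LOG.append((name, inputs))
    s3_uri = f"s3://example-bucket/{name}/output.tar.gz"  # made-up URI
    presigned = f"https://example.com/{name}/output.tar.gz?signature=placeholder"
    return Output(s3_uri=s3_uri, presigned_s3_url=presigned, output_path=output_path)


# First tool: output_path is left as None, so nothing would be downloaded.
first = run_tool("tool_a", inputs="path/to/input.fastq")

# Second tool: consume the first tool's S3 output directly as its inputs.
second = run_tool("tool_b", inputs=first.s3_uri, output_path="path/to/final_output")
```

Passing the S3 URI along avoids downloading an intermediate result only to upload it again for the next step.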
opened by lebovic 2
Polish tool_arg handling, add more STAR args
Adds:

More STAR args

Multiple levels of tool_arg handling (whitelist, dangerlist, blacklist)

An error on unknown or blacklisted args

Reduced complexity (less validation and no parallelization, for now) if a dangerous argument is passed
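
A tiered check along these lines could look like the sketch below. The flag names and their tier assignments are placeholders invented for illustration, and check_tool_args is a hypothetical function; the client's real lists and behavior are not shown in this description.

```python
from typing import Dict, List

# Placeholder flags and tiers; these are NOT the client's real lists.
ARG_TIERS: Dict[str, str] = {
    "--safe-flag": "whitelist",
    "--risky-flag": "dangerlist",
    "--forbidden-flag": "blacklist",
}


def check_tool_args(args: List[str]) -> bool:
    """Validate flags and report whether reduced-complexity mode is needed.

    Returns True if any dangerlisted flag is present.
    Raises ValueError for unknown or blacklisted flags.
    """
    reduce_complexity = False
    for arg in args:
        if not arg.startswith("--"):
            continue  # a value belonging to the preceding flag
        tier = ARG_TIERS.get(arg)
        if tier is None:
            raise ValueError(f"Unknown argument: {arg}")
        if tier == "blacklist":
            raise ValueError(f"Blacklisted argument: {arg}")
        if tier == "dangerlist":
            reduce_complexity = True
    return reduce_complexity


if __name__ == "__main__":
    print(check_tool_args(["--safe-flag", "1"]))
    print(check_tool_args(["--safe-flag", "1", "--risky-flag"]))
```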

Requires:

https://github.com/trytoolchest/toolchest-worker-node/pull/24

https://github.com/trytoolchest/toolchest-api/pull/22

This does not fix:

Larger disk, memory, and other resource requirements for larger files when args trigger reduced complexity or disable parallelization
opened by lebovic 2
STAR whitelist options
Adds basic whitelist options for STAR.

Adds support for tags with variable numbers of arguments. Adds the --quantMode tag for STAR.
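
To illustrate what handling a tag with a variable number of arguments involves, here is a minimal sketch that groups each tag with the values following it. The function group_args_by_tag is hypothetical and is not taken from the client; it assumes that every token starting with "--" begins a new tag.

```python
from typing import Dict, List, Optional


def group_args_by_tag(args: List[str]) -> Dict[str, List[str]]:
    """Map each --tag to the list of values that follow it (possibly empty).

    A tag that appears more than once keeps only its last group of values.
    """
    grouped: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token in args:
        if token.startswith("--"):
            current = token
            grouped[current] = []
        elif current is None:
            raise ValueError(f"Value without a preceding tag: {token}")
        else:
            grouped[current].append(token)
    return grouped


if __name__ == "__main__":
    tokens = "--quantMode TranscriptomeSAM GeneCounts --runThreadN 4".split()
    print(group_args_by_tag(tokens))
```

Here --quantMode collects two values while --runThreadN collects one, which is the case a fixed one-value-per-tag parser cannot handle.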

(This should be merged in after the kraken2 paired read commit.)
opened by bcai2 2
feat: centrifuge base
Adds the centrifuge tool.

Adds docs.

Refactors how prefix_mapping is generated for megahit with a new module (input_util.py) and function (convert_input_params_to_prefix_mapping). Adds a unit test for the function.
opened by bcai2 1
fix: upload/download tracker bugfixes
Refactors the tracking printed statements into a pythonic print call with string formatting.

Fixes status update logic in uploading. (This was causing the terminal output to stall at the "uploading" stage.)

Adds integration test dirs to .gitignore.
opened by bcai2 1
fix: remove pysam due to multiple issues

Pysam has caused multiple issues as a package, and STAR parallelization is not currently used, so this PR fully removes pysam as a dependency. Either a different library or custom SAM file merging code is planned for later, so the parallelization framework remains in the code for now.

opened by jherr-dev 1
feat: add preliminary alphafold support

Adds basic support for running AlphaFold via Toolchest. Code needs to be cleaned up and better documented. Currently limited to 1 input fasta.

use_reduced_dbs and is_prokaryote_list are currently disabled until further implementation and testing are done. Integration will come with reduced dbs, since full dbs take 45 minutes to an hour to run even on simple input.

opened by jherr-dev 1
feat: support async execution
Adds:

Support for async execution

See https://gist.github.com/lebovic/72fbb857119f1667c7959a4d7e28cd50 (or the integration test) for a hacky example of how to run Toolchest with async execution.
opened by lebovic 1
fix: set default version number

Sets the version number to a default instead of erroring if the client is run from source (i.e., without the toolchest-client package being installed via pip).
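
A common way to implement this kind of fallback uses the standard library's importlib.metadata, as sketched below. This illustrates the general technique and is not necessarily how the client does it; get_client_version and DEFAULT_VERSION are names chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version

DEFAULT_VERSION = "0.0.0"


def get_client_version(package_name: str = "toolchest-client") -> str:
    """Return the installed package version, or a default when it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # Running from source: no installed distribution metadata is available.
        return DEFAULT_VERSION


if __name__ == "__main__":
    print(get_client_version())
```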

Open question: the version number defaults to 0.0.0, which can be confusing -- are there any other labels that might be better (e.g., dev or just the empty string)?

opened by bcai2 1

Releases(v0.11.3)

v0.11.3(Nov 8, 2022)

Changelog: #287
v0.11.2(Nov 2, 2022)

v0.11.1 changelog: #279
v0.11.2 changelog: #285
v0.11.0(Oct 19, 2022)

v0.11.0
v0.9.43(Oct 10, 2022)

v0.9.43
v0.9.32(Aug 30, 2022)

v0.9.8(May 20, 2022)

Allows previously-blocked --max-target-seqs argument on diamond blastx
v0.9.1(Apr 1, 2022)
Adds:

A skip_decompression tool param under **kwargs

Updates:

shogun_align to use Bowtie 2 instead of BURST as the underlying aligner

See #134.
v0.7.46(Dec 29, 2021)
Modifies:

Bug fix: maintains the ordering of inputs for megahit

v0.7.43(Dec 24, 2021)
Adds:

megahit

v0.7.39(Dec 20, 2021)
Adds:

Ability to pass S3 URIs as inputs (#49)

Modified handling of arguments + raw execution mode + more STAR arguments (#57)

Multipart uploads / downloads and the ability to increase non-parallel input file sizes (#60, #61, #63, #65)

Enable output file .tar.gzs across the board (#68)

Add an explicit parallelize=True flag (#68)

Modifies:

Various fixes (#49, #56, #58, #64, #67)

v0.7.28(Nov 23, 2021)

See #46, #53
v0.7.20(Oct 22, 2021)

Fixes paired end Kraken 2 bug.

See #42 for more details.
v0.7.19(Oct 22, 2021)
Adds:

Kraken 2 paired end support

Commonly used STAR arguments

Better testing and documentation

See #41 for more details.
v0.7.14(Oct 1, 2021)
Adds basic unit tests for functions in toolchest_client/files.

Restructures the key validation check to occur before any jobs are spawned.

See #37 for more details.
v0.7.13(Aug 27, 2021)

See #33
v0.7.8(Aug 20, 2021)

See #23
v0.5.0(Jul 9, 2021)

See https://github.com/trytoolchest/toolchest-client-python/pull/9
v0.3.0(Jun 11, 2021)

Changed API URL to production URL.
v0.2.1(Jun 9, 2021)

Changed tool_args parameter names for tool functions.
v0.2.0(Jun 9, 2021)

Initial (development) release.

Contains functions for setting authorization keys and executing basic cutadapt and kraken2 queries.

Added initial documentation.

Owner

Toolchest
