Pipetools enables function composition similar to using Unix pipes.

Overview

Pipetools

tests-badge coverage-badge pypi-badge

Complete documentation

pipetools enables function composition similar to using Unix pipes.

It allows forward-composition and piping of arbitrary functions - no need to decorate them or do anything extra.

It also packs a bunch of utils that make common operations more convenient and readable.

Source is on github.

Why?

Piping and function composition are some of the most natural operations there are for plenty of programming tasks. Yet Python doesn't have a built-in way of performing them. That forces you to either deep nesting of function calls or adding extra glue code.

Example

Say you want to create a list of python files in a given directory, ordered by filename length, as a string, each file on one line and also with line numbers:

>>> print(pyfiles_by_length('../pipetools'))
1. ds_builder.py
2. __init__.py
3. compat.py
4. utils.py
5. main.py

All the ingredients are already there, you just have to glue them together. You might write it like this:

def pyfiles_by_length(directory):
    all_files = os.listdir(directory)
    py_files = [f for f in all_files if f.endswith('.py')]
    sorted_files = sorted(py_files, key=len, reverse=True)
    numbered = enumerate(py_files, 1)
    rows = ("{0}. {1}".format(i, f) for i, f in numbered)
    return '\n'.join(rows)

Or perhaps like this:

def pyfiles_by_length(directory):
    return '\n'.join('{0}. {1}'.format(*x) for x in enumerate(reversed(sorted(
        [f for f in os.listdir(directory) if f.endswith('.py')], key=len)), 1))

Or, if you're a mad scientist, you would probably do it like this:

pyfiles_by_length = lambda d: (reduce('{0}\n{1}'.format,
    map(lambda x: '%d. %s' % x, enumerate(reversed(sorted(
        filter(lambda f: f.endswith('.py'), os.listdir(d)), key=len))))))

But there should be one -- and preferably only one -- obvious way to do it.

So which one is it? Well, to redeem the situation, pipetools give you yet another possibility!

pyfiles_by_length = (pipe
    | os.listdir
    | where(X.endswith('.py'))
    | sort_by(len).descending
    | (enumerate, X, 1)
    | foreach("{0}. {1}")
    | '\n'.join)

Why would I do that, you ask? Comparing to the native Python code, it's

  • Easier to read -- minimal extra clutter
  • Easier to understand -- one-way data flow from one step to the next, nothing else to keep track of
  • Easier to change -- want more processing? just add a step to the pipeline
  • Removes some bug opportunities -- did you spot the bug in the first example?

Of course it won't solve all your problems, but a great deal of code can be expressed as a pipeline, giving you the above benefits. Read on to see how it works!

Installation

$ pip install pipetools

Uh, what's that?

Usage

The pipe

The pipe object can be used to pipe functions together to form new functions, and it works like this:

from pipetools import pipe

f = pipe | a | b | c

# is the same as:
def f(x):
    return c(b(a(x)))

A real example, sum of odd numbers from 0 to x:

from functools import partial
from pipetools import pipe

odd_sum = pipe | range | partial(filter, lambda x: x % 2) | sum

odd_sum(10)  # -> 25

Note that the chain up to the sum is lazy.

Automatic partial application in the pipe

As partial application is often useful when piping things together, it is done automatically when the pipe encounters a tuple, so this produces the same result as the previous example:

odd_sum = pipe | range | (filter, lambda x: x % 2) | sum

As of 0.1.9, this is even more powerful, see X-partial.

Built-in tools

Pipetools contain a set of pipe-utils that solve some common tasks. For example there is a shortcut for the filter class from our example, called where():

from pipetools import pipe, where

odd_sum = pipe | range | where(lambda x: x % 2) | sum

Well that might be a bit more readable, but not really a huge improvement, but wait!

If a pipe-util is used as first or second item in the pipe (which happens quite often) the pipe at the beginning can be omitted:

odd_sum = range | where(lambda x: x % 2) | sum

See pipe-utils' documentation.

OK, but what about the ugly lambda?

where(), but also foreach(), sort_by() and other pipe-utils can be quite useful, but require a function as an argument, which can either be a named function -- which is OK if it does something complicated -- but often it's something simple, so it's appropriate to use a lambda. Except Python's lambdas are quite verbose for simple tasks and the code gets cluttered...

X object to the rescue!

from pipetools import where, X

odd_sum = range | where(X % 2) | sum

How 'bout that.

Read more about the X object and it's limitations.

Automatic string formatting

Since it doesn't make sense to compose functions with strings, when a pipe (or a pipe-util) encounters a string, it attempts to use it for (advanced) formatting:

>>> countdown = pipe | (range, 1) | reversed | foreach('{}...') | ' '.join | '{} boom'
>>> countdown(5)
'4... 3... 2... 1... boom'

Feeding the pipe

Sometimes it's useful to create a one-off pipe and immediately run some input through it. And since this is somewhat awkward (and not very readable, especially when the pipe spans multiple lines):

result = (pipe | foo | bar | boo)(some_input)

It can also be done using the > operator:

result = some_input > pipe | foo | bar | boo

Note

Note that the above method of input won't work if the input object defines __gt__ for any object - including the pipe. This can be the case for example with some objects from math libraries such as NumPy. If you experience strange results try falling back to the standard way of passing input into a pipe.

But wait, there is more

Checkout the Maybe pipe, partial application on steroids or automatic data structure creation in the full documentation.

A distributed block-based data storage and compute engine

Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.

Columns AI 131 Dec 26, 2022
Spaghetti: an open-source Python library for the analysis of network-based spatial data

pysal/spaghetti SPAtial GrapHs: nETworks, Topology, & Inference Spaghetti is an open-source Python library for the analysis of network-based spatial d

Python Spatial Analysis Library 203 Jan 03, 2023
Programmatically access the physical and chemical properties of elements in modern periodic table.

API to fetch elements of the periodic table in JSON format. Uses Pandas for dumping .csv data to .json and Flask for API Integration. Deployed on "pyt

the techno hack 3 Oct 23, 2022
ETL pipeline on movie data using Python and postgreSQL

Movies-ETL ETL pipeline on movie data using Python and postgreSQL Overview This project consisted on a automated Extraction, Transformation and Load p

Juan Nicolas Serrano 0 Jul 07, 2021
Python Project on Pro Data Analysis Track

Udacity-BikeShare-Project: Python Project on Pro Data Analysis Track Basic Data Exploration with pandas on Bikeshare Data Basic Udacity project using

Belal Mohammed 0 Nov 10, 2021
Retentioneering 581 Jan 07, 2023
In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift.

ETL Pipeline for AWS Project Description In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift. The data is loaded from S3 t

Mobeen Ahmed 1 Nov 01, 2021
A forecasting system dedicated to smart city data

smart-city-predictions System prognostyczny dedykowany dla danych inteligentnych miast Praca inżynierska realizowana przez Michała Stawikowskiego and

Kevin Lai 1 Nov 08, 2021
A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app makes a series of analysis by using the price cha

Antonio Catalano 50 Nov 27, 2022
Building house price data pipelines with Apache Beam and Spark on GCP

This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.

1 Nov 22, 2021
Program that predicts the NBA mvp based on data from previous years.

NBA MVP Predictor A machine learning model using RandomForest Regression that predicts NBA MVP's using player data. Explore the docs » View Demo · Rep

Muhammad Rabee 1 Jan 21, 2022
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and lo

Coiled 102 Nov 10, 2022
Template for a Dataflow Flex Template in Python

Dataflow Flex Template in Python This repository contains a template for a Dataflow Flex Template written in Python that can easily be used to build D

STOIX 5 Apr 28, 2022
For making Tagtog annotation into csv dataset

tagtog_relation_extraction for making Tagtog annotation into csv dataset How to Use On Tagtog 1. Go to Project Downloads 2. Download all documents,

hyeong 4 Dec 28, 2021
Data collection, enhancement, and metrics calculation.

l3_data_collection Data collection, enhancement, and metrics calculation. Summary Repository containing code for QuantDAO's JDT data collection task.

Ruiwyn 3 Dec 23, 2022
Business Intelligence (BI) in Python, OLAP

Open Mining Business Intelligence (BI) Application Server written in Python Requirements Python 2.7 (Backend) Lua 5.2 or LuaJIT 5.1 (OML backend) Mong

Open Mining 1.2k Dec 27, 2022
Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Trung-Duy Nguyen 27 Nov 01, 2022
Fast, flexible and easy to use probabilistic modelling in Python.

Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work! pomegranate is a package for building probabilistic

Jacob Schreiber 3k Jan 02, 2023
In this tutorial, raster models of soil depth and soil water holding capacity for the United States will be sampled at random geographic coordinates within the state of Colorado.

Raster_Sampling_Demo (Resulting graph of this demo) Background Sampling values of a raster at specific geographic coordinates can be done with a numbe

2 Dec 13, 2022
DefAP is a program developed to facilitate the exploration of a material's defect chemistry

DefAP is a program developed to facilitate the exploration of a material's defect chemistry. A large number of features are provided and rapid exploration is supported through the use of autoplotting

6 Oct 25, 2022