Intake is a lightweight package for finding, investigating, loading and disseminating data.

Last update: Jan 01, 2023

Overview

Intake: A general interface for loading data

Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps you:

Load data from a variety of formats (see the current list of known plugins) into containers you already know, like Pandas dataframes, Python lists, NumPy arrays, and more.
Convert boilerplate data loading code into reusable Intake plugins.
Describe data sets in catalog files for easy reuse and sharing between projects and with others.
Share catalog information (and data sets) over the network with the Intake server.
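
As a rough illustration of the first and third points, the sketch below loads a CSV file directly and then loads a named entry from a catalog file. It assumes the classic (pre-2.0) Intake API, where intake.open_csv and intake.open_catalog return objects with a read() method; the function names, file paths and entry names are hypothetical, and the CSV driver may need the [dataframe] extra to be installed.

```python
def load_csv(path):
    """Load a CSV file into a pandas DataFrame through Intake (hypothetical helper)."""
    # Imported lazily so that this module can be imported without Intake installed.
    import intake

    # Describe the data source; no data is read at this point.
    source = intake.open_csv(path)
    # Read the whole source into a single in-memory container.
    return source.read()


def load_from_catalog(catalog_path, entry_name):
    """Load one named entry from an Intake catalog file (hypothetical helper)."""
    import intake

    # A catalog is a YAML file describing one or more data sources.
    catalog = intake.open_catalog(catalog_path)
    # Entries can be looked up by name, like keys in a mapping.
    return catalog[entry_name].read()
```

With a local file, load_csv("data.csv") would be expected to return a DataFrame, and load_from_catalog("catalog.yml", "my_entry") would do the same for a catalog entry, provided those files exist and describe tabular data.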

Documentation is available at Read the Docs.

The status of intake and related packages is available at the Status Dashboard.

Weekly news about this repo and other related projects can be found on the wiki.

Install

Recommended method using conda:

conda install -c conda-forge intake

You can also install using pip, in which case you can choose how many of the optional dependencies to install. The simplest option has the fewest requirements:

pip install intake

Additional optional sections are [server], [plot] and [dataframe]. To include everything:

pip install intake[complete]

Note that you may well need specific drivers and other plugins, which usually have additional dependencies of their own.
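
After installing, one way to confirm which version ended up in the current environment is to query the package metadata. This is a minimal sketch using only the Python standard library; the helper name installed_version is an assumption for illustration and is not part of Intake.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if intake is installed, otherwise None.
print(installed_version("intake"))
```

The same helper can be pointed at any driver or plugin package to check that it is present before trying to use it.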

Development

Create a development Python environment with the required dependencies, ideally with conda. The requirements are listed in the yml files in the scripts/ci/ directory of this repo.
- e.g. conda env create -f scripts/ci/environment-py38.yml and then conda activate test_env
Install intake in editable mode using pip install -e .[complete]
Use pytest to run the tests.
Create a fork on GitHub to be able to submit PRs.
We respect, but do not enforce, PEP 8 standards; all new code should be covered by tests.
