Important dataframe statistics with a single command

Last update: Dec 19, 2021

Overview

quick_eda

Receiving dataframe statistics with one command

Project description

A Python package for data scientists, students, ML engineers and anyone who wants dataframe metadata without having to type numerous commands.

Installation

Use pip to install quick-eda by typing or copying the following command.

pip install quick-eda
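
To confirm that the installation succeeded, the installed version can be looked up from Python. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of quick-eda, and the distribution name is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "quick-eda" is the distribution name used in the pip command.
version = installed_version("quick-eda")
print(version if version is not None else "quick-eda is not installed")
```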

License

This package is licensed under the BSD 3-Clause License.

Example usage

Users of the package can import the individual modules from this package, for example:

import quick_eda.df_eda
import quick_eda.column_eda

This loads the submodules quick_eda.df_eda and quick_eda.column_eda. Their functions must be referenced by their full names:

quick_eda.df_eda.df_eda(<df>)
quick_eda.column_eda.column_eda(<df>, <column_name>)

An alternative way of importing the submodules is:

from quick_eda import df_eda
from quick_eda import column_eda

This also loads the submodules quick_eda.df_eda and quick_eda.column_eda, and makes them available without the package prefix, so they can be used as follows:

df_eda.df_eda(<df>)
column_eda.column_eda(<df>, <column_name>)

Yet another variation is to import the desired functions directly:

from quick_eda.df_eda import df_eda
from quick_eda.column_eda import column_eda

Again, this loads the submodules, but makes the functions directly available:

df_eda(<df>)
column_eda(<df>, <column_name>)

Imagine you have a dataframe called pets with the columns name, age and color. You could then run statistics on the entire dataframe, or on a single column such as age, with:

df_eda(pets)
column_eda(pets, "age")
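
For readers who want to try this, the sketch below builds such a pets dataframe, assuming it is a pandas DataFrame. The sample values are invented for illustration. The plain pandas calls that follow show the kind of separate commands a one-command summary is meant to save; they are not the package's implementation, and its actual output may differ.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
pets = pd.DataFrame(
    {
        "name": ["Rex", "Milo", "Luna"],
        "age": [3, 5, 2],
        "color": ["brown", "black", "white"],
    }
)

# Separate pandas commands for common dataframe-level statistics.
shape = pets.shape            # (rows, columns)
dtypes = pets.dtypes          # data type of each column
missing = pets.isna().sum()   # number of missing values per column

# Summary statistics for the single column "age".
age_summary = pets["age"].describe()

print(shape)
print(dtypes)
print(missing)
print(age_summary)
```

With quick_eda installed, the calls df_eda(pets) and column_eda(pets, "age") would be run on this same dataframe.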

Source code & further information

The source code is maintained at https://github.com/sveneschlbeck/quick_eda
The repository also contains further information on the BSD license, contributing guidelines and more.

Owner

Sven Eschlbeck
