AutoTabular automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications.

Last update: Dec 27, 2022

Overview

AutoTabular

AutoTabular automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on tabular data.

What's good in it?

It uses RAPIDS as its back end, which lets you run end-to-end data science and analytics pipelines entirely on GPUs.
It supports many anomaly detection models.
It uses meta-learning to accelerate model selection and parameter tuning.
It includes many deep learning models for tabular data: Wide & Deep, DCN (Deep & Cross Network), FM, DeepFM, PNN ...
It includes many machine learning algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, XGBoost, CatBoost, and Nearest Neighbors.
It can compute an ensemble using the greedy algorithm from the Caruana paper.
It can stack models to build a level-2 ensemble (available in Compete mode or after setting the stack_models parameter).
It can preprocess features, for example by imputing missing values and converting categoricals. It can also preprocess target values.
It can do advanced feature engineering, such as Golden Features, Feature Selection, and Text and Time Transformations.
It can tune hyperparameters with a not-so-random-search algorithm (random search over a defined set of values) and hill climbing to fine-tune final models.
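
To make the greedy ensemble idea from the list above more concrete, here is a minimal sketch of forward selection with replacement in the spirit of the Caruana paper. It is not AutoTabular's own code: the function names, the mean squared error metric, and the toy validation predictions are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(
    preds: Dict[str, List[float]],
    y_true: Sequence[float],
    n_rounds: int = 10,
) -> Dict[str, float]:
    """Greedy forward selection with replacement; returns per-model weights."""
    running_sum = [0.0] * len(y_true)  # sum of predictions picked so far
    counts: Dict[str, int] = {}
    for step in range(1, n_rounds + 1):
        best_name, best_loss = None, float("inf")
        for name, p in preds.items():
            # Loss of the averaged ensemble if this model were added once more.
            candidate = [(s + x) / step for s, x in zip(running_sum, p)]
            loss = mse(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        running_sum = [s + x for s, x in zip(running_sum, preds[best_name])]
        counts[best_name] = counts.get(best_name, 0) + 1
    total = sum(counts.values())
    # A model picked k times out of n_rounds gets weight k / n_rounds.
    return {name: c / total for name, c in counts.items()}


# Hypothetical validation-set predictions from three already trained models.
y_val = [1.0, 0.0, 1.0, 0.0]
val_preds = {
    "model_a": [0.9, 0.2, 0.7, 0.4],
    "model_b": [0.6, 0.1, 0.9, 0.3],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
weights = greedy_ensemble(val_preds, y_val, n_rounds=5)
print(weights)
```

Because models are selected with replacement, a strong model can be picked several times, which is how the procedure ends up assigning it a larger weight in the final average.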
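
The tuning strategy in the last item (random search over a defined set of values, followed by hill climbing) can be sketched roughly as follows. This is only an illustration under stated assumptions: the parameter grid, the `toy_score` objective, and the function names are hypothetical and do not come from AutoTabular. In practice the score would be a validation metric of a trained model.

```python
import random
from typing import Callable, Dict, List, Tuple

Grid = Dict[str, List[float]]
Params = Dict[str, float]


def random_search(grid: Grid, score: Callable[[Params], float],
                  n_trials: int, rng: random.Random) -> Tuple[Params, float]:
    """Draw each parameter from its predefined list and keep the best draw."""
    best: Params = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        candidate = {name: rng.choice(values) for name, values in grid.items()}
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def hill_climb(grid: Grid, score: Callable[[Params], float],
               start: Params, start_score: float) -> Tuple[Params, float]:
    """Move one parameter to an adjacent grid value while the score improves."""
    best, best_score = dict(start), start_score
    improved = True
    while improved:
        improved = False
        for name, values in grid.items():
            idx = values.index(best[name])
            for j in (idx - 1, idx + 1):
                if not 0 <= j < len(values):
                    continue
                candidate = dict(best)
                candidate[name] = values[j]
                s = score(candidate)
                if s > best_score:
                    best, best_score, improved = candidate, s, True
                    break  # the index for this parameter is re-read next pass
    return best, best_score


def toy_score(p: Params) -> float:
    """Hypothetical objective with its maximum at learning_rate=0.1, max_depth=6."""
    return -((p["learning_rate"] - 0.1) ** 2 + (p["max_depth"] - 6) ** 2)


grid = {"learning_rate": [0.01, 0.05, 0.1, 0.2], "max_depth": [2, 4, 6, 8]}
rng = random.Random(0)
start, start_score = random_search(grid, toy_score, n_trials=3, rng=rng)
tuned, tuned_score = hill_climb(grid, toy_score, start, start_score)
print(tuned, tuned_score)
```

Random search gives a reasonable starting point cheaply, and hill climbing then refines it one grid step at a time. On an objective with several local optima the climb may stop at a local one, so the starting point matters.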

Example

First, install the dependencies:

# clone project
git clone https://apulis-gitlab.apulis.cn/apulis/AutoTabular/autotabular.git

# install project
cd autotabular
pip install -e .
pip install -r requirements.txt

Next, go to the example folder and run the demo script.

# example folder
cd example

# run the demo script
python demo.py

Citation

If you use AutoTabular in a scientific publication, please cite the following paper:

Robin, et al. "AutoTabular: Robust and Accurate AutoML for Structured Data." arXiv preprint arXiv:2003.06505 (2021).

BibTeX entry:

@article{agtabular,
  title={AutoTabular: Robust and Accurate AutoML for Structured Data},
  author={JianZheng, WenQi},
  journal={arXiv preprint arXiv:2003.06505},
  year={2021}
}

License

This library is licensed under the Apache 2.0 License.

Contributing to AutoTabular

We are actively accepting code contributions to the AutoTabular project. If you are interested in contributing to AutoTabular, please contact me.

Owner

Robin
