Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining







[Download a PDF version] of this flowchart.






Introduction to Machine Learning and Pattern Classification


  • Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]

  • Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]

  • An Introduction to simple linear supervised classification using scikit-learn [IPython nb]
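A minimal, framework-free sketch of the fit/predict workflow these introductions cover. It uses a hypothetical `CentroidClassifier` (a nearest-centroid rule, not the scikit-learn code from the notebooks) on made-up toy data. For two classes, its decision boundary is the perpendicular bisector between the two class means, so it is a simple linear classifier. `math.dist` assumes Python 3.8+.

```python
from math import dist  # Euclidean distance, Python 3.8+


class CentroidClassifier:
    """Assigns each sample to the class whose training-set mean (centroid) is closest."""

    def fit(self, X, y):
        # Accumulate per-class feature sums and sample counts
        sums, counts = {}, {}
        for features, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(features)
                counts[label] = 0
            sums[label] = [s + f for s, f in zip(sums[label], features)]
            counts[label] += 1
        # Centroid = mean feature vector of each class
        self.centroids_ = {
            label: [s / counts[label] for s in sums[label]] for label in sums
        }
        return self

    def predict(self, X):
        # Pick the label whose centroid has the smallest Euclidean distance to x
        return [
            min(self.centroids_, key=lambda label: dist(x, self.centroids_[label]))
            for x in X
        ]


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X_train = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.2, 3.9)]
y_train = ["a", "a", "b", "b"]
clf = CentroidClassifier().fit(X_train, y_train)
y_pred = clf.predict([(1.1, 0.9), (3.8, 4.0)])
print(y_pred)                        # ['a', 'b']
print(accuracy(["a", "b"], y_pred))  # 1.0
```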






Pre-processing


  • Feature Extraction

    • Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
  • Scaling and Normalization

    • About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
  • Feature Selection

    • Sequential Feature Selection Algorithms [IPython nb]
  • Dimensionality Reduction

    • Principal Component Analysis (PCA) [IPython nb]
    • The effect of scaling and mean centering of variables prior to a PCA [PDF] [HTML]
    • PCA based on the covariance vs. correlation matrix [IPython nb]
    • Linear Discriminant Analysis (LDA) [IPython nb]
    • Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
  • Representing Text

    • Tf-idf Walkthrough for scikit-learn [IPython nb]
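The two scaling techniques named in the scaling tutorial's title, standardization (z-scores) and min-max scaling, can be sketched in plain Python. The function names are hypothetical. The sketch uses the population standard deviation, which matches scikit-learn's `StandardScaler` convention.

```python
from statistics import mean, pstdev


def standardize(values):
    """z-score standardization: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Min-max scaling ("normalization") to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


feature = [10.0, 20.0, 30.0]
print(standardize(feature))    # approx. [-1.2247, 0.0, 1.2247]
print(min_max_scale(feature))  # [0.0, 0.5, 1.0]
```

In practice, estimate the mean and standard deviation (or minimum and maximum) on the training data only, then reuse them for the test data. A constant feature (zero spread) would raise `ZeroDivisionError` in this sketch.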



Model Evaluation


  • An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
  • Cross-validation
    • Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]
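As a small companion to the binary-classifier metrics listed above, here is a sketch that derives the confusion-matrix counts and common scores from true and predicted labels. The function name and the convention of returning 0.0 for an undefined ratio (for example, precision with no positive predictions) are assumptions of this sketch.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and common scores for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity, true positive rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "f1": f1,
    }


# tp=2, fp=1, fn=1, tn=1 -> accuracy 0.6, precision = recall = F1 = 2/3
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```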



Parameter Estimation


  • Parametric Techniques

    • Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    • How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
  • Non-Parametric Techniques

    • Kernel density estimation via the Parzen-window technique [IPython nb]
    • The K-Nearest Neighbor (KNN) technique
  • Regression Analysis

    • Linear Regression

    • Non-Linear Regression
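The Parzen-window technique listed under non-parametric techniques estimates a density as the fraction of samples that fall inside a window of edge length h around x, divided by the window volume h^d. Below is a sketch with the hypercube window function. The function names and sample points are hypothetical.

```python
def hypercube_kernel(u):
    """Window function phi(u): 1 if u lies in the unit hypercube centred at the origin, else 0."""
    return 1.0 if all(abs(u_j) <= 0.5 for u_j in u) else 0.0


def parzen_estimate(x, samples, h):
    """Density estimate p(x) = (1/n) * sum_i (1/h^d) * phi((x - x_i) / h)."""
    d = len(x)
    # k = number of samples inside the hypercube of edge length h centred at x
    k = sum(hypercube_kernel([(x_j - s_j) / h for x_j, s_j in zip(x, s)]) for s in samples)
    return k / (len(samples) * h ** d)


samples = [(0.0, 0.0), (0.2, 0.1), (0.4, 0.3), (2.0, 2.0)]
print(parzen_estimate((0.0, 0.0), samples, h=1.0))  # 3 of 4 samples inside -> 0.75
```

The window width h acts as a smoothing parameter. Smaller windows give spikier estimates, and larger windows give smoother ones.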




Machine Learning Algorithms


Bayes Classification

  • Naive Bayes and Text Classification I - Introduction and Theory [PDF]
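A compact sketch of naive Bayes for text, assuming the multinomial variant with Laplace (add-one) smoothing and simple whitespace tokenization. These are common choices for text, not necessarily the exact setup in the linked PDF. The documents and labels are made up, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word likelihoods per class."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    log_prior = {c: log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict_nb(doc, log_prior, log_lik):
    """Return the class with the highest log posterior (up to a constant); unseen words are ignored."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc.split() if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = ["happy sunny song", "sunny happy day", "sad rainy song", "rainy gloomy day"]
labels = ["happy", "happy", "sad", "sad"]
prior, lik = train_multinomial_nb(docs, labels)
print(predict_nb("happy song", prior, lik))  # happy
```

Working with log probabilities turns the product of many small word likelihoods into a sum, which avoids floating-point underflow on long documents.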

Logistic Regression

  • Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]
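Out-of-core learning updates a model one batch at a time, so the full dataset never has to fit in memory. scikit-learn exposes this through `partial_fit` on estimators such as `SGDClassifier`. The plain-Python sketch below mimics the idea for binary logistic regression trained with stochastic gradient descent on the log-loss, then persists the fitted model with `pickle`. `StreamingLogisticRegression` and `stream_batches` are hypothetical names, and the streamed data is synthetic.

```python
import math
import pickle


class StreamingLogisticRegression:
    """Binary logistic regression trained with plain SGD, one mini-batch at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        # P(y = 1 | x) via the logistic sigmoid of the net input z = w.x + b
        z = self.b + sum(w_j * x_j for w_j, x_j in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X_batch, y_batch):
        # One gradient step per sample on the log-loss; earlier batches are not revisited
        for x, y in zip(X_batch, y_batch):
            error = self.predict_proba(x) - y  # gradient of the log-loss w.r.t. z
            self.w = [w_j - self.learning_rate * error * x_j for w_j, x_j in zip(self.w, x)]
            self.b -= self.learning_rate * error
        return self


def stream_batches(batch_size=2):
    """Hypothetical data source standing in for, e.g., a large file read chunk by chunk."""
    data = [((0.0, 1.0), 0), ((0.2, 0.8), 0), ((1.0, 0.1), 1), ((0.9, 0.0), 1)] * 50
    for start in range(0, len(data), batch_size):
        chunk = data[start:start + batch_size]
        yield [x for x, _ in chunk], [y for _, y in chunk]


model = StreamingLogisticRegression(n_features=2)
for X_batch, y_batch in stream_batches():
    model.partial_fit(X_batch, y_batch)

# Model persistence: serialize the fitted model and restore it later
restored = pickle.loads(pickle.dumps(model))
print(restored.predict_proba((1.0, 0.0)) > 0.5)  # True
```

To persist across sessions, write the pickled bytes to a file instead. Only unpickle data from trusted sources, because loading a pickle can execute arbitrary code.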

Neural Networks

  • Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]

  • Activation Function Cheatsheet [IPython nb]
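A few common activation functions and their derivatives, which gradient-based training relies on, as a plain-Python sketch. This is a small selection, not the full cheatsheet, and the dictionary layout is an illustrative choice. The logistic function is written in a numerically stable form so that large negative inputs do not overflow `math.exp`.

```python
import math


def logistic(z):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# name -> (activation phi(z), derivative phi'(z))
ACTIVATIONS = {
    "linear": (lambda z: z, lambda z: 1.0),
    "logistic": (logistic, lambda z: logistic(z) * (1.0 - logistic(z))),
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),
    "relu": (lambda z: max(0.0, z), lambda z: 1.0 if z > 0 else 0.0),  # subgradient 0 chosen at z = 0
}


def unit_step(z):
    """Heaviside step as used by the classic perceptron; its derivative is 0 almost everywhere."""
    return 1.0 if z >= 0 else 0.0


for name, (phi, dphi) in ACTIVATIONS.items():
    print(f"{name:8s} phi(0.5)={phi(0.5):.4f}  phi'(0.5)={dphi(0.5):.4f}")
```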

Ensemble Methods

  • Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]
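The core of a weighted majority-rule ensemble is the voting step. Below is a sketch of both hard voting (summing classifier weights per predicted label) and soft voting (weighted averaging of class probabilities). The function names and example numbers are hypothetical. In scikit-learn, `VotingClassifier` offers the same two modes through its `voting` and `weights` parameters. Ties resolve to whichever label `max` encounters first.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Hard voting: return the label with the largest total weight.

    predictions: one predicted class label per classifier
    weights: one non-negative weight per classifier
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def weighted_soft_vote(probabilities, weights):
    """Soft voting: weighted average of class-probability vectors, return the argmax class index."""
    n_classes = len(probabilities[0])
    avg = [
        sum(w * p[k] for p, w in zip(probabilities, weights)) / sum(weights)
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__)


# Three classifiers; with these weights the third can outvote the other two
print(weighted_majority_vote([0, 0, 1], weights=[0.2, 0.2, 0.6]))  # 1
print(weighted_majority_vote([0, 0, 1], weights=[1, 1, 1]))        # 0
# Soft voting can disagree with hard voting: averaged P(class 0) = 0.58 here
print(weighted_soft_vote([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]], weights=[0.2, 0.2, 0.6]))  # 0
```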

Decision Trees

  • Cheatsheet for Decision Tree Classification [IPython nb]
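Decision-tree learners typically choose splits by maximizing the information gain, which is the impurity of a parent node minus the size-weighted impurity of its children. Below is a sketch of three common impurity measures (entropy, Gini impurity, and misclassification error) together with information gain. The function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    # Only classes actually present are counted, so log2(0) never occurs
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))


def misclassification_error(labels):
    return 1.0 - max(class_proportions(labels))


def information_gain(parent, children, impurity=entropy):
    """Impurity of the parent node minus the size-weighted impurity of its child nodes."""
    n = len(parent)
    return impurity(parent) - sum(len(c) / n * impurity(c) for c in children)


parent = ["A"] * 4 + ["B"] * 4
left, right = ["A"] * 3 + ["B"], ["A"] + ["B"] * 3
print(entropy(parent), gini(parent))                      # 1.0 0.5
print(round(information_gain(parent, [left, right]), 4))  # 0.1887
```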



Clustering


  • Prototype-based clustering
  • Hierarchical clustering
    • Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
  • Density-based clustering
  • Graph-based clustering
  • Probabilistic-based clustering
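Complete-linkage hierarchical clustering measures the distance between two clusters by their two farthest members and repeatedly merges the closest pair of clusters. Below is a deliberately naive sketch (roughly cubic in the number of points) that stops when a requested number of clusters remains. The function name and points are hypothetical. For real data, SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` computes the full merge tree.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Complete linkage: distance between the two most dissimilar members
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # combinations yields a < b, so index a stays valid
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(complete_linkage(points, n_clusters=3))  # [[0, 1], [2, 3], [4]]
```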



Collecting Data


  • Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]

  • Download Your Twitter Timeline and Turn into a Word Cloud Using Python [IPython nb]

  • Reading MNIST into NumPy arrays [IPython nb]
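The MNIST files use the IDX format. It starts with a big-endian header: two zero bytes, a data-type byte (0x08 for unsigned bytes), and a byte giving the number of dimensions. A 4-byte big-endian size follows for each dimension, then the raw values. Below is a standard-library sketch of a parser that handles only the unsigned-byte case. The function name is hypothetical, and the example bytes are synthetic rather than real MNIST data.

```python
import struct
from math import prod  # Python 3.8+


def parse_idx(raw):
    """Parse IDX-format bytes (the MNIST file format) into (shape, flat list of values).

    Only the unsigned-byte data type (0x08) used by MNIST is handled in this sketch.
    """
    zero, dtype, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("expected an IDX file with unsigned-byte data")
    shape = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    values = list(raw[4 + 4 * ndim:])  # one byte per value
    if len(values) != prod(shape):
        raise ValueError(f"expected {prod(shape)} values, got {len(values)}")
    return shape, values


# Synthetic 1-D "labels" file: header 0x00000801, then count 3, then the labels 5, 0, 4
labels_bytes = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([5, 0, 4])
print(parse_idx(labels_bytes))  # ((3,), [5, 0, 4])
```

The distributed MNIST files are gzip-compressed, so the raw bytes could come from `gzip.open(path, "rb").read()`. With NumPy, `np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim).reshape(shape)` produces the array directly.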




Data Visualization


  • Exploratory Analysis of the Star Wars API [IPython nb]

  • Matplotlib examples - Exploratory data analysis of the Iris dataset [IPython nb]

  • Artificial Intelligence publications per country [IPython nb] [PDF]




Statistical Pattern Classification Examples


  • Supervised Learning

    • Parametric Techniques

      • Univariate Normal Density

        • Ex1: 2-classes, equal variances, equal priors [IPython nb]
        • Ex2: 2-classes, different variances, equal priors [IPython nb]
        • Ex3: 2-classes, equal variances, different priors [IPython nb]
        • Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
        • Ex5: 2-classes, different variances, equal priors, loss function, Cauchy distribution [IPython nb]
      • Multivariate Normal Density

        • Ex6: 2-classes, different variances, equal priors, loss function [IPython nb]
        • Ex7: 2-classes, equal variances, equal priors [IPython nb]
    • Non-Parametric Techniques
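The univariate-normal examples above share one decision rule. Under zero-one loss, x is assigned to the class w_j that maximizes p(x | w_j) * P(w_j). The evidence p(x) is identical for every class, so it drops out of the comparison. Below is a sketch with hypothetical class parameters, where index 0 stands for w_1 and index 1 for w_2. It does not model the loss functions or the Cauchy variant from the later examples.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal class-conditional density p(x | w_j)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def bayes_decide(x, params, priors):
    """Return the index j maximising p(x | w_j) * P(w_j) (minimum-error-rate rule)."""
    scores = [normal_pdf(x, mu, sigma) * prior for (mu, sigma), prior in zip(params, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


params = [(0.0, 1.0), (2.0, 1.0)]  # (mean, standard deviation) for w_1 and w_2
# Equal variances and equal priors: the decision boundary lies halfway between the means, at x = 1
print([bayes_decide(x, params, [0.5, 0.5]) for x in (0.9, 1.1)])  # [0, 1]
# A larger prior for w_1 shifts the boundary towards w_2's mean (here to x = 1 + ln(4)/2, about 1.69)
print(bayes_decide(1.1, params, [0.8, 0.2]))  # 0
```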




Books


Python Machine Learning




Talks


An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

[View on SlideShare]

[Download PDF]



MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

[View on SlideShare]

[Download PDF]




Applications


MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

This project is about building a music recommendation system for users who want to listen to happy songs. Such a system could not only brighten up one's mood on a rainy weekend but could also be used in hospitals, other medical clinics, or public places such as restaurants to spread a positive mood among people.

[musicmood GitHub Repository]


mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.

[mlxtend GitHub Repository]




Resources


  • Copy-and-paste ready LaTeX equations [Markdown]

  • Open-source datasets [Markdown]

  • Free Machine Learning eBooks [Markdown]

  • Terms in data science defined in less than 50 words [Markdown]

  • Useful libraries for data science in Python [Markdown]

  • General Tips and Advice [Markdown]

  • A matrix cheatsheet for Python, R, Julia, and MATLAB [HTML]

Owner
Sebastian Raschka
Machine Learning researcher & passionate open source contributor. Author of the "Python Machine Learning" book.