Tutorial repo for an end-to-end Data Science project

Overview

End-to-end Data Science project

This repo contains the notebooks, code, and additional material used in the ITI workshop. The goal of the sessions was to illustrate the end-to-end process of a real project.

Additional material

In addition to the notebooks and code, the following material is also available:

Problem statement

Our (fictional) client is an IT educational institute. They have reached out to us with the following request: “IT jobs and technologies keep evolving quickly. This makes our field one of the most interesting out there. On the other hand, such fast development confuses our students: they do not know which skills they need to learn for which job. ‘Do I need to learn C++ to be a Data Scientist?’, ‘Do DevOps and system admins use the same technologies?’, ‘I really like JavaScript; can I use it in Data Analytics?’ are some of the questions our students ask. Could you please develop a data-driven solution that helps our students answer such questions? They mostly want to understand the relationships between jobs and technologies.”
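One way to make "the relationships between jobs and technologies" concrete is to measure, for each job, the share of people in that job who use each technology. The sketch below is only an illustration under stated assumptions: it assumes a hypothetical survey-style dataset in which every respondent lists the jobs they hold and the technologies they use, and the record format, the sample data, and the function name `job_tech_share` are all made up for this example rather than taken from the repo.

```python
from collections import Counter, defaultdict

def job_tech_share(respondents):
    """Return {job: {technology: share}}, where share is the fraction of
    respondents holding that job who also use that technology.

    `respondents` is an iterable of (jobs, technologies) pairs, each an
    iterable of strings (a hypothetical input format).
    """
    job_counts = Counter()               # respondents per job
    pair_counts = defaultdict(Counter)   # respondents per (job, technology)
    for jobs, techs in respondents:
        jobs, techs = set(jobs), set(techs)  # ignore duplicate entries
        for job in jobs:
            job_counts[job] += 1
            for tech in techs:
                pair_counts[job][tech] += 1
    # Normalise by the number of respondents in each job.
    return {
        job: {tech: n / total for tech, n in pair_counts[job].items()}
        for job, total in job_counts.items()
    }

# Hypothetical sample data, for illustration only.
survey = [
    (["Data Scientist"], ["Python", "SQL"]),
    (["Data Scientist", "Data Analyst"], ["Python", "R"]),
    (["DevOps"], ["Bash", "Docker"]),
    (["System Admin"], ["Bash"]),
]
shares = job_tech_share(survey)
print(sorted(shares["Data Scientist"].items()))  # [('Python', 1.0), ('R', 0.5), ('SQL', 0.5)]
print(set(shares["DevOps"]) & set(shares["System Admin"]))  # {'Bash'}
```

The second `print` gives a rough answer to a question like "Do DevOps and system admins use the same technologies?": the technologies the two jobs share in the sample.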


Level guide

Stage | Basic | Intermediate | Advanced
Business case | Decide on the KPIs that you will positively influence | Calculate the expected financial returns |
Data collection | Decide on and collect a suitable data source for your business case | Decide on, collect, and connect multiple data sources for better performance |
Legal review | Get basic information about the local data privacy law | Study the local data privacy law |
Cookiecutter | Create the standard directory structure | |
Git | Use a Git GUI to track changes on the master branch | Use the Git CLI to track changes on a dev branch and merge back into master | Decide on a branching strategy and resolve merge conflicts
Environments | Install Python packages using conda | Create a dedicated conda environment | Share your environment and install it on a different machine
Data cleaning | Use basic statistics to filter out nonsense entries | Use advanced statistics and unsupervised learning to filter out nonsense entries | Calculate a "sanity probability value" for each data point and later use it as a weight
Descriptive analytics | Calculate summary statistics to provide data insights | Produce visualizations to provide a deeper understanding | Apply unsupervised learning to provide an even deeper understanding
Predictive analytics | Create a single baseline model | Create multiple hyperparameter-tuned models and benchmark their performance | Combine the chosen models in an ensemble and provide prediction confidence
Prescriptive analytics | Recommend the action that the user should take | |
Software engineering | Refactor your notebooks into simple Python scripts | Create a production OOP class for predictions | Expose your model using an API
MLOps | Export and load models from pickle files | Track your models using MLflow | Create and run a Docker image for your project
Product | Create a web app / GUI to expose the prediction functionality | Add the relevant historical insights, predictions, and optimization results | Collect users' feedback and retrain your model accordingly
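For the Basic data-cleaning task (using basic statistics to filter out entries that make no sense), one common choice is the interquartile-range (IQR) rule: keep a value only if it lies within k × IQR below the first quartile or above the third quartile. This is a minimal sketch, not the workshop's method. It assumes a single numeric column such as years of experience, uses the conventional k = 1.5, and `iqr_filter` is a hypothetical helper name.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    # statistics.quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Preserve the original order of the entries that pass the filter.
    return [v for v in values if low <= v <= high]

years_of_experience = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # 100 is a nonsense entry
print(iqr_filter(years_of_experience))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The IQR rule is used here because it relies only on quartiles, so a single extreme value barely shifts the bounds; the Intermediate and Advanced levels in the guide go beyond such fixed thresholds.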
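Two Basic tasks from the guide, "Create a single baseline model" and "Export and load models from pickle files", can be combined in one small example. The sketch below assumes a majority-class baseline, a deliberately simple stand-in that is not the workshop's model. The class name `MajorityClassBaseline`, the file name `baseline.pkl`, and the sample labels are hypothetical.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path

class MajorityClassBaseline:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Counter.most_common(1) returns [(label, count)] for the top label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One prediction per input row; the features are ignored on purpose.
        return [self.majority_ for _ in X]

model = MajorityClassBaseline().fit(
    X=[[1], [2], [3], [4]],
    y=["Data Scientist", "DevOps", "Data Scientist", "Data Scientist"],
)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "baseline.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)      # export the fitted model
    with path.open("rb") as f:
        loaded = pickle.load(f)    # load it back, e.g. in a prediction script

print(loaded.predict([[5], [6]]))  # ['Data Scientist', 'Data Scientist']
```

Only unpickle files you trust: the `pickle` module can execute arbitrary code while loading.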
Owner
Deena Gergis
Code for "FPS-Net: A convolutional fusion network for large-scale LiDAR point cloud segmentation".

FPS-Net Code for "FPS-Net: A convolutional fusion network for large-scale LiDAR point cloud segmentation", accepted by ISPRS journal of Photogrammetry

15 Nov 30, 2022
Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a Keras CNN model and OpenCV.

Octavio Arriaga 5.3k Dec 30, 2022
IsoGCN code for ICLR2021

IsoGCN The official implementation of IsoGCN, presented in the ICLR2021 paper Isometric Transformation Invariant and Equivariant Graph Convolutional N

horiem 39 Nov 25, 2022
Live Hand Tracking Using Python

Live-Hand-Tracking-Using-Python Project Description: In this project, we will be

Hassan Shahzad 2 Jan 06, 2022
MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition.

MoViNet-pytorch Unofficial PyTorch implementation of MoViNets: Mobile Video Networks for Efficient Video Recognition. Authors: Dan Kondratyuk, Liangzh

189 Dec 20, 2022
🥈78th place in Riiid Answer Correctness Prediction competition

Riiid Answer Correctness Prediction Introduction This repository is the code that placed 78th in Riiid Answer Correctness Prediction competition. Requ

Jungwoo Park 10 Jul 14, 2022
Invertible conditional GANs for image editing

Invertible Conditional GANs This is the implementation of the IcGAN model proposed in our paper: Invertible Conditional GANs for image editing. Novemb

Guim 278 Dec 12, 2022
Sub-tomogram-Detection - Deep learning based model for Cryo-ET Sub-tomogram-Detection

Deep learning based model for Cryo-ET Sub-tomogram-Detection High degree of stru

Siddhant Kumar 2 Feb 04, 2022
Ensembling Off-the-shelf Models for GAN Training

Vision-aided GAN: Can the collective knowledge from a large bank of pretrained vision models be leveraged to improve GAN t

345 Dec 28, 2022
Implementation of the method described in the Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

4 Mar 11, 2022
TensorFlow implementation of DeepLabv2

TF-deeplab This is a TensorFlow implementation of DeepLab, compatible with TensorFlow 1.2.1. Currently it supports both training and testing the ResNe

Chenxi Liu 21 Sep 27, 2022
License Plate Detection Application

LicensePlate_Project 🚗 🚙 [Project] 2021.02 ~ 2021.09 License Plate Detection Application Overview 1. Data collection and labeling: we collected vehicle license plate images ourselves and, for each image, 'license plate

4 Oct 10, 2022
AdaDM: Enabling Normalization for Image Super-Resolution

AdaDM AdaDM: Enabling Normalization for Image Super-Resolution. You can apply BN, LN or GN in SR networks with our AdaDM. Pretrained models (EDSR*/RDN

58 Jan 08, 2023
Soomvaar is a repo 🏩 containing a collection of 👨‍💻🚀 Python code and 💫✨ machine learning algorithms 📗📕 written during 📃 my practice and learning of ML and Python ✨💥

Soomvaar 📌 Introduction Soomvaar is a collection of various code implemented in machine learning and machine learning algorithms with Python on coll

Felix-Ayush 42 Dec 30, 2022
Create your own QR code with Python

Create-Own-QR-code Create your own QR code with Python. To get started, install pyqrcode: open CMD and type python -m pip install pyqrcode

JehanKandy 10 Jul 13, 2022
For IBM Quantum Challenge Africa 2021, 9 September (07:00 UTC) - 20 September (23:00 UTC).

IBM Quantum Challenge Africa 2021 To ensure Africa is able to apply quantum computing to solve problems relevant to the continent, the IBM Research La

Qiskit Community 48 Dec 25, 2022
Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA)

Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA). Master's thesis documents. Bibliography, experiments and reports.

Erick Cobos 73 Dec 04, 2022
A package for "Procedural Content Generation via Reinforcement Learning" OpenAI Gym interface.

Readme: Illuminating Diverse Neural Cellular Automata for Level Generation This is the codebase used to generate the results presented in the paper av

Sam Earle 27 Jan 05, 2023
OMLT: Optimization and Machine Learning Toolkit

OMLT is a Python package for representing machine learning models (neural networks and gradient-boosted trees) within the Pyomo optimization environment.

C⚙G - Imperial College London 179 Jan 02, 2023
TorchDistiller - a collection of open-source PyTorch code for knowledge distillation, especially for perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of open-source PyTorch code for knowledge distillation, especially for perception tasks, including semantic segmentation, depth estimation, object detection and i

yifan liu 147 Dec 03, 2022