This tool uses Deep Learning to help you draw and write with your hand and webcam.

Last update: Dec 10, 2022

Overview

air-drawing 👆

This tool uses Deep Learning to help you draw and write with your hand and webcam. A Deep Learning model tries to predict whether you want 'pencil up' (not drawing) or 'pencil down' (drawing).

Try it online: loicmagne.github.io/air-drawing

Technical Details

This pipeline is made up of two steps: detecting the hand and predicting the drawing. Both steps use Deep Learning.
Hand pose detection is performed with the MediaPipe toolbox.
The drawing-prediction step uses only the finger position, not the image. The input is a sequence of 2D points (I actually use the speed and acceleration of the finger instead of its position, which makes the prediction translation-invariant), and the output is a binary classification: 'pencil up' or 'pencil down'. I used a simple bidirectional LSTM architecture. I built a small dataset myself (~50 samples) and annotated it with the tools provided in python-stuff/data-wrangling/. At first I wanted to make the 'pencil up'/'pencil down' prediction in real time, i.e. while the user is drawing. However, this task was too difficult and gave poor results, which is why I now use a bidirectional LSTM: it reads the sequence both forwards and backwards, so each prediction can also use the points that come after it, at the cost of no longer predicting in real time. You can find details of the deep learning pipeline in the Jupyter notebook in python-stuff/deep-learning/.
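A minimal sketch of the translation-invariant input features described above, assuming the finger positions arrive as a list of (x, y) tuples sampled at a constant frame rate, and that speed and acceleration are taken as first and second finite differences. The helper name `motion_features` and the exact difference scheme are hypothetical; the notebook in python-stuff/deep-learning/ may compute them differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[float, float, float, float]


def motion_features(points: List[Point]) -> List[Feature]:
    """Turn (x, y) finger positions into per-step (vx, vy, ax, ay) features.

    Hypothetical helper: assumes a constant frame rate, so the time step
    is treated as 1 and omitted from the differences.
    """
    if len(points) < 3:
        raise ValueError("need at least 3 points to compute acceleration")
    # First differences: velocity between consecutive frames (n - 1 values).
    vel = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    # Second differences: change in velocity (n - 2 values).
    acc = [(vx1 - vx0, vy1 - vy0) for (vx0, vy0), (vx1, vy1) in zip(vel, vel[1:])]
    # Drop the first velocity so each row pairs the velocity and acceleration
    # ending at the same frame (both are backward differences at that frame).
    return [(vx, vy, ax, ay) for (vx, vy), (ax, ay) in zip(vel[1:], acc)]


if __name__ == "__main__":
    stroke = [(0, 0), (1, 0), (3, 1), (6, 3)]
    print(motion_features(stroke))  # [(2, 1, 1, 1), (3, 2, 1, 1)]
```

Because every feature is a difference of positions, shifting the whole stroke by a constant offset leaves the features unchanged, which is the translation invariance mentioned above.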
The application runs entirely client-side. I deployed the deep learning model by exporting the PyTorch model to the .onnx format and running it with ONNX Runtime, which is very convenient and supports a wide range of layers.

Going Forward

Overall, the pipeline still struggles and needs improvement. Ideas for improvement include:

Building a bigger dataset, with more diverse user data.
Processing and smoothing the finger signal, to reduce dependence on camera quality and improve model generalization.
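One possible way to smooth the finger signal, sketched under the assumption that an exponential moving average (EMA) is applied to the raw (x, y) positions before the speed and acceleration are computed. The EMA is a swapped-in technique chosen for illustration; the helper name `ema_smooth` and the default `alpha` are hypothetical, not part of the project.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ema_smooth(points: List[Point], alpha: float = 0.5) -> List[Point]:
    """Exponential moving average over a sequence of (x, y) positions.

    s_0 = p_0 and s_t = alpha * p_t + (1 - alpha) * s_{t-1}.
    Hypothetical helper; alpha must lie in (0, 1], and alpha = 1
    returns the input unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not points:
        return []
    smoothed = [points[0]]
    for x, y in points[1:]:
        sx, sy = smoothed[-1]
        # Blend the new raw position with the previous smoothed position.
        smoothed.append((alpha * x + (1 - alpha) * sx, alpha * y + (1 - alpha) * sy))
    return smoothed


if __name__ == "__main__":
    print(ema_smooth([(0, 0), (2, 4), (2, 4)], alpha=0.5))
```

A smaller `alpha` suppresses more webcam jitter but makes the smoothed finger lag further behind the real one, so the value would need tuning against the model's accuracy.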

Owner

lmagne
