Building a real-time environment that uses webcam frame division in OpenCV and classifies the cropped images with a Vision Transformer fine-tuned on hybrid dataset samples for facial emotion recognition.

Last update: Dec 12, 2022

Overview

Visual Transformer for Facial Emotion Recognition (FER)

This project aims to build an efficient Vision Transformer for the Facial Emotion Recognition (FER) task. The project is developed entirely in a Python notebook hosted on Google Colab, running on an NVIDIA P100 GPU runtime.

Dataset

The dataset contains 8 classes and integrates 3 different source datasets:

FER-2013: It contains approximately 35,000 grayscale facial images of different expressions, each 48×48 pixels, labeled with 7 classes: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. Disgust has the fewest images (about 600), while the other labels have nearly 5,000 samples each.
CK+: The Extended Cohn-Kanade (CK+) dataset consists of images extracted from 593 video sequences of 123 subjects aged 18 to 50, with a variety of genders and ethnic backgrounds. Each video shows a facial shift from the neutral expression to a target peak expression, recorded at 30 frames per second (FPS) at a resolution of either 640x490 or 640x480 pixels. Unfortunately, we do not have the entire dataset; we stored only 1,000 high-variance images from a Kaggle repository.
AffectNet: A large facial expression dataset with 41,000 images classified into eight categories (neutral, happy, angry, sad, fear, surprise, disgust, contempt), annotated with the intensity of valence and arousal.

Data loading, integration, and analysis are in the first part of the ViT-Emotion-Recognition.ipynb notebook. The resulting integrated dataset is divided into two subsets (train and val folders), each containing 8 subfolders, one per class label.
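A quick way to sanity-check the integrated layout is to count images per class in each split. The sketch below is hypothetical: it assumes the folders are literally named `train` and `val` under one root and that samples use the extensions in `IMAGE_EXTS`. This root/class/image layout is also what `torchvision.datasets.ImageFolder` expects.

```python
import os
from collections import Counter

# Hypothetical set of extensions treated as image samples; adjust to the real data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(root):
    """Return {split: Counter(class_name -> n_images)} for a root/<split>/<class>/ layout."""
    counts = {}
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        per_class = Counter()
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue  # skip stray files next to the class folders
            per_class[class_name] = sum(
                1 for f in os.listdir(class_dir)
                if os.path.splitext(f)[1].lower() in IMAGE_EXTS
            )
        counts[split] = per_class
    return counts
```

With 8 emotion folders per split, each `Counter` should have 8 keys; a class with an unexpectedly low count is a hint that pre-processing dropped files.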

Data Management

Since we fine-tune a pre-trained transformer on a heterogeneous dataset, we had to handle several image properties:

Data Scaling: The pre-trained models are transformers in different configurations, trained on the ImageNet dataset for image classification with 224x224 images. We use the same scale and resize the input data to this size.
Data Channels: We use three RGB channels for every image, for the same reason as in the previous point.
Data Augmentation: We apply brightness, rotation, scaling, translation, and zoom augmentations to increase the number of samples and balance the class distribution.
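The scaling and channel steps above can be sketched in plain NumPy. This is an illustration under assumptions, not the notebook's code: it uses nearest-neighbour resizing for simplicity, and it assumes a per-channel mean and standard deviation of 0.5, reading the "normalization ... to 0.5" note in release 0.2.0. In a PyTorch pipeline the same steps would usually be `transforms.Grayscale(num_output_channels=3)`, `transforms.Resize((224, 224))`, `transforms.ToTensor()` and `transforms.Normalize(...)`; the augmentations themselves are not shown.

```python
import numpy as np

TARGET_SIZE = 224     # input resolution of the ImageNet-pretrained ViT
MEAN, STD = 0.5, 0.5  # assumed per-channel normalization values

def to_rgb(img):
    """Replicate a grayscale (H, W) or (H, W, 1) image into 3 channels."""
    if img.ndim == 2:
        img = img[:, :, None]
    if img.shape[2] == 1:
        img = np.repeat(img, 3, axis=2)
    return img

def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]

def preprocess(img_uint8):
    """uint8 image -> float32 (3, 224, 224) array scaled to [-1, 1]."""
    img = resize_nearest(to_rgb(img_uint8), TARGET_SIZE)
    x = img.astype(np.float32) / 255.0
    x = (x - MEAN) / STD
    return x.transpose(2, 0, 1)  # HWC -> CHW, the layout PyTorch models expect
```

For example, a 48×48 grayscale FER-2013 face and a larger RGB AffectNet face both come out as `(3, 224, 224)` arrays, so they can be batched together.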

Model

Overview of the model: the input image is split into fixed-size patches by a convolutional layer with a 16x16 kernel and a 16x16 stride, which projects each patch linearly into a 768-dimensional space. The embedding of each patch is the sum of this linear embedding and a position embedding. The embedded patches are then processed by a stack of 12 sequential Transformer encoder layers (the ViT-B/16 configuration). For the classification task, the final layer is a linear layer with an 8-dimensional output, one per emotion. The model we rely on is pre-trained on ImageNet and fine-tuned on the dataset described above.
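The shape arithmetic of the patch embedding can be checked with a small NumPy sketch. A convolution with a 16x16 kernel and stride 16 applies the same linear map to each non-overlapping patch, so it is equivalent to flattening the patches and multiplying by one weight matrix. A 224x224 image gives (224/16)^2 = 14 × 14 = 196 patches; following the standard ViT design, a learnable [CLS] token is prepended (this token is an assumption here, it is not mentioned above), giving 197 tokens of 768 dimensions. All weights are random placeholders and the encoder stack is omitted.

```python
import numpy as np

IMG, PATCH, DIM, N_CLASSES = 224, 16, 768, 8

def patchify(x, patch=PATCH):
    """(C, H, W) -> (N, C*patch*patch): non-overlapping patches in row-major order."""
    c, h, w = x.shape
    gh, gw = h // patch, w // patch
    x = x.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)

rng = np.random.default_rng(0)
# Random stand-ins for learned weights (hypothetical, for shape checking only).
W_embed = rng.standard_normal((3 * PATCH * PATCH, DIM)) * 0.02  # same role as the 16x16/16 conv
cls_token = np.zeros((1, DIM))
pos_embed = rng.standard_normal(((IMG // PATCH) ** 2 + 1, DIM)) * 0.02
W_head = rng.standard_normal((DIM, N_CLASSES)) * 0.02

def embed(x):
    tokens = patchify(x) @ W_embed                        # (196, 768) linear patch embedding
    tokens = np.concatenate([cls_token, tokens], axis=0)  # (197, 768) with [CLS] prepended
    return tokens + pos_embed                             # add the position embedding

image = rng.standard_normal((3, IMG, IMG))
tokens = embed(image)
# The Transformer encoder stack would transform `tokens` here (omitted in this sketch).
logits = tokens[0] @ W_head  # linear head on the [CLS] token -> 8 emotion scores
```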

Source: https://github.com/google-research/vision_transformer

Authors

Andrea Gurioli (@andreagurioli1995)
Mario Sessa (@kode-git)

License

Comments

Pre-processing phase removes some images
After the data analysis on AVFER, the data from the splitting phase differs after pre-processing; we need to check:

Whether removing PNG files can influence the number of images

Whether anything changes after reshaping

A possible mis-indentation of os.remove(fl)

I need to rerun the data integration and data analysis of AVFER before testing feature variations in the pre-processing phase.
bug
opened by kode-git
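One way to investigate the suspected `os.remove(fl)` problem is a helper that reports how many files it would delete before actually deleting anything. The function below is hypothetical. It keeps the removal inside the extension check, so only matching files are deleted, and it joins the directory and file name, because a bare name returned by `os.walk` would otherwise be resolved against the current working directory.

```python
import os

def remove_by_extension(root, ext=".png", dry_run=True):
    """Walk `root` and delete files ending in `ext` (give `ext` in lowercase).

    Returns (n_files_seen, n_matching). With dry_run=True nothing is deleted,
    so the counts can be compared before and after pre-processing.
    """
    n_seen, n_matching = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fl in filenames:
            n_seen += 1
            if fl.lower().endswith(ext):
                n_matching += 1
                if not dry_run:
                    # os.remove sits inside the extension check, and the path is
                    # rebuilt from dirpath because `fl` is only a file name.
                    os.remove(os.path.join(dirpath, fl))
    return n_seen, n_matching
```

Running it once with `dry_run=True` and once with `dry_run=False`, then comparing per-class counts, shows whether PNG removal alone explains the difference in image numbers.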

Releases(0.3.12)

0.3.12(May 16, 2022)
Adding presentation and official documentation

Splitting the notebook into sections

Adding additional comments to the code

Source code(tar.gz)
Source code(zip)
0.3.11(May 14, 2022)
Adding the ViT-B/16/S model trained for 25 epochs with a constant learning rate

Checking training and validation accuracy/loss against the training log

Display results on standalone plots

Source code(tar.gz)
Source code(zip)
vfer_small_25.pth(327.37 MB)
vfer_small_25_history_loss.pkl(490 bytes)
vfer_small_25_history_train.pkl(233 bytes)
vfer_small_25_history_val.pkl(233 bytes)
0.3.10(May 13, 2022)
Adding evaluation for ResNet18

Debugging on SAM model evaluation

Improved training plots to support curves with N < 5 lines

Adapting the model when loading it for standalone evaluation, according to the backbone

Source code(tar.gz)
Source code(zip)
0.3.9(May 12, 2022)
Adding ResNet 18 (11M parameters)

Upload history for loss and accuracy

Upload epoch 20 dump

Upload final model checkpoint

Source code(tar.gz)
Source code(zip)
resnet18_25.pth(42.72 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_25_history_train.pkl(7.05 KB)
resnet18_25_history_val.pkl(7.05 KB)
0.3.8(May 11, 2022)
Adding ViT-B/16/SG

Gradual learning rate every 10 epochs

SGD optimization

Adding loss and accuracy histories

Source code(tar.gz)
Source code(zip)
vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(490 bytes)
vfer_grad_25_history_train.pkl(233 bytes)
vfer_grad_25_history_val.pkl(233 bytes)
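A "gradual learning rate every 10 epochs" can be read as a step schedule. The sketch below assumes the rate is multiplied by a hypothetical factor `gamma=0.1` every 10 epochs, starting from 0.01 (the starting value reported for ResNet18 in release 0.3.6); the actual factor used in the notebook is not stated. In PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)` produces the same sequence.

```python
def step_lr(epoch, base_lr=0.01, step_size=10, gamma=0.1):
    """Learning rate at a 0-indexed epoch: multiplied by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 20, 24):
    print(epoch, step_lr(epoch))
```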
0.3.7(May 11, 2022)
Adding a ViT-B/16 model checkpoint using a customized learning rate scheduler

Adding SAM (Sharpness-Aware Minimization) to the model as an optimization algorithm to smooth the loss landscape

Adding history for training and validation loss

Adding history for training and validation accuracy

Source code(tar.gz)
Source code(zip)
vfer_sam_25.pth(327.37 MB)
vfer_sam_25_history_loss.pkl(490 bytes)
vfer_sam_25_history_train.pkl(233 bytes)
vfer_sam_25_history_val.pkl(233 bytes)
0.3.6(May 9, 2022)
Configuration of ResNet18 with a gradual learning rate

Starting learning rate at 0.01

Epochs 50 with plateau at 25

Loading training and validation accuracy histories

Source code(tar.gz)
Source code(zip)
resnet18.pth(44.69 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_history_train.pkl(14.17 KB)
resnet18_history_val.pkl(14.17 KB)
0.3.5(May 9, 2022)
Adding SAM optimization for ViT-B/16

Defining closure for sharpness-aware minimization efficiency

Debugging the model loader for checkpoint recovery

Source code(tar.gz)
Source code(zip)
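Sharpness-aware minimization needs two gradient evaluations per update, which is why PyTorch implementations typically pass a closure that re-runs the forward and backward pass. The NumPy sketch below shows the update rule on a hypothetical toy quadratic loss; `grad_fn` stands in for the closure, and `lr` and `rho` are illustrative values, not the project's settings.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) update.

    grad_fn(w) plays the role of the PyTorch-style closure: it re-evaluates the
    loss gradient at whatever weights it is given.
    """
    g = grad_fn(w)
    # 1) Move to the approximate worst-case point within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 2) Recompute the gradient at the perturbed weights (second forward/backward pass).
    g_sharp = grad_fn(w + eps)
    # 3) Descend from the ORIGINAL weights using the perturbed gradient.
    return w - lr * g_sharp

# Hypothetical toy problem: quadratic loss L(w) = 0.5 * w^T A w.
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad)
print(loss(w))
```

With a fixed `rho`, the iterates settle close to the minimum rather than exactly on it, since every step is taken with the gradient from a perturbed point.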
0.2.5(May 7, 2022)
Upload optimal model on AffectNet

Defining evaluation plots for accuracy and loss values

Source code(tar.gz)
Source code(zip)
vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(130 bytes)
vfer_grad_25_history_train.pkl(1.48 KB)
vfer_grad_25_history_val.pkl(1.48 KB)
0.2.4(May 6, 2022)
Adding gradual learning rate

Modifying the dataset to use AffectNet in the validation and test sets

Adding scheduler for learning rate adjustment

Source code(tar.gz)
Source code(zip)
vfer_grad_50.pth(327.37 MB)
vfer_grad_50_history_train.pkl(2.86 KB)
vfer_grad_50_history_val.pkl(2.86 KB)
0.2.3(Apr 29, 2022)
Extending data analysis for AffectNet, CK+48, and FER-2013

Creation of AVFER with the following features:

Splitting the initial dataset into training and test sets with an 80/20 ratio

Splitting the training and validation sets with a 90/10 ratio

The test and validation sets contain only AffectNet samples (RGB, high-quality images)

AVFER on Google Drive: https://drive.google.com/drive/folders/1-8WG_CNrU3chL_OHpkM8EYx3Bm129cnE?usp=sharing
Source code(tar.gz)
Source code(zip)
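The AVFER split rules can be sketched as follows. This is one possible reading of the notes, labeled as an assumption: samples from FER-2013 and CK+ go only to training, while AffectNet samples are split 80/20 into (train + val) / test, and the 80% part is split again 90/10 into train / val, so that validation and test contain only AffectNet. The `(path, label, source)` tuple format and the `"affectnet"` tag are hypothetical, and the split is random rather than stratified per class.

```python
import random

def split_avfer(samples, seed=0):
    """samples: list of (path, label, source). Returns (train, val, test) lists."""
    rng = random.Random(seed)
    affectnet = [s for s in samples if s[2] == "affectnet"]
    others = [s for s in samples if s[2] != "affectnet"]  # FER-2013, CK+ -> train only
    rng.shuffle(affectnet)

    n_test = int(len(affectnet) * 0.2)         # 80/20 split
    test, rest = affectnet[:n_test], affectnet[n_test:]
    n_val = int(len(rest) * 0.1)               # 90/10 split of the remaining 80%
    val, train_aff = rest[:n_val], rest[n_val:]
    return others + train_aff, val, test
```

For 100 AffectNet images this yields 20 test, 8 validation, and 72 training images, plus every non-AffectNet image in training.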
0.2.2(Apr 27, 2022)
Adjust train and test splitting

Balancing augmentation over 150,000 samples

Removing augmentation from the validation set to increase variability

Loading VFER checkpoints trained for 5, 15, and 25 epochs on the resulting dataset

Loading history for training and validation accuracy/loss

Source code(tar.gz)
Source code(zip)
epoch_15_vfer_small_50(327.37 MB)
epoch_15_vfer_small_50.pth(327.37 MB)
epoch_25_vfer_small.pth(327.37 MB)
epoch_25_vfer_small_50(327.37 MB)
epoch_5_vfer_small_50(327.37 MB)
vfer_small_15_on_50_history_loss.pkl(220 bytes)
vfer_small_15_on_50_history_train.pkl(3.00 KB)
vfer_small_15_on_50_history_val.pkl(3.00 KB)
vfer_small_25_on_50_history_loss.pkl(220 bytes)
vfer_small_25_on_50_history_train.pkl(3.00 KB)
vfer_small_25_on_50_history_val.pkl(3.00 KB)
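Balancing through augmentation amounts to deciding how many synthetic images each class needs. The sketch below assumes the target is spread evenly over the classes; for 150,000 samples and 8 classes that is 150,000 / 8 = 18,750 images per class. Classes already above the target receive no augmented images. The function name and behavior are hypothetical.

```python
def augmentation_quota(class_counts, total_target):
    """Augmented images to generate per class so each class reaches
    total_target // n_classes samples (classes already above it get 0)."""
    per_class = total_target // len(class_counts)
    return {c: max(0, per_class - n) for c, n in class_counts.items()}

print(augmentation_quota({"a": 10, "b": 3, "c": 20}, 45))
```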
0.2.1(Apr 24, 2022)
Adding partial-training checkpoints saved whenever the transformer weights improve (best fit)

Updating the VFER model after 5 of 50 training epochs, with 62% accuracy (state of the art for visual transformers on AffectNet)

Integrating a fluid face-detection system in the cropping phase

Source code(tar.gz)
Source code(zip)
epoch_5_vfer_small_50(327.37 MB)
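"Best-fit" checkpointing during partial training can be sketched as a small tracker that writes a checkpoint only when validation accuracy improves. The class is hypothetical; in PyTorch, `save_fn` could wrap something like `torch.save(model.state_dict(), path)`.

```python
class BestCheckpoint:
    """Keep the checkpoint with the highest validation accuracy seen so far."""

    def __init__(self, save_fn):
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.save_fn = save_fn  # called with the epoch number when a new best is found

    def update(self, epoch, val_acc):
        """Save when val_acc improves; return True if a checkpoint was written."""
        if val_acc > self.best_acc:
            self.best_acc, self.best_epoch = val_acc, epoch
            self.save_fn(epoch)
            return True
        return False
```

Calling `update` at the end of each epoch means an interrupted run (for example, a Colab session timeout) still leaves the best weights reached so far on disk.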
0.2.0(Apr 22, 2022)
Adjust normalization parameters from [0.48, 0.28] to 0.5

Balancing the dataset without augmenting validation elements

Doubling the training set size to require fewer training epochs

Adding inference on OpenCV video capture tools for model applications

Source code(tar.gz)
Source code(zip)
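In the real-time setting, each detected face is cropped from the webcam frame before classification. The sketch below only covers the cropping step: it takes `(x, y, w, h)` boxes, the format returned by OpenCV's `detectMultiScale`, enlarges them by a hypothetical `margin`, and clamps them to the frame borders. The face detector used by the project is not specified here.

```python
import numpy as np

def crop_faces(frame, boxes, margin=0.1):
    """Crop (x, y, w, h) face boxes from an (H, W, C) frame, enlarged by `margin`
    of the box size on each side and clamped to the frame borders."""
    h_img, w_img = frame.shape[:2]
    crops = []
    for x, y, w, h in boxes:
        dx, dy = int(w * margin), int(h * margin)
        x0, y0 = max(0, x - dx), max(0, y - dy)
        x1, y1 = min(w_img, x + w + dx), min(h_img, y + h + dy)
        crops.append(frame[y0:y1, x0:x1])
    return crops
```

A possible loop around it, assuming a Haar cascade detector: open the camera with `cv2.VideoCapture(0)`, read frames with `cap.read()`, detect faces with `cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml").detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)` on a grayscale copy, then convert each crop with `cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)` before resizing and normalizing it for the transformer, since OpenCV frames are BGR.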
0.1.0(Apr 18, 2022)
Model dump of the VFER transformer for batch 50 after 12 epochs, with 69% accuracy

Model dump of the VFER transformer for batch 60 after 24 epochs, with 70% accuracy

Model dump of the VFER transformer for batch 60 after 50 epochs, with 71% accuracy

Debugging notebook for the loss evaluation

Adding every section up to the evaluation

Integration of the dataset available here

Source code(tar.gz)
Source code(zip)
vfer_base_12.zip(304.26 MB)
vfer_base_24.zip(304.25 MB)
vfer_base_50.zip(608.51 MB)

Owner

Mario Sessa

Computer Scientist for /dev/null. Master Student in Computer Science.