Unimodal Face Classification with Multimodal Training

This is a PyTorch implementation of the following paper:

Unimodal Face Classification with Multimodal Training

Wenbin Teng (Boston University), Chongyang Bai (Dartmouth College)

Abstract: We propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and uses it to complement the imperfect single-modality input during testing. Technically, during training, the framework (1) builds both intra-modality and cross-modality autoencoders with the aid of facial attributes to learn latent embeddings as multimodal descriptors, and (2) uses a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively prevents a useless modality (if any) from confusing the model. This way, the learned autoencoders can generate robust embeddings for single-modality face classification at test time. We evaluate our framework on two face classification datasets and two kinds of testing input: (1) poor-condition images and (2) point clouds or 3D face meshes, when both 2D and 3D modalities are available for training.

The proposed method applies a 2D encoder and a 3D encoder to extract an embedding from each modality. The divergence between the two embeddings is minimized adaptively, guided by the classification loss. Depending on the testing modality, we use the corresponding decoder to reconstruct the 2D and 3D inputs from the feature embeddings. An overview of the proposed network is shown in the following picture:

[Figure: overview of the proposed network]
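
The adaptive alignment of the 2D and 3D embeddings can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the exponential gate, the squared Euclidean distance, and the names `adaptive_weight`, `adaptive_divergence`, and `beta` are all hypothetical. The idea shown is that a branch is pulled toward the other modality's embedding only when the other modality currently classifies better, so a weak modality does not drag a strong one down.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_weight(loss_self, loss_other, beta=2.0):
    """Hypothetical gate (an assumption, not taken from the paper's code).

    Positive only when this branch has a higher classification loss than
    the other branch; zero otherwise.
    """
    gap = loss_self - loss_other
    return math.exp(beta * gap) - 1.0 if gap > 0 else 0.0


def adaptive_divergence(emb_2d, emb_3d, cls_loss_2d, cls_loss_3d, beta=2.0):
    """Return the divergence penalty for the 2D branch and for the 3D branch."""
    dist = squared_distance(emb_2d, emb_3d)
    w_2d = adaptive_weight(cls_loss_2d, cls_loss_3d, beta)
    w_3d = adaptive_weight(cls_loss_3d, cls_loss_2d, beta)
    return w_2d * dist, w_3d * dist


# Example: the 2D branch classifies worse, so only it is penalized.
penalty_2d, penalty_3d = adaptive_divergence([1.0, 0.0], [0.0, 0.0], 1.0, 0.5)
```

In a PyTorch training loop, the same weighting would typically be applied to tensors, with each penalty added to the classification loss of its own branch; how the actual implementation combines these terms is not stated here.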
