Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model, trained on the 35-word Speech Commands dataset.

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture
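
In the Keyword Transformer paper, each time frame of the MFCC spectrogram is treated as one token. The token is linearly projected to d_model dimensions, a learnable class token is prepended, and the sequence passes through the stacked encoder layers. The minimal sketch below only computes the resulting shapes. It assumes the preprocessing reported in the paper (16 kHz audio, 1 s clips, 40 MFCCs, 30 ms window, 10 ms hop); this repository's settings may differ, and d_model=64 is just an example value for --d_model.

```python
# Minimal sketch of input/token shapes for a KWT-style model.
# ASSUMPTIONS: 16 kHz audio, 1 s clips, 30 ms window, 10 ms hop, 40 MFCCs
# (settings reported in the Keyword Transformer paper); d_model=64 is only an
# example value. This repository's preprocessing may differ.


def num_frames(num_samples: int, win_length: int, hop_length: int) -> int:
    """Number of complete analysis windows over a signal (no padding)."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def kwt_shapes(sample_rate: int = 16000, clip_seconds: float = 1.0,
               win_ms: float = 30.0, hop_ms: float = 10.0,
               n_mfcc: int = 40, d_model: int = 64):
    num_samples = int(sample_rate * clip_seconds)
    win = int(sample_rate * win_ms / 1000)  # 480 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    frames = num_frames(num_samples, win, hop)
    # Model input: one row of n_mfcc coefficients per time frame.
    input_shape = (frames, n_mfcc)
    # Each frame is projected to d_model and a class token is prepended,
    # so the encoder layers operate on frames + 1 tokens.
    encoder_shape = (frames + 1, d_model)
    return input_shape, encoder_shape


print(kwt_shapes())  # ((98, 40), (99, 64))
```

The class token's final representation is what a classification head would read to pick one of the 35 keywords.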

Download the dataset

To download and extract the dataset, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
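
The archive extracts one folder per keyword, a _background_noise_ folder, and two text files, validation_list.txt and testing_list.txt, that list the clips reserved for validation and testing. The sketch below shows one way to build the three splits from those lists. It is a hypothetical helper (split_speech_commands is not part of this repository), and train.py may split the data differently.

```python
from pathlib import Path


def split_speech_commands(data_dir):
    """Split Speech Commands clips using the official validation/testing lists.

    Hypothetical helper. Returns {"train": [...], "validation": [...], "test": [...]}
    where each entry is a (relative_path, label) pair, e.g. ("yes/x.wav", "yes").
    """
    root = Path(data_dir)

    def read_list(name):
        with open(root / name) as f:
            return {line.strip() for line in f if line.strip()}

    val_files = read_list("validation_list.txt")
    test_files = read_list("testing_list.txt")

    splits = {"train": [], "validation": [], "test": []}
    for wav in sorted(root.glob("*/*.wav")):
        label = wav.parent.name
        if label == "_background_noise_":
            continue  # long noise recordings, not keyword examples
        rel = wav.relative_to(root).as_posix()
        if rel in val_files:
            splits["validation"].append((rel, label))
        elif rel in test_files:
            splits["test"].append((rel, label))
        else:
            # Anything not listed for validation or testing is training data.
            splits["train"].append((rel, label))
    return splits
```

For example, split_speech_commands("./data") after running the commands above returns the three lists; len() on each gives the split sizes.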

Set up a virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
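
The flags --num_layers, --d_model, --num_heads and --mlp_dim set the size of the encoder. As a rough guide to how they affect model size, the sketch below estimates the parameter count. It assumes a standard Transformer encoder layer with biases and two LayerNorms, a per-head size of d_model / num_heads, 40 MFCC inputs over 98 frames, a class token, learned positional embeddings and a single dense output layer over 35 keywords. These are assumptions, not a description of this repository's code; for the actual model, Keras reports the exact count via model.summary().

```python
# Rough parameter count for a KWT-style model from the training flags.
# ASSUMPTIONS: standard encoder layers with biases, head size d_model // num_heads,
# 40 MFCC inputs over 98 frames, a class token, learned positional embeddings
# and one dense output layer over 35 keywords. Not taken from this repository.


def encoder_layer_params(d_model: int, num_heads: int, mlp_dim: int) -> int:
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    # Q, K, V and output projections, each a d_model x d_model matrix plus bias;
    # splitting into heads does not change the total.
    attention = 4 * (d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and offset
    mlp = (d_model * mlp_dim + mlp_dim) + (mlp_dim * d_model + d_model)
    return attention + layer_norms + mlp


def kwt_params(num_layers: int, d_model: int, num_heads: int, mlp_dim: int,
               n_mfcc: int = 40, frames: int = 98, num_classes: int = 35) -> int:
    projection = n_mfcc * d_model + d_model   # per-frame linear embedding
    class_token = d_model
    positions = (frames + 1) * d_model        # +1 for the class token
    head = d_model * num_classes + num_classes
    encoder = num_layers * encoder_layer_params(d_model, num_heads, mlp_dim)
    return projection + class_token + positions + encoder + head


print(kwt_params(num_layers=12, d_model=64, num_heads=1, mlp_dim=256))  # 611107
```

Under these assumptions, --num_layers and --mlp_dim scale the count roughly linearly, --d_model roughly quadratically, and --num_heads leaves it unchanged.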

To track training metrics, launch TensorBoard:

tensorboard --logdir ${Path to log directory}

Predicting the keyword of an audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
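
Speech Commands clips are mono, 16-bit PCM WAV files sampled at 16 kHz and at most one second long, so a recording of your own is most likely to work with test.py if it matches that format. The hypothetical helper below (load_clip is not part of this repository) uses only the Python standard library to check the format and zero-pad or trim a clip to 16,000 samples. How test.py handles other formats is an open assumption here.

```python
import array
import sys
import wave


def load_clip(path, target_rate=16000, target_len=16000):
    """Hypothetical helper: read a mono 16-bit PCM WAV, pad/trim to target_len."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        if wf.getframerate() != target_rate:
            raise ValueError(f"expected {target_rate} Hz, got {wf.getframerate()} Hz")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    samples = samples[:target_len]                       # trim long clips
    samples.extend([0] * (target_len - len(samples)))    # zero-pad short clips
    # Scale int16 values to floats in [-1, 1).
    return [s / 32768.0 for s in samples]
```

Padding with zeros (silence) at the end keeps every input the same length, which a model with a fixed number of time frames expects.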

Owner

Intelligent Machines Limited
