Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments with different approaches to ST:

Cascaded ST, which consists of two steps: Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
Direct ST: models trained only on ST data
(Main contribution) End-to-end ST that limits the use of ST data: multi-modal models that leverage ASR and MT training data for the ST task
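
The difference between the cascaded and the direct approach can be pictured as function composition. The following is only a minimal sketch, not code from this repository: the type aliases and the cascaded_st helper are hypothetical names introduced for illustration, and the toy models stand in for trained ASR and MT systems.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for illustration; they are not part of this repository.
Audio = Sequence[float]             # waveform samples or acoustic features
ASRModel = Callable[[Audio], str]   # audio -> source-language transcript
MTModel = Callable[[str], str]      # source-language text -> target-language text
STModel = Callable[[Audio], str]    # audio -> target-language text


def cascaded_st(asr: ASRModel, mt: MTModel) -> STModel:
    """Compose an ASR model and an MT model into a cascaded ST system."""
    def translate(audio: Audio) -> str:
        transcript = asr(audio)   # step 1: speech recognition
        return mt(transcript)     # step 2: text translation
    return translate


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any trained model.
    toy_asr = lambda audio: "hello world"
    toy_mt = lambda text: {"hello world": "hallo Welt"}.get(text, text)
    st = cascaded_st(toy_asr, toy_mt)
    print(st([0.0, 0.1, -0.1]))
```

A direct or end-to-end model would instead be a single STModel that maps audio to target-language text without producing an intermediate transcript.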

The Transformer architecture is used as the baseline for the implementation.

High-level instructions for using the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interest, changing the variables in the script if needed:
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh for evaluating cascaded ST using pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models using additional training data in the opposite direction
- run_fine_tunning.sh and run_fine_tunning_fromASR.sh for fine-tuning models with ST data, resulting in few-shot models
- modality_similarity_svcca.sh and modality_similarity_classifier.sh for measuring text-audio similarity in the learned representations
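
The two steps above can be chained from a small Python driver. This is a sketch under stated assumptions rather than part of the repository: it assumes the scripts are run from the repository root, take no command-line arguments (variables are edited inside the shell script, as described above), and that python and bash are on the PATH. The build_commands and run_all helpers are hypothetical names.

```python
import subprocess
from typing import List

PREPARE_SCRIPT = "covost_data_preparation.py"


def build_commands(experiment_script: str) -> List[List[str]]:
    """Return the commands for data preparation followed by one experiment."""
    if not experiment_script.endswith(".sh"):
        raise ValueError("expected one of the repository's .sh scripts")
    return [
        ["python", PREPARE_SCRIPT],    # step 1: download and preprocess the data
        ["bash", experiment_script],   # step 2: run the chosen experiment
    ]


def run_all(experiment_script: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(experiment_script):
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: the scripts need no arguments. Stop at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_all("run_translation_pipeline.sh")
```

Keeping dry_run=True as the default makes it easy to check the order of the commands before starting a long download or training run.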

See notebooks/Repo_Instruction.ipynb for more details.

Owner

Tu Anh Dinh
