Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Last update: Sep 06, 2021

Overview

Period-alternatives-of-Softmax

Experimental Demo for our paper

'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism'

We suggest replacing the exponential function in Softmax with periodic functions. In experiments on a simple demo modeled on LeViT, our method alleviates the gradient vanishing problem and yields substantial improvements over Softmax and its variants.

** Create your own 'dataset' folder. For datasets other than cifar-10, cifar-100 and Tiny-imageNet, you may also need to modify the demo.py file.

Functions available:

softmax, norm_softmax
sinmax, norm_sinmax
cosmax, norm_cosmax
sin_2_max, norm_sin_2_max
sin_2_max_move, norm_sin_2_max_move
sirenmax, norm_sirenmax
sin_softmax, norm_sin_softmax
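
As a rough illustration of the idea behind these names, the sketch below contrasts the standard softmax with one periodic variant in which the exponential is replaced by a squared sine. The formula sin^2(x_i) / sum_j sin^2(x_j) is an assumption inferred from the name sin_2_max, not code taken from this repository. The helper names `softmax_ref` and `sin2_max_sketch` and the `eps` parameter are hypothetical.

```python
import math

def softmax_ref(scores):
    """Standard softmax over a list of floats (max-subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sin2_max_sketch(scores, eps=1e-12):
    """Hypothetical periodic variant: sin^2(x_i) / sum_j sin^2(x_j).

    Assumption: inferred from the name 'sin_2_max' only.
    eps guards against division by zero when every sin^2 term is 0.
    """
    vals = [math.sin(s) ** 2 for s in scores]
    total = sum(vals) + eps
    return [v / total for v in vals]

scores = [0.5, 1.0, 8.0]
print(softmax_ref(scores))
print(sin2_max_sketch(scores))
```

For these scores, softmax puts almost all of the weight on the largest entry, while the periodic sketch stays bounded because sin^2 never exceeds 1. The functions actually used in demo.py may differ in scaling and normalization.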

Modes available:

search:
        Random search for a suitable learning rate and weight decay. The results are recorded in
        Attention_test/*functions/lr_wd_search.txt
run:
        Train the demo. Four .npy files are created in the root directory:
        (1) 'record_val_acc.npy': validation accuracy, recorded every 100 iterations;
        (2) 'record_train_acc.npy': training accuracy, recorded every batch;
        (3) 'record_loss.npy': training loss, recorded every batch;
        (4) 'kq_value.npy': Q.K values, recorded *before scaling*.
att_run:
        Same as the run mode, except:
        (1) No kq_value record;
        (2) Every 5 epochs, a test image is fed to the model and the attention score map of each head of each layer is recorded.
            Saved in 'Attention_test/attention_maps.npy'
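
The four files written in run mode can be inspected with NumPy. This is a minimal sketch, assuming the files sit in the current directory and hold plain numeric arrays. The shapes and dtypes are not stated in this README, so the sketch only prints them. `load_records` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

import numpy as np

# File names as listed for the run mode above.
RUN_FILES = [
    "record_val_acc.npy",
    "record_train_acc.npy",
    "record_loss.npy",
    "kq_value.npy",
]

def load_records(root="."):
    """Load whichever run-mode .npy files exist under root.

    Hypothetical helper: returns {file name: array}; missing files are skipped.
    """
    records = {}
    for name in RUN_FILES:
        path = Path(root) / name
        if path.exists():
            records[name] = np.load(path)
    return records

if __name__ == "__main__":
    for name, arr in load_records().items():
        print(name, arr.shape, arr.dtype)
```

If a file was saved as an object array, `np.load` would need `allow_pickle=True`; whether that applies here is not known from the README.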

Owner

slwang9353
