Norm-based Analysis of Transformer

Implementations for 2 papers introducing to analyze Transformers using vector norms:

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)
Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

This paper proposed to analyze attention, a core component of Transformer, using vector norms rather than attention weights.
Transformer analyses have been focused on mixing in attention and have typically observed attention weights.
However, in addition to attention weights, there are more factors to determine attention's outputs: the input vector itself and vector transformations.
Then, this paper proposed to analyze attention using vector norms considering them.
→ Check this paper's code: Code for emnlp2020.

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

This paper proposed to analyze attention block (i.e., attention, residual connection, and layer normalization) using vector norms.
Transformer analyses have been focused on mixing in attention.
However, there are components other than attention in Transformer, and they can play a role other than mixing.
Then, this paper proposed to expand the scope of Transformer analysis from attention into attention block.
→ Check this paper's code: Code for emnlp2021.

Citation

If you use our code for academic work, please cite:

@inproceedings{kobayashi-etal-2020-attention,  
   title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},  
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},  
   booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},  
   year = "2020",  
   url = "https://www.aclweb.org/anthology/2020.emnlp-main.574",  
   pages = "7057--7075",  
}
@inproceedings{kobayashi-etal-2021-incorporating,
   title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
   booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Proceeding (EMNLP)},
   year = "2021",
   url = "https://arxiv.org/abs/2109.07152",
   pages = "to appear",
}

Norm-based Analysis of Transformer

Related tags

Overview

Norm-based Analysis of Transformer

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Citation

Owner

Goro Kobayashi

Tutorial page of the Climate Hack, the greatest hackathon ever

Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022)

[CVPR'2020] DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data

PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

A Human-in-the-Loop workflow for creating HD images from text

labelpix is a graphical image labeling interface for drawing bounding boxes

Reinforcement learning algorithms in RLlib

Few-Shot Object Detection via Association and DIscrimination

Emotion Recognition from Facial Images

Official code for UnICORNN (ICML 2021)

Unsupervised clustering of high content screen samples

Cortex-compatible model server for Python and TensorFlow

An Extendible (General) Continual Learning Framework based on Pytorch - official codebase of Dark Experience for General Continual Learning

Punctuation Restoration using Transformer Models for High-and Low-Resource Languages

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation"

[ICML 2021] "Graph Contrastive Learning Automated" by Yuning You, Tianlong Chen, Yang Shen, Zhangyang Wang

mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms.

GRF: Learning a General Radiance Field for 3D Representation and Rendering

Permute Me Softly: Learning Soft Permutations for Graph Representations