PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Last update: Dec 23, 2022

Overview

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M²HSE)

PyTorch code for M²HSE. The local-level subnetwork of M²HSE is built on top of VSESC.

Xinlei Pei, Zheng Liu, Shaojing Yuan, Shanshan Gao, Huijian Han and Caiming Zhang. "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Introduction

We provide demo code for the Corel 5K dataset, covering the training process for both the global-level subnetwork and the local-level subnetwork.

Requirements

We recommend the following dependencies.

Python 3.6
PyTorch (1.3.1)
NumPy (1.19.2)
Punkt Sentence Tokenizer:

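import nltk
# Fetch the Punkt tokenizer models directly (same result as entering "d punkt" in the interactive downloader)
nltk.download('punkt')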

Download data

The raw images and the corresponding texts can be downloaded from here. Note that we cleaned this dataset; the specific operations are described in the paper.

In addition: 1) to extract the fine-grained visual features, the raw images are divided uniformly into 3×3 blocks; 2) we adopt AlexNet, pre-trained on ImageNet, to extract the CNN features; 3) we provide the text data in ./data/coarse-grained-data/ and ./data/fine-grained-data/. For data preparation, you therefore have the following two options:

Download the raw data above and extract the corresponding features following the strategy described in the paper.
Contact us for the relevant data. (Email: [email protected])
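
If you extract the features yourself, the uniform 3×3 split can be sketched as follows. This is only an illustration, not the authors' preprocessing script: the function name grid_boxes and the handling of image sizes that are not divisible by 3 are assumptions. Each box uses the (left, upper, right, lower) convention accepted by Pillow's Image.crop, so every cropped block could then be passed to an ImageNet-pre-trained AlexNet.

```python
def grid_boxes(width, height, rows=3, cols=3):
    """Return crop boxes (left, upper, right, lower) for a uniform grid.

    Hypothetical helper: boxes are listed row by row, and integer
    division is used so the blocks tile the image without gaps or
    overlap even when the size is not divisible by rows/cols.
    """
    boxes = []
    for r in range(rows):
        upper = r * height // rows
        lower = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, upper, right, lower))
    return boxes


if __name__ == "__main__":
    # Example: a 192x128 image yields 9 blocks.
    for box in grid_boxes(192, 128):
        print(box)
```

With Pillow installed, a block can be obtained with image.crop(box) for each box returned above.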

Training models

For training the global-level subnetwork:

Run train_global.py:

python train_global.py \
    --data_path ./data/coarse-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Global/Corel5K \
    --model_name ./checkpoint/M2HSE/Global/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --alpha_1 .8 \
    --alpha_2 .8

For training the local-level subnetwork:

Run train_local.py:

python train_local.py \
    --data_path ./data/fine-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Local/Corel5K \
    --model_name ./checkpoint/M2HSE/Local/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --beta_1 .4 \
    --beta_2 .4
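
The two commands share most of their flags, so it can be convenient to assemble them from one place. The sketch below is not part of the repository: the names SHARED_FLAGS, RUNS and build_command are hypothetical, while every flag and value is copied from the commands above.

```python
# Hypothetical launcher helper; flag names and values come from the
# training commands in this README, everything else is illustrative.

SHARED_FLAGS = {
    "--data_name": "corel5k_precomp",
    "--vocab_path": "./vocab",
    "--num_epochs": "100",
    "--lr_updata": "50",
    "--batchsize": "100",
    "--gamma_1": "1",
    "--gamma_2": ".5",
}

RUNS = {
    "global": {
        "script": "train_global.py",
        "flags": {
            "--data_path": "./data/coarse-grained-data",
            "--logger_name": "./checkpoint/M2HSE/Global/Corel5K",
            "--model_name": "./checkpoint/M2HSE/Global/Corel5K",
            "--alpha_1": ".8",
            "--alpha_2": ".8",
        },
    },
    "local": {
        "script": "train_local.py",
        "flags": {
            "--data_path": "./data/fine-grained-data",
            "--logger_name": "./checkpoint/M2HSE/Local/Corel5K",
            "--model_name": "./checkpoint/M2HSE/Local/Corel5K",
            "--beta_1": ".4",
            "--beta_2": ".4",
        },
    },
}


def build_command(level):
    """Return the argument list for the 'global' or 'local' training run."""
    run = RUNS[level]
    cmd = ["python", run["script"]]
    # Shared flags first, then the flags specific to this subnetwork.
    for flag, value in {**SHARED_FLAGS, **run["flags"]}.items():
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    for level in ("global", "local"):
        print(" ".join(build_command(level)))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root to start a run.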

Reference

Stay tuned. :)

License

Apache License 2.0

Owner

Xinlei-Pei
