Exploring Visual Engagement Signals for Representation Learning

Last update: Jul 23, 2022

Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie and Ser-Nam Lim
Cornell University, Facebook AI

arXiv: https://arxiv.org/abs/2104.07767

VisE is a pretraining approach that leverages Visual Engagement clues as supervisory signals. Given the same image, visual engagement signals provide semantically and contextually richer information than conventional recognition and captioning tasks. VisE transfers well to subjective downstream computer vision tasks such as emotion recognition and political bias classification.

💬 Loading pretrained models

❗ NOTE: This is a torchvision-like model (all the layers before the last global average-pooling layer). Given a batch of image tensors of size (B, 3, 224, 224), the provided models produce spatial image features of shape (B, 2048, 7, 7), where B is the batch size.
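
The (B, 2048, 7, 7) shape in the note follows from the layer layout of a standard torchvision ResNet-50. The sketch below is only an illustration of that arithmetic, not part of VisE: the helper names `conv_output_size` and `resnet50_feature_shape` are hypothetical, and the kernel, stride, and padding values are assumed to match torchvision's ResNet-50 definition.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_feature_shape(batch_size, height=224, width=224):
    """Shape of the feature map before global average pooling (assumed layout)."""
    # (kernel, stride, padding) of every stage that halves the resolution.
    downsampling_stages = [
        (7, 2, 3),  # conv1
        (3, 2, 1),  # maxpool
        (3, 2, 1),  # first block of layer2
        (3, 2, 1),  # first block of layer3
        (3, 2, 1),  # first block of layer4
    ]
    for kernel, stride, padding in downsampling_stages:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
    channels = 512 * 4  # layer4 width times the bottleneck expansion factor
    return (batch_size, channels, height, width)


print(resnet50_feature_shape(8))  # (8, 2048, 7, 7)
```

Five stride-2 stages give a total downsampling factor of 32, so a 224 x 224 input ends at 7 x 7.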

Loading models with torch.hub

Get the pretrained ResNet-50 models from VisE in one line!

VisE-250M (ResNet-50): This model is pretrained with 250 million public image posts.

import torch
model = torch.hub.load("KMnP/vise", "resnet50_250m", pretrained=True)

VisE-1.2M (ResNet-50): This model is pretrained with 1.23 million public image posts.

import torch
model = torch.hub.load("KMnP/vise", "resnet50_1m", pretrained=True)

Loading models manually

Name	Arch	Size	Model
VisE-250M	ResNet-50	94.3 MB	download
VisE-1.2M	ResNet-50	94.3 MB	download

If you encounter any issues with torch.hub, you can instead download the model checkpoints manually and then follow the script below.

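import torch
import torchvision

# Placeholder: replace with the local path of the downloaded checkpoint.
CHECKPOINT_PATH = "<CHECKPOINT_PATH>"

# Create a torchvision ResNet-50 with randomly initialized weights
# (calling resnet50() without arguments loads no pretrained weights).
model = torchvision.models.resnet50()

# Keep all the layers before the global average-pooling layer.
model = torch.nn.Sequential(*list(model.children())[:-2])

# Load the pretrained weights from the local checkpoint onto the CPU.
model.load_state_dict(torch.load(CHECKPOINT_PATH, map_location="cpu"))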

💬 Citing VisE

If you find VisE useful in your research, please cite the following publication.

@misc{jia2021vise,
      title={Exploring Visual Engagement Signals for Representation Learning}, 
      author={Menglin Jia and Zuxuan Wu and Austin Reiter and Claire Cardie and Serge Belongie and Ser-Nam Lim},
      year={2021},
      eprint={2104.07767},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

💬 Acknowledgments

We thank Marseille, who was featured in the teaser photo.

💬 License

VisE models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.
