Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Last update: Dec 30, 2022

Overview

DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm.

Our paper is available on arXiv.

Updates

**Our code, dataset, and models will be released soon.**

Introduction

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) (Radford et al., 2021) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks. However, CLIP is quite data-hungry and requires 400M image-text pairs for pre-training, which restricts its adoption. This work proposes a novel training paradigm, Data efficient CLIP (DeCLIP), to alleviate this limitation. We demonstrate that by carefully utilizing the widespread supervision among the image-text pairs, our DeCLIP can learn generic visual features more efficiently. Instead of using only the single image-text contrastive supervision, we fully exploit the potential of the data through (1) self-supervision within each modality; (2) multi-view supervision across modalities; and (3) nearest-neighbor supervision from other similar pairs. Benefiting from this intrinsic supervision, our DeCLIP-ResNet50 achieves 60.4% zero-shot top-1 accuracy on ImageNet, which is 0.8% above CLIP-ResNet50 while using 7.1× less data. Our DeCLIP-ResNet50 outperforms its counterpart on 8 out of 11 visual datasets when transferred to downstream tasks. Moreover, scaling up the model and compute also works well in our framework.
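
To make the image-text contrastive supervision and the multi-view idea above more concrete, here is a minimal NumPy sketch. It is not the released DeCLIP code: the function names (`info_nce`, `multi_view_loss`), the temperature value of 0.07, and the choice to average the loss uniformly over all image-view/text-view pairings are assumptions made for illustration. The self-supervision and nearest-neighbor terms are not covered.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of matched image-text pairs.

    Row i of image_emb is assumed to match row i of text_emb; every other
    row in the batch serves as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float(0.5 * (loss_i2t + loss_t2i))

def multi_view_loss(image_views, text_views, temperature=0.07):
    """Average the contrastive loss over every image-view/text-view pairing.

    With two image views and two text views this yields 2 x 2 = 4 pairings.
    Uniform averaging is an assumption of this sketch.
    """
    losses = [info_nce(i, t, temperature)
              for i in image_views for t in text_views]
    return float(np.mean(losses))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 16))                  # hypothetical image embeddings
    txt = img + 0.1 * rng.normal(size=(8, 16))      # hypothetical matching text embeddings
    img_aug = img + 0.1 * rng.normal(size=(8, 16))  # stand-in for an augmented image view
    txt_aug = txt + 0.1 * rng.normal(size=(8, 16))  # stand-in for an augmented text view
    print("single-view loss:", info_nce(img, txt))
    print("multi-view loss: ", multi_view_loss([img, img_aug], [txt, txt_aug]))
```

In this sketch the extra views simply add more positive pairings per sample, which is one way to read the claim that more supervision can be drawn from the same image-text pairs.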

Model

Our pretrained visual backbone models (w/o text encoder)

DeCLIP_r50: Google Drive
DeCLIP_vitb32: Google Drive

Citing DeCLIP

@misc{li2021supervision,
      title={Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm}, 
      author={Yangguang Li and Feng Liang and Lichen Zhao and Yufeng Cui and Wanli Ouyang and Jing Shao and Fengwei Yu and Junjie Yan},
      year={2021},
      eprint={2110.05208},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Sense-GVT
