Zsseg.baseline - Zero-Shot Semantic Segmentation

Last update: Dec 20, 2022

Overview

This repository accompanies our paper, "A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model". It is built on the official MaskFormer repository.

@article{xu2021ss,
  title={End-to-End Semi-Supervised Object Detection with Soft Teacher},
  author={Xu, Mengde and Zhang, Zheng and Hu, Han and Wang, Jianfeng and Wang, Lijuan and Wei, Fangyun and Bai, Xiang and Liu, Zicheng},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Guideline

Environment

torch==1.8.0
torchvision==0.9.0
detectron2==0.5 # Follow https://detectron2.readthedocs.io/en/latest/tutorials/install.html to install it and its required packages
mmcv==1.3.14

Furthermore, install the modified CLIP package.

cd third_party/CLIP
python -m pip install -Ue .
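
Once everything is installed, a quick version check can catch mismatches before training starts. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, the distribution names are assumed to match the pins above (mmcv in particular may be installed under a different name such as mmcv-full), and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata

# Pins copied from the list above. The distribution names are an assumption:
# mmcv, for instance, may be installed as "mmcv-full" on some setups.
PINNED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "detectron2": "0.5",
    "mmcv": "1.3.14",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Map each mismatching package to an (expected, found) pair."""
    problems = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu111" when comparing.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package was found at the expected version.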

Data Preparation

Our experiments use four datasets. For Cityscapes and ADE20k, follow the data preparation tutorial in MaskFormer.

For COCO Stuff 164k:

Download the data from the official dataset website and extract it as shown below.

datasets/
     coco/
          #http://images.cocodataset.org/zips/train2017.zip
          train2017/ 
          #http://images.cocodataset.org/zips/val2017.zip
          val2017/   
          #http://images.cocodataset.org/annotations/annotations_trainval2017.zip
          annotations/ 
          #http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
          stuffthingmaps/
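
Before running the preparation scripts, it may help to confirm that the extracted archives landed where the layout above expects them. This is a minimal sketch under the assumption that the dataset root is datasets/coco, as in the commands in this guide; the helper name `missing_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-directories named in the layout above, relative to datasets/coco.
COCO_DIRS = ["train2017", "val2017", "annotations", "stuffthingmaps"]


def missing_dirs(root, expected=COCO_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("datasets/coco")
    print("missing:", missing if missing else "none")
```

The same helper can be pointed at another dataset root by passing a different list of expected directory names.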

Convert the data to the detectron2 format and split it into a Seen (Base) subset and an Unseen (Novel) subset.

python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco

python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl

python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl

For Pascal VOC 11k:

Download the data from the official dataset website and extract it as shown below.

datasets/
   VOC2012/
        #http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
        JPEGImages/
        val.txt
        #http://home.bharathh.info/pubs/codes/SBD/download.html
        SegmentationClassAug/
        #https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
        train.txt

Convert the data to the detectron2 format and split it into a Seen (Base) subset and an Unseen (Novel) subset.

python datasets/prepare_voc_sem_seg.py datasets/VOC2012

python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json

python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json

Training and Evaluation

Before training and evaluation, read the detectron2 tutorial. For example, to train a zero-shot semantic segmentation model on COCO Stuff:

Training with manually designed prompts:

python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml

Training with learned prompts:

# Training prompts
python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8 
# Training seg model
python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}

Note: the result of prompt training depends on the random seed, so it is better to run it several times.
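
One way to organize repeated prompt-training runs is to generate one command per seed, each with its own output directory. The sketch below only builds the command strings and executes nothing. It assumes that train_net.py honors detectron2's standard `SEED` and `OUTPUT_DIR` config keys when they are passed as trailing overrides; the function name and the output location are hypothetical.

```python
import shlex

CONFIG = (
    "configs/coco-stuff-164k-156/"
    "zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml"
)


def prompt_training_commands(seeds, num_gpus=8, out_root="output/prompt_seeds"):
    """Build one prompt-training command per seed (nothing is executed)."""
    commands = []
    for seed in seeds:
        argv = [
            "python", "train_net.py",
            "--config-file", CONFIG,
            "--num-gpus", str(num_gpus),
            # SEED and OUTPUT_DIR are standard detectron2 config keys, passed
            # as trailing overrides; out_root is a hypothetical location.
            "SEED", str(seed),
            "OUTPUT_DIR", f"{out_root}/seed_{seed}",
        ]
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    for command in prompt_training_commands([1, 2, 3]):
        print(command)
```

Keeping a separate output directory per seed makes it straightforward to compare the runs afterwards and choose which prompt checkpoint to pass to the segmentation model.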

For evaluation, add the --eval-only flag to the training command.
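
The position of the flag matters when the command ends with config overrides such as MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT. detectron2's default argument parser collects trailing KEY VALUE pairs with a remainder argument, so a flag placed after them is treated as part of the overrides. The sketch below illustrates this with a stand-in parser; it assumes train_net.py uses a parser of this shape, and the config name and checkpoint path are hypothetical.

```python
import argparse


def make_parser():
    """Stand-in for a detectron2-style launcher parser (an assumption)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", default="")
    parser.add_argument("--num-gpus", type=int, default=1)
    parser.add_argument("--eval-only", action="store_true")
    # Everything from the first bare KEY onward is collected as overrides.
    parser.add_argument("opts", nargs=argparse.REMAINDER, default=None)
    return parser


# Hypothetical config name and checkpoint path, for illustration only.
BASE = ["--config-file", "model.yaml", "--num-gpus", "8"]
OVERRIDE = ["MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT", "prompts.pth"]

# Flag placed before the overrides: parsed as intended.
good = make_parser().parse_args(BASE + ["--eval-only"] + OVERRIDE)
# Flag appended after the overrides: swallowed into opts.
bad = make_parser().parse_args(BASE + OVERRIDE + ["--eval-only"])

if __name__ == "__main__":
    print(good.eval_only, good.opts)
    print(bad.eval_only, bad.opts)
```

In practice this suggests placing --eval-only next to --num-gpus, ahead of any KEY VALUE overrides.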

Trained Model

😄 Coming soon.
