Zsseg.baseline - Zero-Shot Semantic Segmentation

Overview

This repo is for our paper A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model. It is based on the official repo of MaskFormer.

@article{xu2021ss,
  title={End-to-End Semi-Supervised Object Detection with Soft Teacher},
  author={Xu, Mengde and Zhang, Zheng and Hu, Han and Wang, Jianfeng and Wang, Lijuan and Wei, Fangyun and Bai, Xiang and Liu, Zicheng},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Guideline

  • Enviroment

    torch==1.8.0
    torchvision==0.9.0
    detectron2==0.5 #Following https://detectron2.readthedocs.io/en/latest/tutorials/install.html to install it and some required packages
    mmcv==1.3.14

    FurtherMore, install the modified clip package.

    cd third_party/CLIP
    python -m pip install -Ue .
  • Data Preparation

    In our experiments, four datasets are used. For Cityscapes and ADE20k, follow the tutorial in MaskFormer.

  • For COCO Stuff 164k:

    • Download data from the offical dataset website and extract it like below.
      Datasets/
           coco/
                #http://images.cocodataset.org/zips/train2017.zip
                train2017/ 
                #http://images.cocodataset.org/zips/val2017.zip
                val2017/   
                #http://images.cocodataset.org/annotations/annotations_trainval2017.zip
                annotations/ 
                #http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
                stuffthingmaps/ 
    • Format the data to detecttron2 style and split it into Seen (Base) subset and Unseen (Novel) subset.
      python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
      
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl
      
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl
  • For Pascal VOC 11k:

    • Download data from the offical dataset website and extract it like below.
    datasets/
       VOC2012/
            #http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
            JPEGImages/
            val.txt
            #http://home.bharathh.info/pubs/codes/SBD/download.html
            SegmentationClassAug/
            #https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
            train.txt
            
    • Format the data to detecttron2 style and split it into Seen (Base) subset and Unseen (Novel) subset.
    python datasets/prepare_voc_sem_seg.py datasets/VOC2012
    
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json
    
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
  • Training and Evaluation

    Before training and evaluation, see the tutorial in detectron2. For example, to training a zero shot semantic segmentation model on COCO Stuff:

  • Training with manually designed prompts:

    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml
    
  • Training with learned prompts:

    # Training prompts
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8 
    # Training seg model
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}

    Note: the prompts training will be affected by the random seed. It is better to run it multiple times.

    For evaluation, add --eval-only flag to the traing command.

  • Trained Model

    😄 Coming soon.

This package implements THOR: Transformer with Stochastic Experts.

THOR: Transformer with Stochastic Experts This PyTorch package implements Taming Sparsely Activated Transformer with Stochastic Experts. Installation

Microsoft 45 Nov 22, 2022
This repository contains tutorials for the py4DSTEM Python package

py4DSTEM Tutorials This repository contains tutorials for the py4DSTEM Python package. For more information about py4DSTEM, including installation ins

11 Dec 23, 2022
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSeq DEVS: https://github.com/htseq/htseq DOCS: https://htseq.readthedocs.io A Python library to facilitate programmatic analysis of data from high-t

HTSeq 57 Dec 20, 2022
Newt - a Gaussian process library in JAX.

Newt __ \/_ (' \`\ _\, \ \\/ /`\/\ \\ \ \\

AaltoML 0 Nov 02, 2021
Gesture-controlled Video Game. Just swing your finger and play the game without touching your PC

Gesture Controlled Video Game Detailed Blog : https://www.analyticsvidhya.com/blog/2021/06/gesture-controlled-video-game/ Introduction This project is

Devbrat Anuragi 35 Jan 06, 2023
Pose Detection and Machine Learning for real-time body posture analysis during exercise to provide audiovisual feedback on improvement of form.

Posture: Pose Tracking and Machine Learning for prescribing corrective suggestions to improve posture and form while exercising. This repository conta

Pratham Mehta 10 Nov 11, 2022
Implementation of "Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification"

hypergraph_reid Implementation of "Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification" If you find this help your research,

62 Dec 21, 2022
Simple ONNX operation generator. Simple Operation Generator for ONNX.

sog4onnx Simple ONNX operation generator. Simple Operation Generator for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools Key concept V

Katsuya Hyodo 6 May 15, 2022
This code is a toolbox that uses Torch library for training and evaluating the ERFNet architecture for semantic segmentation.

ERFNet This code is a toolbox that uses Torch library for training and evaluating the ERFNet architecture for semantic segmentation. NEW!! New PyTorch

Edu 104 Jan 05, 2023
Using Language Model to Bootstrap Human Activity Recognition Ambient Sensors Based in Smart Homes

Using Language Model to Bootstrap Human Activity Recognition Ambient Sensors Based in Smart Homes This repository is the official implementation of Us

Damien Bouchabou 0 Oct 18, 2021
Deep learning for spiking neural networks

A deep learning library for spiking neural networks. Norse aims to exploit the advantages of bio-inspired neural components, which are sparse and even

Electronic Vision(s) Group — BrainScaleS Neuromorphic Hardware 59 Nov 28, 2022
TensorFlow implementation of "Learning from Simulated and Unsupervised Images through Adversarial Training"

Simulated+Unsupervised (S+U) Learning in TensorFlow TensorFlow implementation of Learning from Simulated and Unsupervised Images through Adversarial T

Taehoon Kim 569 Dec 29, 2022
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Collie do

ShopRunner 96 Dec 29, 2022
Code for "Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search"

Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search This is an implementation for our paper Contextual Non-Loca

Tencent YouTu Research 50 Dec 03, 2022
Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

Yam Peleg 63 Sep 21, 2022
Human4D Dataset tools for processing and visualization

HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media HUMAN4D constitutes a large and multimodal 4D dataset that contains a variet

tofis 15 Nov 09, 2022
PyTorch implementation for paper StARformer: Transformer with State-Action-Reward Representations.

StARformer This repository contains the PyTorch implementation for our paper titled StARformer: Transformer with State-Action-Reward Representations.

Jinghuan Shang 14 Dec 09, 2022
Adaout is a practical and flexible regularization method with high generalization and interpretability

Adaout Adaout is a practical and flexible regularization method with high generalization and interpretability. Requirements python 3.6 (Anaconda versi

lambett 1 Feb 09, 2022
Auto grind btdb2 exp for tower

Bloons TD Battles 2 EXP Grinder Auto grind btdb2 exp for towers Setup I suggest checking out every screenshot to see what they are supposed to be, so

Vincent 6 Jul 29, 2022
Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)

Cross View Transformers This repository contains the source code and data for our paper: Cross-view Transformers for real-time Map-view Semantic Segme

Brady Zhou 363 Dec 25, 2022