Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Last update: Dec 16, 2022

Overview

SETR - Pytorch

Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official code,I implemented SETR-Progressive UPsampling(SETR-PUP) using pytorch.

Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Vit

The Vit model is also implemented, and you can use it for image classification.

Usage SETR

from SETR.transformer_seg import SETRModel
import torch 

if __name__ == "__main__":
    net = SETRModel(patch_size=(32, 32), 
                    in_channels=3, 
                    out_channels=1, 
                    hidden_size=1024, 
                    num_hidden_layers=8, 
                    num_attention_heads=16, 
                    decode_features=[512, 256, 128, 64])
    t1 = torch.rand(1, 3, 256, 256)
    print("input: " + str(t1.shape))
    
    # print(net)
    print("output: " + str(net(t1).shape))

If the output size is (1, 1, 256, 256), the code runs successfully.

Usage Vit

from SETR.transformer_seg import Vit
import torch 

if __name__ == "__main__":
    model = Vit(patch_size=(7, 7), 
                    in_channels=1, 
                    out_class=10, 
                    hidden_size=1024, 
                    num_hidden_layers=1, 
                    num_attention_heads=16)
    print(model)
    t1 = torch.rand(1, 1, 28, 28)
    print("input: " + str(t1.shape))

    print("output: " + str(model(t1).shape))

The output shape is (1, 10).

current examples

task_mnist: The simplest example, using the Vit model to classify the minst dataset.
task_car_seg: The example is sample segmentation task. data download: https://www.kaggle.com/c/carvana-image-masking-challenge/data

More examples will be updated later.

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

Related tags

Overview

SETR - Pytorch

Vit

Usage SETR

Usage Vit

current examples

more

Owner

zhaohu xing

Personal project about genus-0 meshes, spherical harmonics and a cow

[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

MultiLexNorm 2021 competition system from ÚFAL

A TensorFlow implementation of the Mnemonic Descent Method.

PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021).

My personal Home Assistant configuration.

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

An unofficial personal implementation of UM-Adapt, specifically to tackle joint estimation of panoptic segmentation and depth prediction for autonomous driving datasets.

Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Add gui for YoloV5 using PyQt5

Blind visual quality assessment on 360° Video based on progressive learning

Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data.