Implementation of the HMAX model of vision in PyTorch

Overview

PyTorch implementation of HMAX

PyTorch implementation of the HMAX model that closely follows that of the MATLAB implementation of The Laboratory for Computational Cognitive Neuroscience:

http://maxlab.neuro.georgetown.edu/hmax.html

The S and C units of the HMAX model can almost be mapped directly onto TorchVision's Conv2d and MaxPool2d layers, where channels are used to store the filters for different orientations. However, HMAX also implements multiple scales, which doesn't map nicely onto the existing TorchVision functionality. Therefore, each scale has its own Conv2d layer, which are executed in parallel.

Here is a schematic overview of the network architecture:

layers consisting of units with increasing scale
S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1
 \ /   \ /   \ /   \ /   \ /   \ /   \ /   \ /
  C1    C1    C1    C1    C1    C1    C1    C1
   \     \     \    |     /     /     /     /
           ALL-TO-ALL CONNECTIVITY
   /     /     /    |     \     \     \     \
  S2    S2    S2    S2    S2    S2    S2    S2
   |     |     |     |     |     |     |     |
  C2    C2    C2    C2    C2    C2    C2    C2

Installation

This script depends on the NumPy, SciPy, PyTorch and TorchVision packages.

Clone the repository somewhere and run the example.py script:

git clone https://github.com/wmvanvliet/pytorch_hmax
python example.py

Usage

See the example.py script on how to run the model on 10 example images.

You might also like...
Pytorch implementation of
Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Token Labeling: Training an 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (arxiv) This is a Pytorch implementation of our te

This repository contains a pytorch implementation of
This repository contains a pytorch implementation of "StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision".

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision | Project Page | Paper | This repository contains a pytorch implementation of "St

PyTorch implementation of
PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)

mlp-mixer-pytorch PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021) Usage import torch from mlp_mixer

Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.
Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Less is More: Pay Less Attention in Vision Transformers Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers. By

A PyTorch Implementation of ViT (Vision Transformer)
A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

Pytorch implementation of the DeepDream computer vision algorithm
Pytorch implementation of the DeepDream computer vision algorithm

deep-dream-in-pytorch Pytorch (https://github.com/pytorch/pytorch) implementation of the deep dream (https://en.wikipedia.org/wiki/DeepDream) computer

A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.
A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.

ViTGAN: Training GANs with Vision Transformers A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers. Refer

Unofficial PyTorch implementation of MobileViT based on paper
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Comments
  • Provide direct (not nested) path to stimuli

    Provide direct (not nested) path to stimuli

    Hi,

    great repo and effort. I really admire your courage to write HMAX in python. I have a question about loading data in, namely about this part of the code: https://github.com/wmvanvliet/pytorch_hmax/blob/master/example.py#L18

    I know that by default, the ImageFolder expects to have nested folders (as stated in docs or mentioned in this issue) but it's quite clumsy in this case. Eg even if you look at your example, having subfolders for each photo just doesn't look good. Would you have a way how to go around this? Any suggestion on how to provide only a path to all images and not this nested path? I was reading some discussions but haven't figured out how to implement it.


    One more question (I didn't want to open an extra issue for that), shouldn't in https://github.com/wmvanvliet/pytorch_hmax/blob/master/example.py#L28 be batch_size=len(images)) instead of batch_size=10 (written symbolically)?

    Thanks.

    opened by jankaWIS 5
  • Input of non-square images fails

    Input of non-square images fails

    Hi again, I was playing a bit around and discovered that it fails for non-square dimensional images, i.e. where height != width. Maybe I was looking wrong or missed something, but I haven't found it mentioned anywhere and the docs kind of suggests that it can be any height and any width. The same goes for the description of the layers (e.g. s1). In the other issue, you mentioned that

    One thing you may want to add to this transformer pipeline is a transforms.Resize followed by a transforms.CenterCrop to ensure all images end up having the same height and width

    but didn't mention why. Why is it not possible for non-square images? Is there any workaround if one doesn't want to crop? Maybe to pad like in this post*?

    To demonstrate the issue:

    import os
    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    import pickle
    
    import hmax
    
    path_hmax = './'
    # Initialize the model with the universal patch set
    print('Constructing model')
    model = hmax.HMAX(os.path.join(path_hmax,'universal_patch_set.mat'))
    
    # A folder with example images
    example_images = datasets.ImageFolder(
        os.path.join(path_hmax,'example_images'),
        transform=transforms.Compose([
            transforms.Resize((400, 500)),
            transforms.CenterCrop((400, 500)),
            transforms.Grayscale(),
            transforms.ToTensor(),
            transforms.Lambda(lambda x: x * 255),
        ])
    )
    
    # A dataloader that will run through all example images in one batch
    dataloader = DataLoader(example_images, batch_size=10)
    
    # Determine whether there is a compatible GPU available
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    
    # Run the model on the example images
    print('Running model on', device)
    model = model.to(device)
    for X, y in dataloader:
        s1, c1, s2, c2 = model.get_all_layers(X.to(device))
    
    print('[done]')
    

    will give an error in the forward function:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    [<ipython-input-7-a6bab15d9571>](https://localhost:8080/#) in <module>()
         33 model = model.to(device)
         34 for X, y in dataloader:
    ---> 35     s1, c1, s2, c2 = model.get_all_layers(X.to(device))
         36 
         37 # print('Saving output of all layers to: output.pkl')
    
    4 frames
    [/gdrive/MyDrive/Colab Notebooks/data_HMAX/pytorch_hmax/hmax.py](https://localhost:8080/#) in forward(self, c1_outputs)
        285             conv_output = conv_output.view(
        286                 -1, self.num_orientations, self.num_patches, conv_output_size,
    --> 287                 conv_output_size)
        288 
        289             # Pool over orientations
    
    RuntimeError: shape '[-1, 4, 400, 126, 126]' is invalid for input of size 203616000
    

    *Code for that:

    import torchvision.transforms.functional as F
    
    class SquarePad:
        def __call__(self, image):
            max_wh = max(image.size)
            p_left, p_top = [(max_wh - s) // 2 for s in image.size]
            p_right, p_bottom = [max_wh - (s+pad) for s, pad in zip(image.size, [p_left, p_top])]
            padding = (p_left, p_top, p_right, p_bottom)
            return F.pad(image, padding, 0, 'constant')
    
    target_image_size = (224, 224)  # as an example
    # now use it as the replacement of transforms.Pad class
    transform=transforms.Compose([
        SquarePad(),
        transforms.Resize(target_image_size),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    
    opened by jankaWIS 1
Releases(v0.2)
  • v0.2(Jul 7, 2022)

    For this version, I've modified the HMAX code a bit to exactly match that of the original MATLAB code of Maximilian Riesenhuber. This is a bit slower and consumes a bit more memory, as the code needs to work around some subtle differences between the MATLAB and PyTorch functions. Perhaps in the future, we could add an "optimized" model that is allowed to deviate from the reference implementation for increased efficiency, but for now I feel it is more important to follow the reference implementation to the letter.

    Major change: default C2 activation function is now 'euclidean' instead of 'gaussian'.

    Source code(tar.gz)
    Source code(zip)
  • v0.1(Jul 7, 2022)

Owner
Marijn van Vliet
Research Software Engineer.
Marijn van Vliet
Second-Order Neural ODE Optimizer, NeurIPS 2021 spotlight

Second-order Neural ODE Optimizer (NeurIPS 2021 Spotlight) [arXiv] ✔️ faster convergence in wall-clock time | ✔️ O(1) memory cost | ✔️ better test-tim

Guan-Horng Liu 39 Oct 22, 2022
Fantasy Points Prediction and Dream Team Formation

Fantasy-Points-Prediction-and-Dream-Team-Formation Collected Data from open source resources that have over 100 Parameters for predicting cricket play

Akarsh Singh 2 Sep 13, 2022
Framework to build and train RL algorithms

RayLink RayLink is a RL framework used to build and train RL algorithms. RayLink was used to build a RL framework, and tested in a large-scale multi-a

Bytedance Inc. 32 Oct 07, 2022
DualGAN-tensorflow: tensorflow implementation of DualGAN

ICCV paper of DualGAN DualGAN: unsupervised dual learning for image-to-image translation please cite the paper, if the codes has been used for your re

Jack Yi 252 Nov 10, 2022
[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Single Image Depth Prediction with Wavelet Decomposition Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambeto

Niantic Labs 205 Jan 02, 2023
Simulation environments for the CrazyFlie quadrotor: Used for Reinforcement Learning and Sim-to-Real Transfer

Phoenix-Drone-Simulation An OpenAI Gym environment based on PyBullet for learning to control the CrazyFlie quadrotor: Can be used for Reinforcement Le

Sven Gronauer 8 Dec 07, 2022
PerfFuzz: Automatically Generate Pathological Inputs for C/C++ programs

PerfFuzz Performance problems in software can arise unexpectedly when programs are provided with inputs that exhibit pathological behavior. But how ca

Caroline Lemieux 125 Nov 18, 2022
Official repository for Jia, Raghunathan, Göksel, and Liang, "Certified Robustness to Adversarial Word Substitutions" (EMNLP 2019)

Certified Robustness to Adversarial Word Substitutions This is the official GitHub repository for the following paper: Certified Robustness to Adversa

Robin Jia 38 Oct 16, 2022
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022
Official PyTorch code for the paper: "Point-Based Modeling of Human Clothing" (ICCV 2021)

Point-Based Modeling of Human Clothing Paper | Project page | Video This is an official PyTorch code repository of the paper "Point-Based Modeling of

Visual Understanding Lab @ Samsung AI Center Moscow 64 Nov 22, 2022
Papers about explainability of GNNs

Papers about explainability of GNNs

Dongsheng Luo 236 Jan 04, 2023
Smart edu-autobooking - Johnson @ DMI-UNICT study room self-booking system

smart_edu-autobooking Sistema di autoprenotazione per l'aula studio [email protected]

Davide Carnemolla 17 Jun 20, 2022
This repository contains a re-implementation of the code for the CVPR 2021 paper "Omnimatte: Associating Objects and Their Effects in Video."

Omnimatte in PyTorch This repository contains a re-implementation of the code for the CVPR 2021 paper "Omnimatte: Associating Objects and Their Effect

Erika Lu 728 Dec 28, 2022
Semi-supervised semantic segmentation needs strong, varied perturbations

Semi-supervised semantic segmentation using CutMix and Colour Augmentation Implementations of our papers: Semi-supervised semantic segmentation needs

146 Dec 20, 2022
Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 04, 2020
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification

IAUnet This repository contains the code for the paper: IAUnet: Global Context-Aware Feature Learning for Person Re-Identification Ruibing Hou, Bingpe

30 Jul 14, 2022
Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

SemCo The official pytorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

42 Nov 14, 2022
MultiLexNorm 2021 competition system from ÚFAL

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 David Samuel & Milan Straka Charles University Faculty of

ÚFAL 13 Jun 28, 2022
Implementation for "Manga Filling Style Conversion with Screentone Variational Autoencoder" (SIGGRAPH ASIA 2020 issue)

Manga Filling with ScreenVAE SIGGRAPH ASIA 2020 | Project Website | BibTex This repository is for ScreenVAE introduced in the following paper "Manga F

30 Dec 24, 2022
Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization

Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization Code for reproducing our results in the Head2Toe paper. Paper: arxiv.or

Google Research 62 Dec 12, 2022