A CNN implementation using only numpy. Supports multidimensional images, stride, etc.

Last update: Nov 30, 2021

CNN from scratch

The most interesting part is the file neural_networks/layers.py: code for a convolutional neural network built on numpy alone (no PyTorch or TensorFlow). Working at this level shows how CNNs operate mathematically.

The CNN is compatible with colour images (3-channel RGB), includes pooling layers (class Pool2D), and works with any valid stride.
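As a rough illustration of what a "valid" stride could mean, the sketch below computes the output size along one spatial axis and rejects strides that do not tile the padded input evenly. The helper name conv_output_size and the divisibility check are assumptions made for this example; the repository may define validity differently.

```python
def conv_output_size(in_size: int, kernel: int, stride: int, pad: int = 0) -> int:
    """Number of output positions along one spatial axis (hypothetical helper)."""
    span = in_size + 2 * pad - kernel
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    # Assumption: a stride is "valid" when the kernel lands exactly on the edge.
    if span % stride != 0:
        raise ValueError("stride does not tile the padded input evenly")
    return span // stride + 1


print(conv_output_size(32, kernel=5, stride=1, pad=2))  # 32
print(conv_output_size(32, kernel=2, stride=2))         # 16
```

The same formula applies to a pooling window, with the pool size in place of the kernel size.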

neural_networks/activations.py contains basic activation functions, such as ReLU and SoftMax, with the forward and backward implementations (including the Jacobian computation) needed for backpropagation.
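A minimal sketch of what such forward / backward pairs can look like is given below. The function names and the (batch, classes) input layout are assumptions for this example, not the repository's actual API. For SoftMax, the per-sample Jacobian is diag(s) - s s^T, and the backward pass applies it to the upstream gradient without building the matrix.

```python
import numpy as np


def relu_forward(Z):
    return np.maximum(Z, 0)


def relu_backward(Z, dY):
    # ReLU acts elementwise, so its Jacobian is diagonal: the upstream
    # gradient passes through wherever the input was positive.
    return dY * (Z > 0)


def softmax_forward(Z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(shifted)
    return expZ / expZ.sum(axis=1, keepdims=True)


def softmax_backward(Z, dY):
    # Per-sample Jacobian J = diag(s) - s s^T (symmetric), so
    # dZ = J @ dY = s * (dY - <dY, s>), computed row by row via broadcasting.
    S = softmax_forward(Z)
    return S * (dY - np.sum(dY * S, axis=1, keepdims=True))


Z = np.array([[1.0, 2.0, 3.0]])
dY = np.array([[0.3, -0.2, 0.5]])
S = softmax_forward(Z)
J = np.diag(S[0]) - np.outer(S[0], S[0])  # explicit Jacobian for one sample
assert np.allclose(softmax_backward(Z, dY), dY @ J)
```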

Many functions make heavy use of slicing, which speeds up training significantly. See e.g. Conv2D.forward:

for x in range(out_rows):
    for y in range(out_cols):
        out[:,x,y,:] = np.apply_over_axes(np.sum, W[None]*X_pad[:,x*s:x*s+kernel_height,y*s:y*s+kernel_width,:][...,None], [1,2,3])[:,0,0,0,:]

which is the sliced version of a depth-6 nested for loop and thus allows a significant speedup (on my computer, more than 20x for the given training data).
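To make the equivalence concrete, the sketch below compares a fully explicit six-loop reference with a sliced version written in the style of the snippet above. The function names, the (batch, rows, cols, channels) layout for X_pad, and the (kernel_height, kernel_width, in_channels, out_channels) layout for W are assumptions inferred from the snippet. The README does not show how the original loops were nested, so the reference here is only one possible arrangement.

```python
import numpy as np


def conv_forward_loops(X_pad, W, s):
    """Reference: six explicit loops, vectorised only over output channels."""
    n, H, Wd, c_in = X_pad.shape
    kh, kw, _, c_out = W.shape
    out_rows, out_cols = (H - kh) // s + 1, (Wd - kw) // s + 1
    out = np.zeros((n, out_rows, out_cols, c_out))
    for i in range(n):
        for x in range(out_rows):
            for y in range(out_cols):
                for a in range(kh):
                    for b in range(kw):
                        for c in range(c_in):
                            out[i, x, y, :] += X_pad[i, x*s + a, y*s + b, c] * W[a, b, c, :]
    return out


def conv_forward_sliced(X_pad, W, s):
    """Two loops over output positions; everything else is broadcast."""
    n, H, Wd, _ = X_pad.shape
    kh, kw, _, c_out = W.shape
    out_rows, out_cols = (H - kh) // s + 1, (Wd - kw) // s + 1
    out = np.zeros((n, out_rows, out_cols, c_out))
    for x in range(out_rows):
        for y in range(out_cols):
            window = X_pad[:, x*s:x*s + kh, y*s:y*s + kw, :]  # (n, kh, kw, c_in)
            prod = W[None] * window[..., None]                # (n, kh, kw, c_in, c_out)
            out[:, x, y, :] = prod.sum(axis=(1, 2, 3))        # sum over kernel and channels
    return out


rng = np.random.default_rng(0)
X_pad = rng.normal(size=(2, 7, 7, 3))
W = rng.normal(size=(3, 3, 3, 4))
assert np.allclose(conv_forward_loops(X_pad, W, 2), conv_forward_sliced(X_pad, W, 2))
```

Here prod.sum(axis=(1, 2, 3)) plays the same role as the np.apply_over_axes call followed by [:,0,0,0,:] in the snippet above.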

In losses.py, CrossEntropy is the most important function. To speed it up, we simplified the math as far as possible, yielding

loss = -1.0/m *np.trace(np.matmul(Y,np.log(Y_hat.T)))

for the forward pass and

-1/m*(np.divide(Y,Y_hat))

for the backward pass.
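The two expressions can be checked with a short sketch. It assumes Y holds one-hot labels and Y_hat the predicted probabilities, both of shape (m, classes), with m the batch size; the function names are made up for this example. The trace of Y log(Y_hat)^T is the sum of the elementwise products Y * log(Y_hat), so an elementwise version gives the same loss without forming the m-by-m matrix product.

```python
import numpy as np


def cross_entropy_forward(Y, Y_hat):
    # Trace form, as in the expression above.
    m = Y.shape[0]
    return -1.0 / m * np.trace(np.matmul(Y, np.log(Y_hat.T)))


def cross_entropy_forward_elementwise(Y, Y_hat):
    # Same value: trace(A @ B.T) equals the sum of A * B.
    m = Y.shape[0]
    return -np.sum(Y * np.log(Y_hat)) / m


def cross_entropy_backward(Y, Y_hat):
    # Gradient of the loss with respect to Y_hat.
    m = Y.shape[0]
    return -1.0 / m * np.divide(Y, Y_hat)


Y = np.array([[1.0, 0.0], [0.0, 1.0]])
Y_hat = np.array([[0.8, 0.2], [0.3, 0.7]])
assert np.isclose(cross_entropy_forward(Y, Y_hat),
                  cross_entropy_forward_elementwise(Y, Y_hat))
```

For large batches the elementwise form may be preferable, since the trace form computes every off-diagonal entry of the product only to discard it.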

This is based on a project for CS289 at UC Berkeley.
