This repository is all about spending some time with the original problem posed by Minsky and Papert.

Overview

The Original Problem

Computer vision has a very interesting history. Its roots really go all the way back to the beginning of computing and Artificial Intelligence. In these early days, it was unknown just how easy or difficult it would be to recreate the function of the human visual system. A great example of this is the 1966 MIT Summer Vision Project. Marvin Minsky and Seymour Papert, co-directors of the MIT AI Laboratory, began the summer with some ambitious goals.

Minsky and Papert assigned Gerald Sussman, an MIT undergraduate student, as project lead, and set up specific goals for the group around recognizing specific objects in images and separating these objects from their backgrounds.

Just how hard is it to achieve the goals Minsky and Papert laid out? How has the field of computer vision advanced since that summer? Are these tasks trivial now, 50+ years later? Do we understand how the human visual system works? Just how hard is computer vision, and how far have we come?

This Repository

This repository is all about spending some time with the original problem posed by Minsky and Papert. Working through this problem is a great way to begin learning computer vision.

The repository is broadly divided into two areas: notebooks and a programming challenge. The programming challenge is described in more detail below, and closely follows the goals set up by Minsky and Papert back in 1966. The notebooks are here to give you some help along the way.

Notebooks

| Section | Notebook | Required Reading/Viewing | Additional Reading/Viewing | Code Developed |
| ------- | -------- | ------------------------ | -------------------------- | -------------- |
| 1 | The Original Problem | The Summer Vision Project | - | - |
| 2 | Robert's Cross | Only Abstract and Pages 25-27 - Machine perception of 3d solids | - | convert_to_grayscale, roberts_cross |
| 3 | Image Filtering | How Blurs & Filters Work - Computerphile | - | make_gaussian_kernel, filter_2d |
| 4 | The Sobel–Feldman Operator | Finding the Edges (Sobel Operator) - Computerphile | History of Sobel | - |
| 5 | The Hough Transform [Part 1] | Pattern classification Section 9.2.3, Bubble Chamber Video | - | - |
| 6 | The Hough Transform [Part 2] | How the Hough Transform was Invented | Use of the Hough transformation to detect lines and curves in pictures. | HoughAccumulator |

Viewing Notebooks

The links in the table above take you to externally hosted HTML exports of the notebooks. This works pretty well, except that HTML unfortunately won't render embedded slide shows. The best way to view the notebooks is to clone this repo and run them yourself! Check out the setup instructions below.

Animations

The notebooks in this repository make frequent use of GIF animations. These files are pretty large, so we don't store them on GitHub, and they unfortunately won't show up when viewing the notebooks via GitHub. The ideal way to view the notebooks is to clone the repo, download the videos, and use the recommended jupyterthemes below. Instructions on downloading videos are below.

Note on Launching the Jupyter Notebooks

To properly view the images and animations, please launch your jupyter notebook from the root directory of this repository.

Programming Challenge

Instructions

  • Write a method in classify.py that takes in an image and returns a prediction: ball, brick, or cylinder.
  • An example script is located in challenge/sample_student.py
  • Your script will be automatically evaluated on a set of test images.
  • The testing images are quite similar to the training images, and organized into the same difficulty categories.
  • You are allowed 10 submissions to the evaluation server, which will provide immediate feedback.
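
As a starting point, here is a minimal baseline sketch of the kind of method the instructions describe. The function name `classify`, its single `im` argument, and the lowercase string labels are assumptions made for illustration; check challenge/sample_student.py for the interface the evaluation server actually expects. This placeholder ignores the image entirely and guesses at random, so it only shows the input/output shape, not a real solution.

```python
import random

# The three classes named in the instructions above.
LABELS = ["ball", "brick", "cylinder"]


def classify(im):
    """Baseline sketch: ignore the image and guess a label at random.

    `im` is assumed to be the input image (for example, an array of
    pixel values). This placeholder never inspects it.
    """
    return random.choice(LABELS)
```

A real submission would replace the body with logic that actually looks at `im`, for example by building on the edge detection and Hough transform code developed in the notebooks.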

The Data

Easy Examples

Grading

Following the progression set out in the MIT summer project, we'll start with easy images and move to more difficult images with more complex backgrounds as we progress. For each difficulty level, we will compute the average accuracy of your classifier. We will then compute an overall accuracy as a weighted average, weighting easier examples more:

overall_accuracy = 0.5*accuracy_easy 
                 + 0.2*accuracy_medium_1 
                 + 0.2*accuracy_medium_2 
                 + 0.1*accuracy_hard 

| Overall Accuracy (a) | Points |
| -------------------- | ------ |
| a >= 0.6 | 10/10 |
| 0.55 <= a < 0.6 | 9/10 |
| 0.5 <= a < 0.55 | 8/10 |
| 0.45 <= a < 0.5 | 7/10 |
| 0.40 <= a < 0.45 | 6/10 |
| 0.35 <= a < 0.40 | 5/10 |
| a < 0.35 | 4/10 |
| Non-running code | 0/10 |
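
To make the grading rule concrete, the weighted average and the points lookup can be sketched as below. This is an illustration only, not the evaluation server's code: the function names, the dictionary keys, and the example accuracies are hypothetical, and the "non-running code" case is not modeled.

```python
# Weights taken from the overall_accuracy formula above.
WEIGHTS = {"easy": 0.5, "medium_1": 0.2, "medium_2": 0.2, "hard": 0.1}

# (lower bound on overall accuracy, points out of 10), highest tier first.
POINT_TIERS = [(0.6, 10), (0.55, 9), (0.5, 8), (0.45, 7), (0.40, 6), (0.35, 5)]


def overall_accuracy(accuracies):
    """Weighted average of the per-difficulty accuracies (each in [0, 1])."""
    return sum(WEIGHTS[level] * accuracies[level] for level in WEIGHTS)


def points(overall):
    """Map an overall accuracy to points, assuming the code runs."""
    for lower_bound, pts in POINT_TIERS:
        if overall >= lower_bound:
            return pts
    return 4  # a < 0.35


# Hypothetical per-level accuracies, for illustration only.
example = {"easy": 0.8, "medium_1": 0.5, "medium_2": 0.5, "hard": 0.2}
print(points(overall_accuracy(example)))  # 10
```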

A quick note on difficulty

Depending on your background, this challenge may feel a bit like getting thrown into the deep end. If it feels a bit daunting - that's ok! Half of the purpose of this assignment is to help you develop an appreciation for why computer vision is so hard. As you may have already guessed, Minsky, Sussman, and Papert did not reach their summer goals - and I'm not expecting you to either. The grading table above reflects this - for example, if you're able to get 90% accuracy on the easy examples, and simply guess randomly on the rest of the examples, you'll earn 10/10 points.
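
The arithmetic behind that example can be checked in a few lines. This sketch assumes that random guessing among the three classes gives an expected accuracy of 1/3, which holds only if each class is about equally common in the test images.

```python
NUM_CLASSES = 3  # ball, brick, cylinder
chance = 1 / NUM_CLASSES  # assumed expected accuracy of a random guess

accuracy_easy = 0.9

# Same weights as the overall_accuracy formula in the Grading section.
overall = 0.5 * accuracy_easy + 0.2 * chance + 0.2 * chance + 0.1 * chance

print(round(overall, 3))  # 0.617, which clears the 0.6 threshold for 10/10
```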

Setup

The Python 3 Anaconda Distribution is the easiest way to get going with the notebooks and code presented here.

(Optional) You may want to create a virtual environment for this repository:

conda create -n cv python=3 
source activate cv

You'll need to install the jupyter notebook to run the notebooks:

conda install jupyter

# You may also want to install nb_conda (enables some nice things, like changing virtual environments within the notebook)
conda install nb_conda

This repository requires the installation of a few extra packages. You can install them all at once with:

pip install -r requirements.txt

(Optional) jupyterthemes can be nice when presenting notebooks, as it offers some cleaner visual themes than the stock notebook, and makes it easy to adjust the default font size for code, markdown, etc. You can install with pip:

pip install jupyterthemes

Recommended jupyter theme for presenting these notebooks (type into terminal before launching notebook):

jt -t grade3 -cellw=90% -fs=20 -tfs=20 -ofs=20 -dfs=20

Recommended jupyter theme for viewing these notebooks (type into terminal before launching notebook):

jt -t grade3 -cellw=90% -fs=14 -tfs=14 -ofs=14 -dfs=14

Downloading Data

For larger files such as data and videos, I've provided download scripts to download these files from welchlabs.io. These files can be pretty big, so you may want to grab a cup of your favorite beverage to enjoy while downloading. The script can be run from within the jupyter notebooks or from the terminal:

python util/get_and_unpack.py -url http://www.welchlabs.io/unccv/the_original_problem/data/data.zip

Alternatively, you can download the data manually, unzip it, and place it in this directory.
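
One way to run the same download from a notebook cell is to call the script through Python's standard `subprocess` module. This is a sketch, not part of the repository: the helper names `build_download_command` and `download` are hypothetical, and it assumes the notebook was launched from the repository root so that the relative path util/get_and_unpack.py resolves.

```python
import subprocess
import sys

# URL copied from the terminal command above.
DATA_URL = "http://www.welchlabs.io/unccv/the_original_problem/data/data.zip"


def build_download_command(url, script="util/get_and_unpack.py"):
    """Return the argument list equivalent to the terminal command above."""
    # sys.executable is the Python interpreter running this notebook/kernel.
    return [sys.executable, script, "-url", url]


def download(url):
    """Run the download script; raises CalledProcessError if it fails."""
    subprocess.run(build_download_command(url), check=True)
```

Calling `download(DATA_URL)` then starts the download. Nothing runs until that call is made.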

Downloading Videos

Run the script below or call it from the notebooks:

python util/get_and_unpack.py -url http://www.welchlabs.io/unccv/the_original_problem/videos.zip

Alternatively, you can download the videos manually, unzip them, and place them in this directory.

Owner
Jaissruti Nanthakumar
Master's in Computer Science | University of North Carolina at Charlotte