Classifies galaxy morphology with Bayesian CNN

Related tags

Deep Learningzoobot
Overview

Zoobot

Documentation Status

Zoobot classifies galaxy morphology with deep learning. This code will let you:

  • Reproduce and improve the Galaxy Zoo DECaLS automated classifications
  • Finetune the classifier for new tasks

For example, you can train a new classifier like so:

model = define_model.get_model(
    output_dim=len(schema.label_cols),  # schema defines the questions and answers
    input_size=initial_size, 
    crop_size=int(initial_size * 0.75),
    resize_size=resize_size
)

model.compile(
    loss=losses.get_multiquestion_loss(schema.question_index_groups),
    optimizer=tf.keras.optimizers.Adam()
)

training_config.train_estimator(
    model, 
    train_config,  # parameters for how to train e.g. epochs, patience
    train_dataset,
    test_dataset
)

Install using git and pip: git clone [email protected]:mwalmsley/zoobot.git pip install -r zoobot/requirements.txt (virtual env or conda highly recommended) pip install -e zoobot The main branch is for stable-ish releases. The dev branch includes the shiniest features but may change at any time.

To get started, see the documentation.

I also include some working examples for you to copy and adapt:

Latest cool features on dev branch (June 2021):

  • Multi-GPU distributed training
  • Support for Weights and Biases (wandb)
  • Worked examples for custom representations

Contributions are welcome and will be credited in any future work.

If you use this repo for your research, please cite the paper.

Comments
  • Benchmarks

    Benchmarks

    It's important that Zoobot has proper benchmarks so that we can be confident new releases work properly for users. This PR adds those benchmarks.

    In the course of setting up the benchmarks, I have made some major changes/improvements:

    • pytorch-galaxy-datasets refactored to work for tensorflow, imports adapted
    • both tensorflow and pytorch zoobot versions use albumentations for augmentations. Old TF code removed.
    • tensorflow version bumped to 2.10 (current latest) while I'm at it
    • pytorch version now has logging for per-question loss. Loss func aggregation has new option to support this.
    • TensorFlow version has per-question logging also, but awaiting issue with Keras team to enable
    • Created minimal_example.py for TensorFlow (thanks, @katgre )
    • Support CPU-only PyTorch training
    • Refactor TF TrainingConfig to Trainer object, Lightning style, for consistency
    enhancement 
    opened by mwalmsley 3
  • on_train_batch_end is slow in TF

    on_train_batch_end is slow in TF

    Unclear what's causing this slowness. Presumably a callback I added - but none look like they should be heavy? Perhaps something wandb is doing?

    • Remove all callbacks and rerun
    • Remove wandb and rerun For each, check if slow warning continues (or if training speed changes at all)
    enhancement 
    opened by mwalmsley 3
  • add gh action to publish package to pypi

    add gh action to publish package to pypi

    Related to https://github.com/mwalmsley/zoobot/issues/18#issuecomment-1278635788

    This PR adds an auto CI release mechanism for publishing zoobot to pypi. It uses the GH action to release to pypi https://github.com/pypa/gh-action-pypi-publish

    opened by camallen 3
  • Publish latest version to PyPi?

    Publish latest version to PyPi?

    A question rather than a request. Are there any plans to publish the refactored work ?

    PyPi shows v0.0.1 is published https://pypi.org/project/zoobot/#history on 15th March 2021 but the latest code is ~v0.0.3 (tags) and the refactor seems to be working well.

    Ideally I can pull in these packages to my own env / container and then train with the latest code vs pulling in from github etc.

    opened by camallen 3
  • setup branch protection rules on 'main'

    setup branch protection rules on 'main'

    https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/managing-a-branch-protection-rule

    It may be too restrictive for your use case / dev flows but we use this for contributor PRs etc. Basically we ensure that a PR meets certain criteria in terms of our CI runs, can only merge a PR once one of the CI runs v3.7 or v3.9 tests pass.

    Feel free to close if you don't think this is useful.

    enhancement 
    opened by camallen 2
  • Deprecate TFRecords

    Deprecate TFRecords

    TFRecords are cumbersome and take up a lot of disk space. It's much simpler to learn directly from images on disk, at the cost of some I/O performance.

    This PR removes support for TFRecords in favour of images-on-disk. This will ultimately enable new TensorFlow weights trained on all of DESI (impractical with TFRecords).

    Breaking change for anyone using TFRecords (i.e. everyone using TensorFlow to train from scratch). Finetuning should not be affected.

    TODO - will require new greyscale/colour pretrained models, just for safety.

    opened by mwalmsley 2
  • feat(CI): Add proposed python CI GH Action

    feat(CI): Add proposed python CI GH Action

    This PR proposes to add a simple GH Action script that establishes a python environment, downloads the requirements and runs pytest.

    Some other things to consider might be to use conda for virtual environments and creating CI scripts for Docker as well.

    opened by SauravMaheshkar 2
  • Improve data files for docker

    Improve data files for docker

    This PR changes the docker / compose setup, specifically it

    • consolidates the docker files to cuda and tensorflow base images (no need for a python base image)
    • adds a .dockerignore entry for all data files when building the container to keep the size down
    • and provides an easy way to inject them at run time via local directory mounts in the compose file
    • finally this removes specific to my machine local directory setup for injecting unrelated data files
    opened by camallen 2
  • add wandb logging, freeze batchnorm by default

    add wandb logging, freeze batchnorm by default

    Doing some polishing on finetuning

    • Add wandb logging to the full_tree example. @camallen use this for dashboard. You will need to add import wandb, wandb.init(authkey, etc) just before when running on Azure.
    • Freeze batch norm layers by default when finetuning, with new recursive function
    • Pass additional params via config (thanks Cam)
    • Minor cleanup
    opened by mwalmsley 1
  • Add PyTorch Finetuning Capability, Examples

    Add PyTorch Finetuning Capability, Examples

    Key change is adding pytorch.training.finetune() method. Works on either classification (e.g. 0, 1) data or count (e.g. 12 said yes, 4 said no) data.

    Includes three working examples:

    • Binary classification, with tiny rings subset
    • Counts for single question, with full internal rings data
    • Counts for all questions, with GZ Cosmic Dawn schema

    Also updates various imports for the galaxy-datasets refactor, fixes prediction method to work on unlabelled data, minor QoL improvements.

    Finally, changes PyTorch dense layer initialisation to custom high-uncertainty initialisation - see efficientnet_custom.py

    cc @camallen

    opened by mwalmsley 1
  • Add v0.02 changes

    Add v0.02 changes

    Adds support (minimal working examples, a guide) for calculating new representations with a trained model.

    Also adds significant new features:

    • Distributed training with several GPUs
    • Metric logging with Weights&Biases (add your own login credentials)
    • Train on color (3-band) images, not just greyscale

    Also adds a critical bugfix (when loading images for direct predictions i.e. not via TFRecords, correctly normalise to the 0-1 interval expected (without documentation) by the tf.keras.experimental.preprocessing layers).

    Also adds misc. minor fixes and documentation tweaks.

    This code was used for the morphology tools paper (to be submitted shortly).

    opened by mwalmsley 1
  • Avoid --extra-index-url via dependency_links

    Avoid --extra-index-url via dependency_links

    It should be possible to search for non-standard package repositories using just setup.py, without having the user also set --extra-index-url.

    https://setuptools.pypa.io/en/latest/deprecated/dependency_links.html

    But I couldn't get this to work on a quick try.

    enhancement help wanted 
    opened by mwalmsley 1
  • Can't import finetune while going through finetune_binary_classification.py

    Can't import finetune while going through finetune_binary_classification.py

    I tried to go through finetune_binary_classification.py, but got the error:

    ImportError: cannot import name 'finetune' from 'zoobot.pytorch.training' (/usr/local/lib/python3.8/dist-packages/zoobot/pytorch/training/init.py)

    I tried it both with kasia and dev branch, went through "git clone" and "pip install" (I remembered there were some issues during Hackaton regarding that), also tried to import other features from the folder (i.e. losses) and it worked fine.

    bug 
    opened by katgre 2
  • Create a simple decision tree in minimal_example.py

    Create a simple decision tree in minimal_example.py

    Instead of using on of the complicated decision trees from decals dr5, let's create a simple decision tree with one dependency already written in the minimal_example.py.

    opened by katgre 0
Releases(v0.0.3)
  • v0.0.3(Apr 25, 2022)

    Improved documentation and refactored train API (pytorch).

    Awaiting results from several segmentation experiments ahead of public release (inc pytorch version).

    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Oct 4, 2021)

  • beta(Sep 29, 2021)

    Initial release.

    This had enough documentation and code to replicate the DECaLS model and make predictions. There are a few minor missing arguments and similar typos that you might have stumbled into, because I made some last minute changes without updating the docs, but everything worked with a little stack tracing.

    Source code(tar.gz)
    Source code(zip)
Owner
Mike Walmsley
Mike Walmsley
A Comprehensive Study on Learning-Based PE Malware Family Classification Methods

A Comprehensive Study on Learning-Based PE Malware Family Classification Methods Datasets Because of copyright issues, both the MalwareBazaar dataset

8 Oct 21, 2022
CTC segmentation python package

CTC segmentation CTC segmentation can be used to find utterances alignments within large audio files. This repository contains the ctc-segmentation py

Ludwig Kürzinger 217 Jan 04, 2023
Template repository for managing machine learning research projects built with PyTorch-Lightning

Tutorial Repository with a minimal example for showing how to deploy training across various compute infrastructure.

Sidd Karamcheti 3 Feb 11, 2022
Deep functional residue identification

DeepFRI Deep functional residue identification Citing @article {Gligorijevic2019, author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Koscio

Flatiron Institute 156 Dec 25, 2022
Implementation of our paper 'RESA: Recurrent Feature-Shift Aggregator for Lane Detection' in AAAI2021.

RESA PyTorch implementation of the paper "RESA: Recurrent Feature-Shift Aggregator for Lane Detection". Our paper has been accepted by AAAI2021. Intro

137 Jan 02, 2023
A PyTorch-based Semi-Supervised Learning (SSL) Codebase for Pixel-wise (Pixel) Vision Tasks

PixelSSL is a PyTorch-based semi-supervised learning (SSL) codebase for pixel-wise (Pixel) vision tasks. The purpose of this project is to promote the

Zhanghan Ke 255 Dec 11, 2022
Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers

Visual Parser (ViP) This is the official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers. Key Feature

Shuyang Sun 117 Dec 11, 2022
Deep Learning Based Fasion Recommendation System for Ecommerce

Project Name: Fasion Recommendation System for Ecommerce A Deep learning based streamlit web app which can recommened you various types of fasion prod

BAPPY AHMED 13 Dec 13, 2022
Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs

Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs In this work, we propose an algorithm DP-SCAFFOLD(-warm), whic

19 Nov 10, 2022
Count the MACs / FLOPs of your PyTorch model.

THOP: PyTorch-OpCounter How to install pip install thop (now continously intergrated on Github actions) OR pip install --upgrade git+https://github.co

Ligeng Zhu 3.9k Dec 29, 2022
Camera-caps - Examine the camera capabilities for V4l2 cameras

camera-caps This is a graphical user interface over the v4l2-ctl command line to

Jetsonhacks 25 Dec 26, 2022
Perturb-and-max-product: Sampling and learning in discrete energy-based models

Perturb-and-max-product: Sampling and learning in discrete energy-based models This repo contains code for reproducing the results in the paper Pertur

Vicarious 2 Mar 14, 2022
Cascading Feature Extraction for Fast Point Cloud Registration (BMVC 2021)

Cascading Feature Extraction for Fast Point Cloud Registration This repository contains the source code for the paper [Arxive link comming soon]. Meth

7 May 26, 2022
Code for paper "Learning to Reweight Examples for Robust Deep Learning"

learning-to-reweight-examples Code for paper Learning to Reweight Examples for Robust Deep Learning. [arxiv] Environment We tested the code on tensorf

Uber Research 261 Jan 01, 2023
A small fun project using python OpenCV, mediapipe, and pydirectinput

Here I tried a small fun project using python OpenCV, mediapipe, and pydirectinput. Here we can control moves car game when yellow color come to right box (press key 'd') left box (press key 'a') lef

Sameh Elisha 3 Nov 17, 2022
Audio Visual Emotion Recognition using TDA

Audio Visual Emotion Recognition using TDA RAVDESS database with two datasets analyzed: Video and Audio dataset: Audio-Dataset: https://www.kaggle.com

Combinatorial Image Analysis research group 3 May 11, 2022
Finetune SSL models for MOS prediction

Finetune SSL models for MOS prediction This is code for our paper under review for ICASSP 2022: "Generalization Ability of MOS Prediction Networks" Er

Yamagishi and Echizen Laboratories, National Institute of Informatics 32 Nov 22, 2022
Waymo motion prediction challenge 2021: 3rd place solution

Waymo motion prediction challenge 2021: 3rd place solution 📜 Technical report 🗨️ Presentation 🎉 Announcement 🛆Motion Prediction Channel Website 🛆

158 Jan 08, 2023
Source-to-Source Debuggable Derivatives in Pure Python

Tangent Tangent is a new, free, and open-source Python library for automatic differentiation. Existing libraries implement automatic differentiation b

Google 2.2k Jan 01, 2023
Personal project about genus-0 meshes, spherical harmonics and a cow

How to transform a cow into spherical harmonics ? Spot the cow, from Keenan Crane's blog Context In the field of Deep Learning, training on images or

3 Aug 22, 2022