LIVECell - A large-scale dataset for label-free live cell segmentation


LIVECell dataset

This document contains instructions on how to access the data associated with the submitted manuscript "LIVECell - A large-scale dataset for label-free live cell segmentation" by Edlund et al. 2021.

Background

Light microscopy is a cheap, accessible, non-invasive modality that, when combined with well-established protocols of two-dimensional cell culture, facilitates high-throughput quantitative imaging to study biological phenomena. Accurate segmentation of individual cells enables exploration of complex biological questions, but this requires sophisticated image processing pipelines due to the low contrast and high object density. Deep learning-based methods are considered state-of-the-art for most computer vision problems but require vast amounts of annotated data, for which there is no suitable resource available in the field of label-free cellular imaging. To address this gap we present LIVECell, a high-quality, manually annotated and expert-validated dataset that is the largest of its kind to date, consisting of over 1.6 million cells from a diverse set of cell morphologies and culture densities. To further demonstrate its utility, we provide convolutional neural network-based models trained and evaluated on LIVECell.

How to access LIVECell

All images in LIVECell are available following this link (requires 1.3 GB). Annotations for the different experiments are linked below. For more details regarding benchmarks and how to use our models, see this link.

LIVECell-wide train and evaluate

Annotation set URL
Training set link
Validation set link
Test set link

Single cell-type experiments

Cell type   Training set   Validation set   Test set
A172        link           link             link
BT474       link           link             link
BV-2        link           link             link
Huh7        link           link             link
MCF7        link           link             link
SH-SY5Y     link           link             link
SkBr3       link           link             link
SK-OV-3     link           link             link

Dataset size experiments

Split URL
2 % link
4 % link
5 % link
25 % link
50 % link

Comparison to fluorescence-based object counts

The label-free images and a corresponding JSON file with the object count per image are available, together with the raw fluorescent images the counts are based on.

Cell type   Images   Counts   Fluorescent images
A549        link     link     link
A172        link     link     link

Download all of LIVECell

The LIVECell dataset and trained models are stored in an Amazon Web Services (AWS) S3 bucket. If you have an AWS IAM user, the easiest way to download the dataset is with the AWS CLI. From the folder you want to download the dataset to, simply run:

aws s3 sync s3://livecell-dataset .

If you do not have an AWS IAM user, the procedure is a little more involved. We can use curl to request the bucket's S3 XML listing and save it to files.xml:

curl "http://livecell-dataset.s3.eu-central-1.amazonaws.com/?list-type=2" > files.xml

We then extract the file URLs from files.xml using grep and sed:

grep -oP "(?<=<Key>)[^<]+" files.xml | sed -e 's/^/http:\/\/livecell-dataset.s3.eu-central-1.amazonaws.com\//' > urls.txt

Then download the files you need using wget, for example with wget -i urls.txt to fetch every file in the list.
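On systems whose grep lacks Perl-style regular expressions (-P), the same URL list can be built in Python. This is a minimal sketch, assuming the listing uses the standard S3 XML namespace and that every object appears as a Contents element with a Key child; the function name urls_from_listing is hypothetical. It also reports the IsTruncated flag, because a single S3 listing response returns at most 1,000 keys, so a truncated listing means some files are missing from urls.txt.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import quote

# Standard namespace of S3 ListBucketResult XML responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
BASE_URL = "http://livecell-dataset.s3.eu-central-1.amazonaws.com/"


def urls_from_listing(xml_data: bytes | str, base_url: str = BASE_URL) -> tuple[list[str], bool]:
    """Return (download URLs, is_truncated) for one S3 bucket listing (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    # Every object in the listing is a <Contents> element with a <Key> child.
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS) if el.text]
    # IsTruncated is "true" when more keys remain beyond this response.
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    # quote() percent-encodes unsafe characters but keeps "/" path separators.
    return [base_url + quote(key) for key in keys], truncated.strip().lower() == "true"
```

For example, read files.xml in binary mode, pass its contents to urls_from_listing, and write the returned URLs one per line to urls.txt for use with wget -i.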

File structure

The top-level structure of the files is arranged like:

/livecell-dataset/
    ├── LIVECell_dataset_2021  
    |       ├── annotations/
    |       ├── models/
    |       ├── nuclear_count_benchmark/	
    |       └── images.zip  
    ├── README.md  
    └── LICENSE

LIVECell_dataset_2021/images

The images of the LIVECell-dataset are stored in /livecell-dataset/LIVECell_dataset_2021/images.zip along with their annotations in /livecell-dataset/LIVECell_dataset_2021/annotations/.

Within images.zip, the training/validation-set and test-set images are kept completely separate to facilitate fair comparison between studies. The images require 1.3 GB of disk space unzipped and are arranged like:

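images/
    ├── livecell_test_images
    |       └── <Cell Type>
    |               └── <Cell Type>_Phase_<Well>_<Location>_<Timestamp>_<Crop>.tif
    └── livecell_train_val_images
            └── <Cell Type>
                    └── <Cell Type>_Phase_<Well>_<Location>_<Timestamp>_<Crop>.tif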

Here <Cell Type> is one of the eight cell types in LIVECell (A172, BT474, BV2, Huh7, MCF7, SHSY5Y, SkBr3, SKOV3), <Well> is the location in the 96-well plate used to culture the cells, <Location> indicates the position in the well where the image was acquired, <Timestamp> is the time elapsed from the beginning of the experiment to image acquisition, and <Crop> is the index of the crop from the original, larger image. For example, A172_Phase_C7_1_02d16h00m_2.tif is an image of A172 cells grown in well C7, acquired at position 1 two days and 16 hours after the start of the experiment (crop position 2).
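The naming convention can be parsed programmatically, for instance to group images by well or time point. This is a minimal sketch, assuming every name follows the pattern of the example above, with wells in rows A to H (a 96-well plate) and a timestamp of the form <DD>d<HH>h<MM>m; parse_image_name and ImageInfo are hypothetical names, not part of the LIVECell tooling.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed pattern: <Cell Type>_Phase_<Well>_<Location>_<DD>d<HH>h<MM>m_<Crop>.tif
FILENAME_RE = re.compile(
    r"^(?P<cell_type>[A-Za-z0-9]+)_Phase_(?P<well>[A-H]\d{1,2})_(?P<location>\d+)_"
    r"(?P<days>\d+)d(?P<hours>\d+)h(?P<minutes>\d+)m_(?P<crop>\d+)\.tif$"
)


@dataclass
class ImageInfo:
    cell_type: str
    well: str
    location: int
    hours_since_start: float
    crop: int


def parse_image_name(name: str) -> ImageInfo:
    """Split a LIVECell image file name into its components (hypothetical helper)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected LIVECell image name: {name!r}")
    # Convert the DDdHHhMMm timestamp into hours since the experiment started.
    hours = int(m["days"]) * 24 + int(m["hours"]) + int(m["minutes"]) / 60
    return ImageInfo(
        cell_type=m["cell_type"],
        well=m["well"],
        location=int(m["location"]),
        hours_since_start=hours,
        crop=int(m["crop"]),
    )
```

For the example name above, parse_image_name returns cell type A172, well C7, location 1, 64.0 hours since the start of the experiment and crop 2.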

LIVECell_dataset_2021/annotations/

The annotations of LIVECell are prepared for all tasks along with the training/validation/test splits used for all experiments in the paper. The annotations require 2.1 GB of disk space and are arranged like:

annotations/
    ├── LIVECell
    |       └── livecell_coco_<train/val/test>.json
    ├── LIVECell_single_cells
    |       └── <Cell Type>
    |               └── <train/val/test>.json
    └── LIVECell_dataset_size_split
            └── <Split>_train<Percentage>percent.json

  • annotations/LIVECell contains the annotations used for the LIVECell-wide train and evaluate task.
  • annotations/LIVECell_single_cells contains the annotations used for Single cell type train and evaluate as well as the Single cell type transferability tasks.
  • annotations/LIVECell_dataset_size_split contains the annotations used to investigate the impact of training set scale.

All annotations are in the Microsoft COCO object detection format and can, for instance, be parsed with the Python package pycocotools.
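As a dependency-free sketch of working with these files, the code below reads a COCO-format annotation file with the standard json module and counts the annotated cells per image. It relies only on the standard COCO fields images (id, file_name) and annotations (image_id); instances_per_image and load_coco are hypothetical helper names. As a precaution, it accepts the images and annotations entries either as the usual list or as a dict keyed by id.

```python
from __future__ import annotations

import json
from collections import Counter


def _as_list(items):
    # Accept both the usual COCO list layout and a dict keyed by id.
    return list(items.values()) if isinstance(items, dict) else list(items)


def instances_per_image(coco: dict) -> dict[str, int]:
    """Map each image file name to its number of annotated cell instances."""
    id_to_name = {img["id"]: img["file_name"] for img in _as_list(coco["images"])}
    counts = Counter(ann["image_id"] for ann in _as_list(coco["annotations"]))
    # Images without any annotation get a count of 0.
    return {name: counts.get(img_id, 0) for img_id, name in id_to_name.items()}


def load_coco(path: str) -> dict:
    """Load one of the COCO-format JSON files under annotations/."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

For evaluation or decoding the polygon masks, pycocotools' COCO class provides the same lookups together with mask utilities such as annToMask.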

models/

All models trained and evaluated for tasks associated with LIVECell are made available for wider use. The models are trained using detectron2, Facebook's framework for object detection and instance segmentation. The models require 15 GB of disk space and are arranged like:

models/
   └── Anchor_<free/based>
            ├── ALL/
            |    └── <Model>.pth
            └── <Cell Type>/
                 └── <Model>.pth

Where each .pth is a binary file containing the model weights.

configs/

The config files for each model can be found in the LIVECell GitHub repo:

LIVECell
    └── Anchor_<free/based>
            ├── livecell_config.yaml
            ├── a172_config.yaml
            ├── bt474_config.yaml
            ├── bv2_config.yaml
            ├── huh7_config.yaml
            ├── mcf7_config.yaml
            ├── shsy5y_config.yaml
            ├── skbr3_config.yaml
            └── skov3_config.yaml

Each config file can be used to reproduce our training, or together with our model weights to run the trained models. For more information, see the usage section.

nuclear_count_benchmark/

For each cell line, the label-free images are stored in a zip archive, the corresponding fluorescence-based object counts in a JSON file, and the raw fluorescent images in a separate zip archive:

nuclear_count_benchmark/
    ├── A172.zip
    ├── A172_counts.json
    ├── A172_fluorescent_images.zip
    ├── A549.zip
    ├── A549_counts.json 
    └── A549_fluorescent_images.zip

The JSON files are in the following format:

{
    "<Image name>": "<count>"
}

Here <Image name> points to one of the images in the zip archive, and <count> is the object count according to fluorescent nuclear labels.
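A minimal sketch of how these files might be used: load the counts for one cell line and compare them with per-image object counts from a segmentation model, for example the number of predicted instances per image. The predicted counts, the choice of mean absolute error and Pearson correlation as agreement measures, and the helper names load_counts and compare_counts are assumptions for illustration, not necessarily the evaluation used in the paper. Counts are coerced with int() because the format above shows them as quoted strings.

```python
from __future__ import annotations

import json
import math


def load_counts(path: str) -> dict[str, int]:
    """Read a counts file such as A172_counts.json into {image name: count}."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    # Counts may be stored as quoted strings, so coerce them to int.
    return {name: int(count) for name, count in raw.items()}


def compare_counts(reference: dict[str, int], predicted: dict[str, int]) -> dict[str, float]:
    """Mean absolute error and Pearson r over images present in both dicts."""
    names = sorted(reference.keys() & predicted.keys())
    if len(names) < 2:
        raise ValueError("need at least two images present in both count sets")
    ref = [reference[n] for n in names]
    pred = [predicted[n] for n in names]
    n = len(names)
    mae = sum(abs(r - p) for r, p in zip(ref, pred)) / n
    mean_r, mean_p = sum(ref) / n, sum(pred) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ref, pred))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    # Correlation is undefined when either set of counts is constant.
    pearson = cov / math.sqrt(var_r * var_p) if var_r > 0 and var_p > 0 else float("nan")
    return {"n_images": float(n), "mae": mae, "pearson_r": pearson}
```

Images present in only one of the two dicts are ignored, so a partial set of predictions can still be compared against the fluorescence-based counts.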

LICENSE

All images, annotations and models associated with LIVECell are published under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

All software source code associated with LIVECell is published under the MIT License.

Owner
Sartorius Corporate Research