[NeurIPS 2021] "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators"

Last update: Oct 12, 2022

Overview

G-PATE

This is the official code base for our NeurIPS 2021 paper:

"G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators."

Yunhui Long*, Boxin Wang*, Zhuolin Yang, Bhavya Kailkhura, Aston Zhang, Carl A. Gunter, Bo Li

Citation

@article{long2021gpate,
  title={G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators},
  author={Long, Yunhui and Wang, Boxin and Yang, Zhuolin and Kailkhura, Bhavya and Zhang, Aston and Gunter, Carl A. and Li, Bo},
  journal={NeurIPS 2021},
  year={2021}
}

Usage

Prepare your environment

Install the required packages

pip install -r requirements.txt

Prepare your data

Please store the training data in $data_dir. By default, $data_dir is set to ../../data.

We provide a script to download the MNIST and Fashion-MNIST datasets:

python download.py [dataset_name]

For MNIST, you can run

python download.py mnist

For Fashion-MNIST, you can run

python download.py fashion_mnist

For the CelebA datasets, please download the data from the official CelebA website.

Training

python main.py --checkpoint_dir [checkpoint_dir] --dataset [dataset_name] --train

Examples of our best-performing commands on MNIST:

For eps=1:

python main.py --checkpoint_dir mnist_teacher_4000_z_dim_50_c_1e-4/ --teachers_batch 40 --batch_teachers 100 --dataset mnist --train --sigma_thresh 3000 --sigma 1000 --step_size 1e-4 --max_eps 1 --nopretrain --z_dim 50 --batch_size 64

By default, once training reaches the maximum epsilon of 1, it generates 100,000 DP samples and saves them as eps-1.00.data.pkl in checkpoint_dir.
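The checkpoint directory names in these examples (mnist_teacher_4000_... here, mnist_teacher_2000_... in the eps=10 example) suggest that the total number of teacher discriminators is --teachers_batch × --batch_teachers: 40 × 100 = 4000 here and 40 × 50 = 2000 for eps=10. Assuming that reading of the two flags, and assuming each teacher is trained on a disjoint shard of the training data as in PATE, the sketch below computes the teacher count and splits the 60,000 MNIST training images into shards. The function names are hypothetical and do not come from this code base.

```python
import random


def total_teachers(teachers_batch, batch_teachers):
    """Assumed relation: teachers_batch groups of batch_teachers teachers each."""
    return teachers_batch * batch_teachers


def partition_indices(num_examples, num_teachers, seed=0):
    """Split example indices into num_teachers disjoint, near-equal shards."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    # Shard i takes every num_teachers-th shuffled index starting at offset i,
    # so the shards are disjoint and together cover every example exactly once.
    return [indices[i::num_teachers] for i in range(num_teachers)]


if __name__ == "__main__":
    n = total_teachers(teachers_batch=40, batch_teachers=100)
    shards = partition_indices(num_examples=60_000, num_teachers=n)
    print(n, len(shards[0]))  # 4000 15
```

With 4000 teachers, each shard holds only 15 MNIST images, which illustrates why many small teachers are paired with a large noise scale.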

For eps=10:

python main.py --checkpoint_dir mnist_teacher_2000_z_dim_100_eps_10/ --teachers_batch 40 --batch_teachers 50 --dataset mnist --train --sigma_thresh 600 --sigma 100 --step_size 1e-4 --max_eps 10 --nopretrain --z_dim 100 --batch_size 64

By default, once training reaches the maximum epsilon of 10, it generates 100,000 DP samples and saves them as eps-9.9x.data.pkl in checkpoint_dir.
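The two noise flags differ between the examples (--sigma_thresh 3000 / --sigma 1000 for eps=1, 600 / 100 for eps=10), so a larger budget comes with less noise. Assuming these flags parameterize a Confident-GNMax-style aggregator (Papernot et al., 2018), the vote histogram is first checked against a threshold with large Gaussian noise (sigma_thresh), and only if it passes is a noisy argmax released with smaller noise (sigma). The sketch below shows that aggregator on a single histogram of teacher votes; the function name and the threshold value are illustrative assumptions, and the code base's own aggregation of teacher discriminator outputs is not reproduced here.

```python
import random


def confident_gnmax(votes, threshold, sigma_thresh, sigma, rng=None):
    """Confident-GNMax on one vote histogram (after Papernot et al., 2018).

    votes: per-option vote counts from the teachers.
    Returns the index of the released option, or None if the query is rejected.
    """
    rng = rng or random.Random()
    # Step 1: noisy consensus check using the larger noise scale sigma_thresh.
    if max(votes) + rng.gauss(0.0, sigma_thresh) < threshold:
        return None  # not enough agreement among teachers; nothing is released
    # Step 2: noisy argmax using the smaller noise scale sigma.
    noisy = [v + rng.gauss(0.0, sigma) for v in votes]
    return max(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    # 4000 hypothetical teachers with strong consensus on option 1;
    # threshold=3000 is an illustrative value, not a documented default.
    votes = [20, 3950, 30]
    print(confident_gnmax(votes, threshold=3000, sigma_thresh=600, sigma=100,
                          rng=random.Random(0)))
```

Rejected queries consume far less privacy budget than answered ones in this scheme, which is why the threshold step uses the larger noise scale.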

Generating synthetic samples

python main.py --checkpoint_dir [checkpoint_dir] --dataset [dataset_name]
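The generated samples are saved as a pickle file in checkpoint_dir (for example eps-1.00.data.pkl). The exact object stored in that file is not documented here, so the hypothetical helper below only loads it with Python's standard pickle module and reports its type and, when available, its length; inspect the result before relying on any particular layout. Only unpickle files you generated yourself, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def summarize_dp_samples(path):
    """Load a generated-samples pickle and return (type name, length or None)."""
    with Path(path).open("rb") as f:
        obj = pickle.load(f)  # only load trusted files: pickle can run code
    try:
        length = len(obj)
    except TypeError:  # object does not support len(), e.g. a scalar
        length = None
    return type(obj).__name__, length
```

For example, summarize_dp_samples("mnist_teacher_4000_z_dim_50_c_1e-4/eps-1.00.data.pkl") would report what the eps=1 run wrote.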

Evaluate the synthetic records

We follow the standard protocol: train a classifier on the synthetic samples and test it on real samples.
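The data flow of this protocol can be illustrated with a toy example. The sketch below is not the evaluation scripts' code: it swaps in a simple nearest-centroid classifier and made-up 2-D points purely to show that the model is fit only on synthetic records and its accuracy is measured only on real records.

```python
def fit_centroids(xs, ys):
    """Nearest-centroid 'training': average the feature vectors of each label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        prev = sums.get(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(prev, x)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def synthetic_to_real_accuracy(syn_x, syn_y, real_x, real_y):
    """Train on synthetic (DP) samples, then evaluate on real samples."""
    centroids = fit_centroids(syn_x, syn_y)
    correct = sum(predict(centroids, x) == y for x, y in zip(real_x, real_y))
    return correct / len(real_y)


syn_x, syn_y = [[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1]
real_x, real_y = [[0.2, 0.4], [4.8, 5.3], [1, 0]], [0, 1, 0]
print(synthetic_to_real_accuracy(syn_x, syn_y, real_x, real_y))  # 1.0
```

The real training set never touches the classifier, so the test accuracy measures how well the synthetic data captures the real distribution.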

For MNIST,

python evaluation/train-classifier-mnist.py --data [DP_data_dir]

For Fashion-MNIST,

python evaluation/train-classifier-fmnist.py --data [DP_data_dir]

For CelebA-Gender,

python evaluation/train-classifier-celebA.py --data [DP_data_dir]

For CelebA-Gender (Small),

python evaluation/train-classifier-small-celebA.py --data [DP_data_dir]

For CelebA-Hair,

python evaluation/train-classifier-hair.py --data [DP_data_dir]
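Since each dataset has its own evaluation script, a small wrapper can pick the right one. This is a hypothetical convenience helper, not part of the repository: the dictionary keys are labels chosen for this sketch (only mnist and fashion_mnist match the names used by download.py), while the script paths are the ones listed above.

```python
import sys

# Script paths as listed in this README; the keys are hypothetical labels.
EVAL_SCRIPTS = {
    "mnist": "evaluation/train-classifier-mnist.py",
    "fashion_mnist": "evaluation/train-classifier-fmnist.py",
    "celeba_gender": "evaluation/train-classifier-celebA.py",
    "celeba_gender_small": "evaluation/train-classifier-small-celebA.py",
    "celeba_hair": "evaluation/train-classifier-hair.py",
}


def eval_command(dataset, dp_data_dir):
    """Build the argument list that runs the evaluation script for `dataset`."""
    if dataset not in EVAL_SCRIPTS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(EVAL_SCRIPTS)}")
    return [sys.executable, EVAL_SCRIPTS[dataset], "--data", dp_data_dir]
```

From the repository root, subprocess.run(eval_command("mnist", path), check=True) would then run the MNIST evaluation on the samples at path.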

[DP_data_dir] is the path to your generated DP samples.

For the eps=1 MNIST example above, the DP samples are generated in $checkpoint_dir/eps-1.00.data, so run the evaluation with DP_data_dir=$checkpoint_dir/eps-1.00.data:

python evaluation/train-classifier-mnist.py --data $checkpoint_dir/eps-1.00.data
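Because the file name encodes the privacy budget actually reached (eps-1.00 for the first example, eps-9.9x for the second), it can be convenient to look it up rather than type it. The hypothetical helper below, assuming the eps-<value>.data naming shown above, searches checkpoint_dir for files matching eps-*.data* (with or without the .pkl suffix) and returns the one with the largest epsilon, in the eps-<value>.data form used in the evaluation command above.

```python
import re
from pathlib import Path

# Matches names such as eps-1.00.data or eps-1.00.data.pkl (assumed naming).
_EPS_RE = re.compile(r"^eps-(\d+(?:\.\d+)?)\.data")


def find_dp_data(checkpoint_dir):
    """Return checkpoint_dir/eps-<value>.data for the largest epsilon found."""
    best = None
    for path in Path(checkpoint_dir).glob("eps-*.data*"):
        match = _EPS_RE.match(path.name)
        if match and (best is None or float(match.group(1)) > best[0]):
            best = (float(match.group(1)), match.group(0))
    if best is None:
        raise FileNotFoundError(f"no eps-*.data files in {checkpoint_dir}")
    # Drop any suffix after ".data" to match the --data form used above.
    return Path(checkpoint_dir) / best[1]
```

Pass str(find_dp_data(checkpoint_dir)) as the --data argument of the matching evaluation script.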


Owner

AI Secure