Private Data Synthesis from Decentralized Non-IID Data


Abstract

Privacy-preserving data sharing enables a wide range of exploratory and secondary data analysis while protecting the privacy of individuals in the dataset. Recent advances in machine learning, specifically generative adversarial networks (GANs), have shown great promise for synthesizing realistic datasets. In this work, we investigate the feasibility of training GAN models privately in practical settings, where the input data is distributed across multiple parties and each party's local data may be highly skewed, i.e., non-IID. We examine centralized private GAN solutions applied locally at each party and propose a federated solution that provides strong privacy and is suitable for non-IID data. We conduct an extensive empirical analysis covering a wide range of non-IID settings and data from different domains. For all solutions, we provide in-depth discussions of the utility of the synthetic data, the privacy risks in terms of membership inference attacks, and the privacy-utility trade-off.

Type
Publication
Under Review in Artificial Intelligence and Statistics (AISTATS), Spring 2023, Valencia, Spain