Ensuring the privacy of training data is a growing concern since many machine learning models are trained on confidential and potentially sensitive data. Much attention has been devoted to methods for protecting individual privacy during analyses of large datasets. However in many settings, global properties of the dataset may also be sensitive (e.g., mortality rate in a hospital rather than presence of a particular patient in the dataset). In this work, we depart from individual privacy to initiate the study of attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis. We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected: (1) properties of a specific dataset and (2) parameters of the underlying distribution from which dataset is sampled. We also provide two efficient mechanisms for specific data distributions and one general but inefficient mechanism that satisfy attribute privacy for these settings. We base our results on a novel and non-trivial use of the Pufferfish framework to account for correlations across attributes in the data, thus addressing “the challenging problem of developing Pufferfish instantiations and algorithms for general aggregate secrets” that was left open by Kifer and Machanavajjhala in 2014

Rachel Cummings, Olga Ohrimenko, and Wangrong Zhang
Journal Article
Publication Date

Full Citation

Cummings, Rachel, Olga Ohrimenko, and Wangrong Zhang
. “Attribute Privacy: Framework and Mechanisms.” (June 20, 2022).