Differentially Private Verification of Regression Predictions from Synthetic Data
Haoyang Yu(a),(*), Jerome P. Reiter(b)
Transactions on Data Privacy 11:3 (2018) 279 - 297
(a) Department of Statistical Science, Box 90251, Duke University, Durham, NC 27708, USA.
(b) Department of Statistical Science, Box 90251, Duke University, Durham, NC 27708, USA.
E-mail: haoyang.yu@duke.edu; jreiter@duke.edu
One approach for releasing public use files is to make synthetic data, i.e., data simulated from statistical models estimated on the confidential data. Given access only to synthetic data, users cannot tell whether the synthetic data have been constructed in ways that provide sufficient accuracy for their particular purposes. To enable users to make such assessments, data providers can also allow users to request verification measures: summary statistics that compare the results of analyses based on the synthetic data with those based on the confidential data. We present three verification measures that satisfy differential privacy for assessing the quality of linear regression models. We use simulation studies to illustrate the verification measures.