Comparison of Three Post-tabular Confidentiality Approaches for Survey Weighted Frequency Tables
Natalie Shlomo(a),(*), Thomas Krenzke(b), Jianzhu Li(b)
Transactions on Data Privacy 12:3 (2019) 145 - 168
(a) Social Statistics Department, University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom.
(b) Westat, 1600 Research Boulevard, Rockville, MD 20850, United States.
e-mail: Natalie.Shlomo@manchester.ac.uk; TomKrenzke@westat.com; JaneLi@westat.com
One of the most common forms of data release by National Statistical Institutes (NSIs) is the frequency table arising from censuses and surveys, and such tables have been the focus of statistical disclosure limitation (SDL) techniques for decades. With the need to modernize dissemination strategies, NSIs are considering web-based flexible table builders where users can generate their own tables of interest without the need for human intervention. This has shifted attention away from the traditional disclosure risks and towards inferential disclosure risk, where statistical data can be manipulated and combined with other data sources to reveal sensitive information with a high degree of certainty. To protect against inferential disclosure risk, perturbative methods with more formal privacy guarantees are necessary. We examine three post-tabular confidentiality protection methods based on additive random noise that can easily be applied 'on-the-fly' in a flexible table builder for generating survey weighted frequency tables: the computer science approach guaranteeing a formal privacy model called differential privacy, and two SDL approaches, namely post-randomization and a new technique called drop/add-up-to-q. We demonstrate and compare their application in a simulation study based on survey weighted counts in tables.
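As a rough illustration of what post-tabular additive random noise can look like in a table builder, the following minimal sketch perturbs each cell of a survey weighted frequency table with Laplace noise, the textbook mechanism for differential privacy. It is not the authors' implementation: the function names, the example table, the `sensitivity` value (which for weighted counts would depend on the survey weights), and the rounding and truncation at zero are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_table(weighted_counts, epsilon, sensitivity, seed=None):
    """Add independent Laplace noise to every cell of a frequency table.

    `sensitivity` is an assumed bound on how much one respondent can
    change a weighted cell count; it is a hypothetical input here.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # standard Laplace-mechanism scale
    noisy = {}
    for cell, count in weighted_counts.items():
        value = count + laplace_noise(scale, rng)
        # Rounding and truncating at zero are post-processing choices
        # assumed for this sketch, not taken from the paper.
        noisy[cell] = max(0, round(value))
    return noisy


# Hypothetical survey weighted counts for a 2 x 2 table.
table = {
    ("male", "employed"): 1250.0,
    ("male", "unemployed"): 140.5,
    ("female", "employed"): 1310.2,
    ("female", "unemployed"): 95.0,
}
noisy_table = perturb_table(table, epsilon=1.0, sensitivity=50.0, seed=42)
```

A smaller `epsilon` gives a larger noise scale and hence stronger protection at the cost of utility; fixing `seed` makes the perturbation reproducible, which is convenient for testing but would need careful handling in a live table builder.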