
Transactions on
Data Privacy
Foundations and Technologies



Volume 1 Issue 3

Comparing Fully and Partially Synthetic Datasets for Statistical Disclosure Control in the German IAB Establishment Panel

Jörg Drechsler(a),(*), Stefan Bender(a), Susanne Rässler(b)

Transactions on Data Privacy 1:3 (2008) 105 - 130


(a) Institute for Employment Research (IAB); Regensburger Straße 104; 90478 Nürnberg; Germany; e-mail: joerg.drechsler@iab.de; stefan.bender@iab.de.

(b) Otto-Friedrich-University Bamberg; Department of Statistics and Econometrics; Feldkirchenstraße 21; 96045 Bamberg, Germany; e-mail: susanne.raessler@sowi.uni-bamberg.de.


For datasets considered for public release, statistical agencies face the dilemma of guaranteeing the confidentiality of survey respondents on the one hand and offering sufficiently detailed data for scientific use on the other. For that reason, the literature offers a variety of methods that address this problem.

In this paper we discuss the advantages and disadvantages of two approaches that provide disclosure control by generating synthetic datasets: the first, proposed by Rubin [1], generates fully synthetic datasets, while the second, suggested by Little [2], imputes values only for selected variables that bear a high risk of disclosure. Changing only some variables will in general lead to higher analytical validity. However, the disclosure risk also increases for partially synthetic data, since true values remain in the datasets. Thus, agencies willing to release synthetic datasets have to decide which of the two methods best balances the trade-off between data utility and disclosure risk for their data. We offer some guidelines to help make this decision.
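The difference between the two approaches can be sketched in a few lines of Python. This is a minimal, hypothetical illustration under strong simplifying assumptions, not the authors' implementation: one non-sensitive variable `x` that is known for every unit of a sampling frame, one sensitive continuous variable `y`, a plain linear regression as the imputation model, and model parameters held fixed across synthetic datasets (a proper multiple-imputation procedure would redraw them from their posterior for each dataset). All function names and data below are invented for illustration.

```python
import random
import statistics


def fit_linear(xs, ys):
    """OLS fit of y = a + b*x; returns intercept, slope and residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sd = statistics.stdev([y - (a + b * x) for x, y in zip(xs, ys)])
    return a, b, sd


def partially_synthetic(sample, m, rng):
    """Little-style: keep every true x, replace only the sensitive y."""
    a, b, sd = fit_linear([x for x, _ in sample], [y for _, y in sample])
    # Simplification: parameters are fixed, not redrawn per dataset.
    return [[(x, a + b * x + rng.gauss(0, sd)) for x, _ in sample]
            for _ in range(m)]


def fully_synthetic(frame_x, sample, n_syn, m, rng):
    """Rubin-style: draw new units from the frame and impute the survey values."""
    a, b, sd = fit_linear([x for x, _ in sample], [y for _, y in sample])
    datasets = []
    for _ in range(m):
        new_x = rng.sample(frame_x, n_syn)  # a fresh sample of frame units
        datasets.append([(x, a + b * x + rng.gauss(0, sd)) for x in new_x])
    return datasets


# Hypothetical data: a frame of 1,000 units and an original survey of 200.
rng = random.Random(42)
frame_x = [rng.uniform(0, 100) for _ in range(1000)]
sample_x = rng.sample(frame_x, 200)
sample = [(x, 5 + 0.3 * x + rng.gauss(0, 2)) for x in sample_x]

part = partially_synthetic(sample, m=5, rng=rng)
full = fully_synthetic(frame_x, sample, n_syn=200, m=5, rng=rng)
```

In this toy setup every `x` in `part` is a true value from the original survey, which is why partially synthetic data keep more analytical detail but also carry a higher disclosure risk, whereas `full` consists of frame units drawn anew with all survey values imputed.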

To our knowledge, the two approaches have never been compared empirically in the literature. We apply both methods to a set of variables from the 1997 wave of the German IAB Establishment Panel and evaluate their quality by comparing results from the original data with the results of the same analyses run on the datasets produced by the imputation procedures. The results are as expected: in both cases the analytical validity of the synthetic data is high, and partially synthetic datasets outperform fully synthetic datasets in terms of data utility. However, this advantage comes at the price of a higher disclosure risk for the partially synthetic data.
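One way to carry out such a comparison is sketched below, again as a hypothetical illustration rather than the paper's own procedure. The point estimates from the m synthetic datasets are pooled with the standard combining rules for synthetic data (Reiter's T_p = ū + b/m for partially synthetic data, and T_f = (1 + 1/m)b − ū from Raghunathan, Reiter and Rubin for fully synthetic data), and the resulting interval is compared with the interval from the original data using the confidence-interval overlap measure of Karr et al. (2006). For brevity the sketch uses a normal approximation instead of the t reference distributions these rules prescribe; the function names and numbers are invented.

```python
import statistics
from statistics import NormalDist


def combine(estimates, variances, fully_synthetic):
    """Pool m estimates q_i with variance estimates u_i into (q_bar, T)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    b = statistics.variance(estimates)  # between-dataset variance b_m
    u_bar = statistics.fmean(variances)  # average within-dataset variance
    if fully_synthetic:
        t = (1 + 1 / m) * b - u_bar  # Raghunathan, Reiter and Rubin (2003)
    else:
        t = u_bar + b / m  # Reiter (2003)
    if t <= 0:
        # T_f can turn negative for small m; this sketch does not adjust it.
        raise ValueError("non-positive variance estimate")
    return q_bar, t


def normal_ci(est, var, level=0.95):
    """Normal-approximation interval (the exact rules use a t distribution)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * var ** 0.5
    return est - half, est + half


def ci_overlap(orig, syn):
    """Average relative overlap of two intervals, clamped at 0 when disjoint."""
    over = max(min(orig[1], syn[1]) - max(orig[0], syn[0]), 0.0)
    return 0.5 * (over / (orig[1] - orig[0]) + over / (syn[1] - syn[0]))


# Hypothetical regression coefficient: original estimate and 5 synthetic ones.
ci_orig = normal_ci(2.10, 0.04)
q_syn, t_syn = combine([2.00, 2.30, 2.20, 2.05, 2.15], [0.05] * 5,
                       fully_synthetic=False)
print(round(ci_overlap(ci_orig, normal_ci(q_syn, t_syn)), 3))
```

An overlap of 1 means the two intervals coincide and 0 means they are disjoint, so values close to 1 across many estimates indicate high analytical validity of the synthetic data.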

* Corresponding author.



ISSN: 1888-5063; ISSN (Digital): 2013-1631; D.L.:B-11873-2008; Web Site: http://www.tdp.cat/
Contact: Transactions on Data Privacy; Vicenç Torra; U. of Skövde; PO Box 408; 54128 Skövde; Sweden; e-mail: tdp@tdp.cat
Note: TDP's web site does not use cookies, and TDP keeps no information on IP addresses or browsers.


Vicenç Torra. Last modified: December 12, 2014, 00:25.