
Transactions on
Data Privacy
Foundations and Technologies

http://www.tdp.cat



Volume 3 Issue 1


Random Forests for Generating Partially Synthetic, Categorical Data

Gregory Caiola(a), Jerome P. Reiter(a),(*)

Transactions on Data Privacy 3:1 (2010) 27 - 42


(a) Department of Statistical Science, Duke University, Durham, NC 27708, USA.

e-mail: gregory.caiola@duke.edu; jerry@stat.duke.edu


Abstract

Several national statistical agencies are now releasing partially synthetic, public use microdata. These comprise the units in the original database with sensitive or identifying values replaced with values simulated from statistical models. Specifying synthesis models can be daunting in databases that include many variables of diverse types. These variables may be related in ways that can be difficult to capture with standard parametric tools. In this article, we describe how random forests can be adapted to generate partially synthetic data for categorical variables. Using an empirical study, we illustrate that the random forest synthesizer can preserve relationships reasonably well while providing low disclosure risks. The random forest synthesizer has some appealing features for statistical agencies: it can be applied with minimal tuning, easily incorporates numerical, categorical, and mixed variables as predictors, operates efficiently in high dimensions, and automatically fits non-linear relationships.
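The abstract describes the synthesizer only at a high level. As an illustration, here is a minimal, standard-library-only Python sketch of the general idea, built on assumptions that are ours rather than the article's: predictors are categorical only, trees are grown with binary `x[j] == v` splits scored by Gini impurity, and each replaced value is drawn with probability proportional to the number of trees voting for each category. All function names (`gini`, `build_tree`, `predict`, `fit_forest`, `synthesize`) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of category labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, ys, n_features, rng, min_leaf=2):
    """Grow a classification tree on categorical predictors (simplified CART).

    Internal nodes are tuples (j, v, left, right): records with x[j] == v go left.
    Leaves are plain category labels (assumed to be strings, not tuples).
    Only a random subset of n_features predictors is tried at each node,
    which is the feature-bagging step of a random forest.
    """
    if len(set(ys)) == 1 or len(ys) < 2 * min_leaf:
        return Counter(ys).most_common(1)[0][0]  # leaf: majority category
    best = None
    for j in rng.sample(range(len(rows[0])), n_features):
        # sorted() keeps results reproducible; assumes comparable values per column
        for v in sorted(set(r[j] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[j] == v]
            right = [i for i, r in enumerate(rows) if r[j] != v]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini([ys[i] for i in left])
                     + len(right) * gini([ys[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, v, left, right)
    if best is None:  # no admissible split: make a leaf
        return Counter(ys).most_common(1)[0][0]
    _, j, v, left, right = best
    return (j, v,
            build_tree([rows[i] for i in left], [ys[i] for i in left], n_features, rng, min_leaf),
            build_tree([rows[i] for i in right], [ys[i] for i in right], n_features, rng, min_leaf))


def predict(tree, x):
    """Drop record x down one tree and return the leaf's category."""
    while isinstance(tree, tuple):
        j, v, left, right = tree
        tree = left if x[j] == v else right
    return tree


def fit_forest(rows, ys, n_trees=50, n_features=None, seed=0):
    """Fit n_trees trees, each on a bootstrap sample of the original records."""
    rng = random.Random(seed)
    p = len(rows[0])
    n_features = n_features or max(1, int(p ** 0.5))  # sqrt(p): a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [ys[i] for i in idx], n_features, rng))
    return forest


def synthesize(forest, rows, rng):
    """Replace each record's sensitive value by a draw from the forest's votes.

    Assumed rule: P(category c) = (number of trees predicting c) / (number of trees).
    """
    synthetic = []
    for x in rows:
        votes = Counter(predict(tree, x) for tree in forest)
        cats = list(votes)
        synthetic.append(rng.choices(cats, weights=[votes[c] for c in cats])[0])
    return synthetic


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Toy data: (sex, region) predictors and a categorical variable to protect.
    rows = [(data_rng.choice("mf"), data_rng.choice("NSEW")) for _ in range(200)]
    ys = ["employed" if s == "m" else "unemployed" for s, _ in rows]
    forest = fit_forest(rows, ys, n_trees=25, seed=2)
    synthetic = synthesize(forest, rows, random.Random(3))
    print("share of values unchanged:", sum(a == b for a, b in zip(ys, synthetic)) / len(ys))
```

Because the replacement values are random draws, calling `synthesize` with different seeds yields different synthetic copies; partially synthetic releases typically consist of several such copies. A practical synthesizer would also split on numerical predictors and would normally rely on an established random forest library rather than this toy tree learner.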

(*) Corresponding author.

Supported by: IIIA-CSIC

ISSN: 1888-5063; ISSN (Digital): 2013-1631; D.L.:B-11873-2008; Web Site: http://www.tdp.cat/
Contact: Transactions on Data Privacy; Vicenç Torra; U. of Skövde; PO Box 408; 54128 Skövde; (Sweden); e-mail:tdp@tdp.cat

Vicenç Torra, last modified: 00:25, December 12, 2014.