Transactions on
Data Privacy
Foundations and Technologies
http://www.tdp.cat

Volume 14 Issue 1

Identification Risks Evaluation of Partially Synthetic Data with the IdentificationRiskCalculation R Package

Ryan Hornby^(a), Jingchen Hu^(b),(*)

Transactions on Data Privacy 14:1 (2021) 37 - 52

(a) Vassar College, Box 2785, 124 Raymond Ave, Poughkeepsie, NY 12604, United States.

(b) Vassar College, Box 27, 124 Raymond Ave, Poughkeepsie, NY 12604, United States.

e-mail: rhornby@vassar.edu; jihu@vassar.edu

Abstract

We extend a general approach to evaluating the identification risk of synthesized variables in partially synthetic data. For multiple continuous synthesized variables, we introduce the use of a radius r in constructing the identification risk probability of each target record, and illustrate it with working examples. We create the IdentificationRiskCalculation R package to aid researchers and data disseminators in performing these identification risk evaluations. We demonstrate our methods through the R package with applications to a data sample from the Consumer Expenditure Surveys, and discuss the impacts on risk and data utility of 1) the choice of radius r, 2) the choice of synthesized variables, and 3) the choice of the number of synthetic datasets. We give recommendations for statistical agencies on synthesizing continuous variables and evaluating their identification risk.
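
The role of the radius r can be illustrated with a minimal sketch. It is not the IdentificationRiskCalculation package or its API, and it is written in Python rather than R. It assumes, for illustration only, that a synthetic record matches a target record when it agrees exactly on the unsynthesized variables and each synthesized continuous value lies within r times the target's true value, and that each target contributes T_i / c_i to the risk, where c_i is the number of matching synthetic records and T_i indicates whether the target's own record is among them. The function name, field names, and toy data are all hypothetical.

```python
def expected_match_risk(original, synthetic, known_keys, synth_keys, r):
    """Sum of T_i / c_i over target records (illustrative definition only).

    original, synthetic: lists of dicts, aligned so that synthetic[i] is the
    partially synthesized version of original[i].
    known_keys: unsynthesized variables, matched exactly.
    synth_keys: synthesized continuous variables, matched within a radius
    of r * |true value| (an assumed form of the radius rule).
    """
    total = 0.0
    for i, target in enumerate(original):
        matches = []
        for j, candidate in enumerate(synthetic):
            same_known = all(candidate[k] == target[k] for k in known_keys)
            within_radius = all(
                abs(candidate[k] - target[k]) <= r * abs(target[k])
                for k in synth_keys
            )
            if same_known and within_radius:
                matches.append(j)
        c_i = len(matches)               # number of candidate matches
        t_i = 1 if i in matches else 0   # is the true record among them?
        if c_i > 0:
            total += t_i / c_i
    return total


# Hypothetical toy data: "region" is left unsynthesized, "income" is synthesized.
original = [
    {"region": "NE", "income": 100.0},
    {"region": "NE", "income": 104.0},
    {"region": "S", "income": 200.0},
]
synthetic = [
    {"region": "NE", "income": 102.0},
    {"region": "NE", "income": 140.0},
    {"region": "S", "income": 260.0},
]

risk_small_r = expected_match_risk(original, synthetic, ["region"], ["income"], 0.1)
risk_large_r = expected_match_risk(original, synthetic, ["region"], ["income"], 0.5)
```

In this toy setting the computed risk changes with r because the radius decides which synthetic records count as candidate matches for each target, which is the kind of sensitivity to r that the abstract refers to.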

^* Corresponding author.

ISSN: 1888-5063; ISSN (Digital): 2013-1631; Web Site: http://www.tdp.cat/
Contact: Transactions on Data Privacy; Vicenç Torra; Umeå University; 90187 Umeå (Sweden); e-mail:tdp@tdp.cat
Note: TDP's web site does not use cookies. TDP keeps no information on IP addresses or browsers.

Vicenç Torra. Last modified: 15:52, April 28, 2021.