Transactions on
Data Privacy
Foundations and Technologies

http://www.tdp.cat



Volume 15 Issue 2


10 is the safest number that there's ever been

Felix Ritchie(a),(*)

Transactions on Data Privacy 15:2 (2022) 109 - 140


(a) University of the West of England, Bristol.

e-mail: felix.ritchie@uwe.ac.uk


Abstract

When checking frequency and magnitude tables for disclosure risk, the cell threshold (the minimum number of observations in each cell) is a crucial parameter. In rules-based environments, it is a hard limit on what can or cannot be published. In principles-based environments, the threshold is less binding, but it still affects the operational effectiveness of statistical disclosure control (SDC) processes.
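As a minimal sketch of the check described above (an illustration, not the paper's procedure), the following Python counts observations per cell and lists the non-empty cells that fall below a chosen threshold. The names `failing_cells` and `publishable` and the toy records are hypothetical. In a rules-based setting `publishable` is the whole decision; in a principles-based setting the flagged cells would instead be a starting point for review.

```python
from collections import Counter


def failing_cells(table, threshold):
    """Return the non-empty cells whose count is below `threshold`, sorted."""
    return sorted(cell for cell, n in table.items() if 0 < n < threshold)


def publishable(table, threshold):
    """Rules-based reading: the table may be released only if no cell fails."""
    return not failing_cells(table, threshold)


# Hypothetical microdata: one (region, category) tuple per observation.
records = [("A", "x")] * 12 + [("A", "y")] * 4 + [("B", "x")] * 7
table = Counter(records)  # frequency table: cell -> number of observations

for threshold in (3, 5, 10):
    print(threshold, publishable(table, threshold), failing_cells(table, threshold))
```

With these toy counts the same table passes at a threshold of three but not at five or ten, which is the risk/utility trade-off in its simplest form.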

Determining the appropriate threshold is an unsolved problem. Ten is a common threshold value for both national statistics and research outputs, but five or twenty are also popular. Some organisations use multiple thresholds for different data sources.

These higher thresholds are all entirely subjective. Three is the only threshold which has an objective statistical foundation, but most organisations argue that this leaves little margin for error. Unfortunately, there is no equivalent statistical case for any number larger than three: ten is popular because it is popular. This lack of justification is most acute in research environments, where there is no guidance at all.
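The usual argument for three as a floor concerns magnitude tables, and a short sketch makes it concrete. This is the commonly cited reasoning, assumed here rather than taken from the paper: with one contributor, the published total is that contributor's value; with two, either contributor can subtract their own value from the total and recover the other's exactly; with three or more, a single contributor learns only the sum of the remaining values. The function name `exact_value_exposed` and the figures are hypothetical.

```python
from typing import Optional, Sequence


def exact_value_exposed(
    contributions: Sequence[float], insider: Optional[int] = None
) -> Optional[float]:
    """Return a contributor's exact value if the cell total reveals it, else None.

    `contributions` are the individual values summed into one published cell.
    `insider` is the index of a contributor who reads the table and knows their
    own value; None means an outside reader.
    """
    total = sum(contributions)  # the published magnitude for the cell
    n = len(contributions)
    if n == 1:
        # Any reader sees the sole contributor's value directly.
        return total
    if insider is not None and n == 2:
        # The insider subtracts their own value and recovers the other one.
        return total - contributions[insider]
    # Otherwise the reader is left with a sum of at least two unknown values.
    return None


print(exact_value_exposed([500.0]))                           # one contributor
print(exact_value_exposed([300.0, 700.0], insider=0))         # two, insider
print(exact_value_exposed([300.0, 700.0, 200.0], insider=0))  # three, insider
```

The sketch considers only a single insider and a single published cell; it ignores collusion between contributors and comparisons across related tables.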

This paper provides the first empirical foundation for threshold selection by modelling alternative threshold values on both synthetic data and real datasets. The paper demonstrates that this is a complex question. The trade-off between risk and value is well known, but we demonstrate that the protection offered by a higher threshold depends on the risk measure. There is no monotonic relation between a threshold and risk, as higher thresholds can increase disclosure risk in particular scenarios. The blind application of high-threshold rules might mask new risks. The only unambiguous results are the simplistic ones: more observations reduce risk, and higher thresholds reduce utility.
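The utility side of the trade-off is the easy part to illustrate. The sketch below, using hypothetical cell counts rather than data from the paper, computes the share of non-empty cells that a given threshold would withhold; this share can only stay the same or grow as the threshold rises. It is a crude utility proxy (the name `suppressed_share` is hypothetical) and says nothing about risk, which, as noted above, need not fall monotonically.

```python
def suppressed_share(cell_counts, threshold):
    """Fraction of non-empty cells below `threshold`, i.e. cells that would be withheld."""
    nonempty = [n for n in cell_counts if n > 0]
    if not nonempty:
        return 0.0
    return sum(1 for n in nonempty if n < threshold) / len(nonempty)


# Hypothetical cell counts for one table; not data from the paper.
counts = [1, 2, 3, 4, 6, 8, 10, 12, 15, 25, 40]

for t in (3, 5, 10, 20):
    print(t, round(suppressed_share(counts, t), 2))
```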

Finally, the paper notes that a reconsideration of disclosure checking practices can reduce risk irrespective of the threshold for some risk scenarios.

* Corresponding author.

ISSN: 1888-5063; ISSN (Digital): 2013-1631; D.L.:B-11873-2008; Web Site: http://www.tdp.cat/
Contact: Transactions on Data Privacy; Vicenç Torra; Umeå University; 90187 Umeå (Sweden); e-mail:tdp@tdp.cat
Note: TDP's web site does not use cookies, and TDP keeps no information on either IP addresses or browsers.

Vicenç Torra, Last modified: 23:16, August 31 2022.