Issues in Estimating Reidentification Risk Using Log-Linear Models in Complex Survey Samples
Lin Li(a),(*), Jianzhu Li(b), Tom Krenzke(c)
Transactions on Data Privacy 19:2 (2026) 81 - 111
(a) Westat, 7501 Wisconsin Avenue, Bethesda, MD 20814, USA.
(b) FINRA, 1735 K St NW, Washington, DC 20006, USA.
(c) Westat, 7501 Wisconsin Avenue, Bethesda, MD 20814, USA.
e-mail: linli@westat.com; jianzhulee@hotmail.com; tomkrenzke@westat.com
Abstract
In this paper, we discuss practical issues encountered when estimating record-level and file-level measures of re-identification disclosure risk in survey microdata collected under complex survey designs. We use the probabilistic modelling approach proposed in Skinner and Shlomo (2008), which is based on the Poisson distribution and log-linear modelling, to estimate disclosure risk in survey microdata files. Using a case study and simulations, we examine the robustness of their goodness-of-fit (GOF) criteria to violations of model assumptions, particularly in the context of complex survey designs and differential survey weights. We also provide guidance on variable selection, with insights on how to proceed with the disclosure risk assessment and obtain meaningful results. For the case study, we use the complex survey dataset from the Survey of Doctorate Recipients conducted by the National Center for Science and Engineering Statistics. We evaluate the disclosure risk estimates under different approaches to adjusting the probabilistic modelling for complex survey data; the results lead to guidance for a sensitivity analysis that helps provide better estimates of record-level and file-level re-identification risk in survey microdata.