Data Sanitization for t-Closeness over Multiple Numerical Sensitive Attributes
Rajiv Bagai(a),(*), Eric Weber(b), Vikas Thammanna Gowda(a)
Transactions on Data Privacy 16:3 (2023) 191 - 210
(a) School of Computing, Wichita State University, Wichita, KS 67260-0083, USA.
(b) NetApp Inc., Wichita, KS 67226, USA.
e-mail: rajiv.bagai@wichita.edu
A popular technique for preserving the privacy of individuals contained in any released data is to first sanitize the data according to the t-closeness principle. This principle requires partitioning the rows of the original data into equivalence classes such that the distribution of sensitive values in every class is sufficiently close, within a given threshold t, to their distribution in the original data. Most existing methods for constructing t-close equivalence classes consider just one sensitive attribute in the data, which is insufficient because many real-life datasets contain multiple sensitive attributes; partitioning attempts for multiple sensitive attributes have thus far been unsatisfactory. We present a method for generating t-close equivalence classes in the presence of multiple numerical sensitive attributes, where each such attribute has its own privacy threshold. The equivalence classes are generated in a way that minimizes the information loss caused later by generalizing quasi-identifier values within each class. While finding an optimal solution for this problem is known to be NP-hard, we show that our approach results in an acceptable solution in polynomial time.
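To make the t-closeness condition concrete, the following minimal sketch checks whether a given partition is t-close for several numerical sensitive attributes, each with its own threshold. It is an illustration only, not the method presented in this paper: it assumes closeness is measured by the Earth Mover's Distance with ordered ground distance, as commonly used for numerical attributes in the t-closeness literature, and the names `ordered_emd`, `is_t_close`, `rows`, `partition` and `thresholds` are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def ordered_emd(class_values, all_values):
    """Ordered-distance EMD between a class's values and the full column.

    Assumption: both inputs are non-empty lists of numbers, and every
    value in class_values also occurs in all_values.
    """
    domain = sorted(set(all_values))
    m = len(domain)
    if m < 2:
        return 0.0  # a single distinct value cannot differ in distribution
    class_counts = Counter(class_values)
    all_counts = Counter(all_values)
    n_class, n_all = len(class_values), len(all_values)
    # Per-value difference between class and overall relative frequencies.
    diffs = [class_counts[v] / n_class - all_counts[v] / n_all for v in domain]
    # Sum the absolute prefix sums (the last one is zero) and normalize.
    prefix_sums = list(accumulate(diffs))
    return sum(abs(s) for s in prefix_sums[:-1]) / (m - 1)


def is_t_close(partition, rows, thresholds):
    """Check every class against every attribute's own threshold.

    partition:  list of classes, each a list of row indices (hypothetical layout)
    rows:       list of dicts mapping attribute name to numerical value
    thresholds: dict mapping each sensitive attribute to its threshold t
    """
    for attr, t in thresholds.items():
        column = [row[attr] for row in rows]
        for cls in partition:
            if ordered_emd([rows[i][attr] for i in cls], column) > t:
                return False
    return True


if __name__ == "__main__":
    salaries = [3, 4, 5, 6, 7, 8, 9, 10, 11]
    debts = [1, 2, 3, 1, 2, 3, 1, 2, 3]
    rows = [{"salary": s, "debt": d} for s, d in zip(salaries, debts)]
    partition = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(is_t_close(partition, rows, {"salary": 0.4, "debt": 0.1}))
    print(is_t_close(partition, rows, {"salary": 0.3, "debt": 0.1}))
```

In this toy data, each class reproduces the overall distribution of `debt` exactly, while the classes holding the three lowest and three highest salaries are at distance 0.375 from the overall salary distribution. The partition therefore passes with a salary threshold of 0.4 and fails with 0.3, which shows why a separate threshold per attribute matters.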