Privacy-preserving Anonymization of Set Value Data Manolis Terrovitis, Nikos Mamoulis University of Hong Kong Panos Kalnis National University of Singapore.


1 Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Nikos Mamoulis (University of Hong Kong)
Panos Kalnis (National University of Singapore)
www.comp.nus.edu.sg/~kalnis

2 Motivation
- Attacker can see up to m items
  - Any m items
  - No distinction between sensitive and non-sensitive items
[Figure: Helen's purchase: Beer, 0% Milk, Pregnancy test]

3 Motivation (cont.)
Database:
  Helen: Beer, 0% Milk, Pregnancy test
  John: Cola, Cheese
  Tom: 2% Milk, Coffee
  ...
  Mary: Wine, Beer, Full-fat Milk
Published:
  t1: Beer, 0% Milk, Pregnancy test
  t2: Cola, Cheese
  t3: 2% Milk, Coffee
  ...
  tn: Wine, Beer, Full-fat Milk
Attacker: find all transactions that contain Beer & 0% Milk
Published, with the milk variants generalized to Milk:
  t1: Beer, Milk, Pregnancy test
  t2: Cola, Cheese
  t3: Milk, Coffee
  ...
  tn: Wine, Beer, Milk
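The attack on this slide can be sketched in a few lines of Python. This is only an illustration: the transaction ids and items are copied from the slide, while the function name `matching` and the `MILK` set are hypothetical helpers introduced here.

```python
# Published transactions, copied from the slide's example.
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}


def matching(transactions, known_items):
    """Return the ids of transactions containing every item the attacker knows."""
    known = set(known_items)
    return sorted(tid for tid, items in transactions.items() if known <= items)


# Replace every milk variant with the generalized item "Milk"
# (hypothetical helper set, mirroring the slide's last listing).
MILK = {"0% Milk", "2% Milk", "Full-fat Milk"}
generalized = {
    tid: {"Milk" if item in MILK else item for item in items}
    for tid, items in published.items()
}

# On the raw data the attacker's two items single out one transaction.
print(matching(published, ["Beer", "0% Milk"]))
# After generalization the same knowledge matches more than one transaction.
print(matching(generalized, ["Beer", "Milk"]))
```

On the raw data the query isolates t1; after generalization both t1 and tn match, so the attacker can no longer single out Helen's transaction from those two items.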

4 k^m-anonymity
[Definitions shown on the slide: set of items, transaction, database, query terms, k^m-anonymity; the formulas are not preserved in the transcript]
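The formal definition from this slide did not survive in the transcript. Assuming the usual reading, namely that every combination of at most m items that occurs in some transaction must occur in at least k transactions, a brute-force check might look like the sketch below. The function name `is_km_anonymous` is hypothetical, and the exhaustive enumeration is only practical for small m.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Brute-force check of the assumed k^m-anonymity condition.

    Counts every itemset of size 1..m that occurs in some transaction
    and requires each of them to occur in at least k transactions.
    """
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, m + 1):
            # Each itemset is counted once per transaction that contains it.
            support.update(combinations(items, size))
    return all(count >= k for count in support.values())


db = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
print(is_km_anonymous(db, k=2, m=1))  # every single item occurs twice
print(is_km_anonymous(db, k=2, m=2))  # every pair occurs only once
```

In this toy database an attacker who knows one item always sees two candidate transactions, but an attacker who knows two items pins down a single transaction, so the data is 2^1-anonymous but not 2^2-anonymous under the assumed definition.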

5 Related Work: K-Anonymity [Swe02]

(a) Microdata (quasi-identifier: Age, ZipCode)
Age    ZipCode    Disease
42     25000      Flu
46     35000      AIDS
50     20000      Cancer
54     40000      Gastritis
48     50000      Dyspepsia
56     55000      Bronchitis

(b) 2-anonymous microdata
Age      ZipCode        Disease
42-46    25000-35000    Flu
42-46    25000-35000    AIDS
50-54    20000-40000    Cancer
50-54    20000-40000    Gastritis
48-56    50000-55000    Dyspepsia
48-56    50000-55000    Bronchitis

NOT suitable for high-dimensionality

[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
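As a rough illustration of the k-anonymity condition on this slide, the following Python sketch groups rows by their quasi-identifier values and checks that every group has at least k rows. The rows are copied from the slide; the function name `is_k_anonymous` and the column-index parameter `qi` are assumptions made for this example.

```python
from collections import Counter

# Rows copied from the slide: (Age, ZipCode, Disease).
microdata = [
    ("42", "25000", "Flu"), ("46", "35000", "AIDS"),
    ("50", "20000", "Cancer"), ("54", "40000", "Gastritis"),
    ("48", "50000", "Dyspepsia"), ("56", "55000", "Bronchitis"),
]
generalized = [
    ("42-46", "25000-35000", "Flu"), ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"), ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"), ("48-56", "50000-55000", "Bronchitis"),
]


def is_k_anonymous(rows, k, qi=(0, 1)):
    """True if every quasi-identifier combination is shared by at least k rows."""
    groups = Counter(tuple(row[i] for i in qi) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(microdata, k=2))    # each (Age, ZipCode) pair is unique
print(is_k_anonymous(generalized, k=2))  # each generalized pair covers two rows
```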

6 Related Work: L-diversity in Transactions
Requires knowledge of (non)-sensitive attributes

[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.

7 Our Approach: Employs Generalization
[Figure: generalization hierarchy and the resulting information loss, for k=2, m=2]
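The hierarchy drawn on this slide is not preserved in the transcript. The sketch below assumes a small hypothetical hierarchy (items a1, a2 under A, items b1, b2 under B, everything under ALL) and shows how a chosen set of hierarchy nodes, called a cut here, could be applied to every transaction. The names `PARENT`, `generalize_item`, and `generalize_db` are illustrative, not taken from the source.

```python
# Hypothetical generalization hierarchy, stored as child -> parent.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def generalize_item(item, cut):
    """Walk up the hierarchy until a node in the cut is reached."""
    node = item
    while node not in cut:
        node = PARENT[node]  # raises KeyError if the cut does not cover the item
    return node


def generalize_db(transactions, cut):
    """Apply the same cut to every transaction (one mapping for the whole database)."""
    return [frozenset(generalize_item(i, cut) for i in t) for t in transactions]


db = [{"a1", "b1"}, {"a2", "b1"}, {"a2", "b2"}, {"a1", "a2", "b2"}]
cut = {"A", "b1", "b2"}  # generalize a1 and a2 to A, keep b1 and b2 as they are

for transaction in generalize_db(db, cut):
    print(sorted(transaction))
```

Note that the last transaction shrinks from three items to two, because a1 and a2 both map to A. This loss of detail is the kind of information loss the slide refers to; the specific loss metric used by the authors is not shown in the transcript.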

8 Lattice of Generalizations

9 Count Tree
[Figure: count tree with per-node counts; not preserved in the transcript]

10 Optimal Algorithm
[Figure: three steps, each labeled Q; content not preserved in the transcript]

11 “Direct” Anonymization
[Figure annotation: COUNT({a1, a2}) = 1]
- Solves each “problem” independently

12 “Apriori-based” Anonymization
- Construct the count-tree incrementally
- Prune unnecessary branches
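The incremental idea can be illustrated with a heavily simplified Python sketch. It is not the authors' algorithm: it uses plain counters instead of a count-tree, does no pruning, and picks generalizations greedily. It handles itemsets of size 1, then 2, up to m, and whenever an itemset occurs in fewer than k transactions it generalizes one of its items one level up for the whole database. The hierarchy and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical generalization hierarchy, stored as child -> parent.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def is_under(node, ancestor):
    """True if `ancestor` is a proper ancestor of `node` in the hierarchy."""
    while node in PARENT:
        node = PARENT[node]
        if node == ancestor:
            return True
    return False


def recode(transactions, mapping):
    """Replace every original item by its current generalization."""
    return [frozenset(mapping[i] for i in t) for t in transactions]


def rare_itemset(transactions, size, k):
    """Return some itemset of the given size with support below k, or None."""
    support = Counter()
    for t in transactions:
        support.update(combinations(sorted(t), size))
    for itemset, count in sorted(support.items()):
        if count < k:
            return itemset
    return None


def apriori_style_anonymize(transactions, k, m):
    """Greedy, level-by-level generalization (simplified illustration only)."""
    mapping = {i: i for i in set().union(*transactions)}
    for size in range(1, m + 1):  # fix 1-itemsets first, then 2-itemsets, ...
        while True:
            bad = rare_itemset(recode(transactions, mapping), size, k)
            if bad is None:
                break
            target = next((x for x in bad if x in PARENT), None)
            if target is None:
                raise ValueError("hierarchy exhausted before reaching k^m-anonymity")
            parent = PARENT[target]
            # Global recoding: everything below `parent` moves up to `parent`.
            mapping = {
                i: parent if g == target or is_under(g, parent) else g
                for i, g in mapping.items()
            }
    return recode(transactions, mapping), mapping


db = [{"a1", "b1"}, {"a2", "b1"}, {"a2", "b2"}, {"a1", "a2", "b2"}]
anonymized, mapping = apriori_style_anonymize(db, k=2, m=2)
print(sorted(mapping.items()))
```

Processing sizes in increasing order works in this sketch because merging items under a common parent never lowers the support of a smaller itemset that was already fixed, so earlier levels do not need to be revisited.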

13 Small Datasets (2-15K, BMS-WebView2)
- |I|=40..60, k=100, m=3

14 Small Datasets (BMS-WebView2)
- |D|=10K, k=100, m=1..4

15 Apriori Anonymization for Large Datasets
- k=5
- m=3

|D|     |I|
515K    1657
59K     497
77K     3340

[Chart annotations: 500sec, 10sec, 100sec]

16 Points to Remember
- Anonymization of Transactional Data
  - Attacker knows m items
  - Any m items can be the quasi-identifier
- Global recoding method
  - Optimal solution: too slow
  - Apriori Anonymization: fast and low information loss
- On-going work
  - Local recoding (sort by Gray order and partition)
  - Transactional data in streaming environments

17 Bibliography on LBS Privacy
http://anonym.comp.nus.edu.sg

