
1 Preservation of Proximity Privacy in Publishing Numerical Sensitive Data J. Li, Y. Tao, and X. Xiao SIGMOD 08 Presented by Hongwei Tian

2 Outline
- What is PPDP
  - Existing Privacy Principles
- Proximity Attack
  - (ε, m)-anonymity
  - Determine ε and m
  - Algorithm
- Experiments and Conclusion

3 Privacy-Preserving Data Publishing
A true story from Massachusetts, 1997: the Group Insurance Commission (GIC) released "anonymized" medical data, and by linking it with a voter registration list purchased for 20 dollars, a researcher re-identified the medical records of Governor Weld.

4 PPDP
Privacy
- Sensitive information of individuals should be protected in the published data
- Pushes toward more anonymized data
Utility
- The published data should be useful
- Pushes toward more accurate data

5 PPDP
Anonymization techniques
- Generalization: replace a specific value with a more general one, maintaining its semantic meaning
  - 78256 -> 7825*, UTSA -> University, 28 -> [20, 30]
- Perturbation: replace one value with another random value
  - Huge information loss -> poor utility
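A minimal sketch of the two techniques in Python; the helper names and the bucket width are illustrative, not from the paper:

```python
# Illustrative only: helper names and bucket width are not from the paper.
import random

def generalize_numeric(value, width=10):
    # Replace a number with its enclosing range, e.g. 28 -> [20, 30).
    lo = (value // width) * width
    return (lo, lo + width)

def generalize_zipcode(zipcode, keep=4):
    # Suppress trailing digits, e.g. 78256 -> 7825*.
    s = str(zipcode)
    return s[:keep] + "*" * (len(s) - keep)

def perturb(value, domain=(0, 100)):
    # Replace the value with a random one: huge information loss.
    return random.randint(*domain)

print(generalize_numeric(28))     # (20, 30)
print(generalize_zipcode(78256))  # 7825*
```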

6 PPDP Example of Generalization

7 Some Existing Privacy Principles
Generalization-based
- SA categorical: k-anonymity; l-diversity, (α, k)-anonymity, m-invariance, ...; (c, k)-safety, Skyline-privacy, ...
- SA numerical: (k, e)-anonymity, variance control; t-closeness; δ-presence; ...

8 Next…
- What is PPDP
  - Existing Privacy Principles
- Proximity Attack
  - (ε, m)-anonymity
  - Determine ε and m
  - Algorithm
- Experiments and Conclusion

9 Proximity Attack
Even when the published SA values in an equivalence class are all distinct, they can be numerically close to one another; an adversary can then infer with high confidence that an individual's SA value falls in a narrow interval.

10 (ε, m)-anonymity
I(t): the private neighborhood of tuple t
- Absolute: I(t) = [t.SA − ε, t.SA + ε]
- Relative: I(t) = [t.SA·(1 − ε), t.SA·(1 + ε)]
P(t): the risk of a proximity breach for tuple t
- P(t) = x / |G|, where G is t's equivalence class and x is the number of tuples in G whose SA values fall in I(t)

11 (ε, m)-anonymity: example
ε = 20, I(t1) = [980, 1020]
x = 3, |G| = 4, so P(t1) = 3/4
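A small sketch of these definitions in Python. The SA values below are invented so that the slide-11 numbers come out; they are not given on the slide:

```python
# P(t) for one equivalence class. SA values are invented so that the
# slide-11 numbers come out: I(t1) = [980, 1020], x = 3, |G| = 4.

def neighborhood(sa, eps, relative=False):
    # Private neighborhood I(t), absolute or relative form.
    if relative:
        return (sa * (1 - eps), sa * (1 + eps))
    return (sa - eps, sa + eps)

def breach_risk(t_sa, group_sas, eps):
    # P(t) = x / |G|: x counts SA values of G that fall inside I(t).
    lo, hi = neighborhood(t_sa, eps)
    x = sum(1 for sa in group_sas if lo <= sa <= hi)
    return x / len(group_sas)

G = [1000, 985, 1010, 1200]          # hypothetical SA values
print(breach_risk(G[0], G, eps=20))  # 0.75, i.e. P(t1) = 3/4

# The class is (eps, m)-anonymous iff P(t) <= 1/m for every t in it:
m = 2
print(all(breach_risk(sa, G, 20) <= 1 / m for sa in G))  # False
```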

12 (ε, m)-anonymity
Principle
- Given a real value ε and an integer m ≥ 1, a generalized table T* fulfills absolute (relative) (ε, m)-anonymity if P(t) ≤ 1/m for every tuple t ∈ T.
- A larger ε and a larger m both mean a stricter privacy requirement.

13 (ε, m)-anonymity
What is the meaning of m?
- |G| ≥ m
- The best situation: for any two tuples t_i and t_j in G, t_j.SA does not fall in I(t_i) and t_i.SA does not fall in I(t_j)
- Similar to l-diversity, where an equivalence class has l tuples with distinct SA values

14 (ε, m)-anonymity
How to ensure t_j.SA does not fall in I(t_i)?
- Sort all tuples in G in ascending order of their SA values
- Then it suffices that |j − i| ≥ max{ |left(t_j, G)|, |right(t_i, G)| }

15 (ε, m)-anonymity
Let maxsize(G) = max_{t ∈ G} max{ |left(t, G)|, |right(t, G)| }
Then it suffices that |j − i| ≥ maxsize(G)
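A sketch of maxsize(G) in Python. One counting convention the slides leave implicit is assumed here: left(t, G) and right(t, G) include t itself, which is the reading that reproduces g = maxsize(G) = 2 on the slide-17 example:

```python
# Assumption: left(t, G) and right(t, G) include t itself, which
# reproduces maxsize(G) = 2 for SA = 10, 20, 25, 30 with eps = 6.

def maxsize(sorted_sas, eps):
    best = 0
    for i, sa in enumerate(sorted_sas):
        # tuples at or before i whose SA falls in I(t) = [sa-eps, sa+eps]
        left = sum(1 for v in sorted_sas[: i + 1] if v >= sa - eps)
        # tuples at or after i whose SA falls in I(t)
        right = sum(1 for v in sorted_sas[i:] if v <= sa + eps)
        best = max(best, left, right)
    return best

print(maxsize([10, 20, 25, 30], eps=6))  # 2
```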

16 (ε, m)-anonymity: partitioning
- Sort the tuples in G in ascending order of their SA values
- Hash the i-th tuple into the j-th bucket using j = (i mod maxsize(G)) + 1
- Then no SA value in a bucket falls into the neighborhood of any other tuple in the same bucket

17 (ε, m)-anonymity: example of (6, 2)-anonymity

  tupleNo  QI  SA
  1        q   10
  2        q   20
  3        q   25
  4        q   30

Privacy is breached: P(t_3) = 3/4 > 1/m = 1/2
Partitioning is needed:
- An ascending order by SA values is already in place
- g = maxsize(G) = 2, so j = (i mod 2) + 1
- After partitioning, the new P(t_3) = 1/2
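A sketch of the round-robin hashing from slide 16, applied to this class; tuple indices are 1-based as on the slide:

```python
# Round-robin partitioning of the class above (eps = 6, m = 2),
# with 1-based tuple indices as on slide 16.

def partition(sorted_sas, g):
    # Hash the i-th tuple (1-based) into bucket j = (i mod g) + 1.
    buckets = {}
    for i, sa in enumerate(sorted_sas, start=1):
        buckets.setdefault(i % g + 1, []).append(sa)
    return list(buckets.values())

print(partition([10, 20, 25, 30], g=2))  # [[10, 25], [20, 30]]
# Within each bucket, no SA value lies in another tuple's
# neighborhood, so P(t3) drops to 1/2 <= 1/m.
```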

18 Determine ε and m
Given ε and m:
- Check whether an equivalence class G satisfies (ε, m)-anonymity
- Theorem: G has at least one (ε, m)-anonymous generalization iff m · maxsize(G) ≤ |G| (each of the maxsize(G) buckets must receive at least m tuples)
- Scan the sorted tuples in G once to find maxsize(G)
- This predicts whether G can be partitioned or not; e.g., for the slide-17 class, 2 · 2 ≤ 4 holds

19 Algorithm
Step 1: splitting
- Based on Mondrian (ICDE 2006)
- Splitting uses only the QI attributes
- Iteratively find the median of the frequency set on one selected QI dimension to cut G into G1 and G2, making sure both G1 and G2 remain legal to partition
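A sketch of the splitting step under stated assumptions: tuples are modeled as (qi_vector, SA) pairs, the legality test is the reconstructed condition m · maxsize(G) ≤ |G| from slide 18, and Mondrian details such as frequency sets and the generalized QI output are omitted:

```python
# Tuples are (qi_vector, sa) pairs; legality uses the reconstructed
# condition m * maxsize(G) <= |G| from slide 18.

def maxsize(sas, eps):
    sas = sorted(sas)
    return max(
        max(sum(1 for v in sas[: i + 1] if v >= sa - eps),
            sum(1 for v in sas[i:] if v <= sa + eps))
        for i, sa in enumerate(sas)
    )

def legal(group, eps, m):
    # Can this class still be made (eps, m)-anonymous by partitioning?
    sas = [sa for _, sa in group]
    return m * maxsize(sas, eps) <= len(sas)

def split(group, eps, m, n_dims):
    # Cut at the median of a QI dimension whenever both halves stay
    # legal; emit the class for step 2 once no legal cut exists.
    for d in range(n_dims):
        group = sorted(group, key=lambda t: t[0][d])
        mid = len(group) // 2
        g1, g2 = group[:mid], group[mid:]
        if g1 and g2 and legal(g1, eps, m) and legal(g2, eps, m):
            return split(g1, eps, m, n_dims) + split(g2, eps, m, n_dims)
    return [group]

data = [((i,), sa) for i, sa in
        enumerate([10, 20, 30, 40, 110, 120, 130, 140])]
print([len(g) for g in split(data, eps=6, m=2, n_dims=1)])  # [2, 2, 2, 2]
```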

20 Algorithm
[Figure: splitting under (6, 2)-anonymity, on a class with SA values 10, 20, 25, 30, 40, 50]

21 Algorithm
Step 2: partitioning
- Runs after step 1 stops
- Check every class G produced by splitting:
  - Release G directly if it satisfies (ε, m)-anonymity
  - Otherwise, partition G and then release the new buckets

22 Algorithm
[Figure: partitioning under (6, 2)-anonymity, on the same SA values 10, 20, 25, 30, 40, 50]

23 Next…
- What is PPDP
- Evolution of Privacy Preservation
- Proximity Attack
  - (ε, m)-anonymity
  - Determine ε and m
  - Algorithm
- Experiments and Conclusion

24 Experiments
Real database SAL (http://ipums.org)
- Attributes: Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively
- 500K tuples
Compared against a perturbation method (OLAP, SIGMOD 2005)

25 Experiments - Utility
Count queries, with a workload of 1000 queries
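A sketch of how such a workload can be evaluated; the query distribution, the uniform-spread estimate, and the error metric are assumptions for illustration, not necessarily the paper's exact setup:

```python
# Assumed setup: random range queries on one QI attribute, answered
# from generalized ranges under a uniformity assumption.
import random

random.seed(0)
ages = [random.randint(16, 93) for _ in range(1000)]  # raw QI values
# Generalized release: 10-wide Age ranges with a count per range.
buckets = [(lo, lo + 10, sum(lo <= a < lo + 10 for a in ages))
           for lo in range(16, 94, 10)]

def true_count(q_lo, q_hi):
    return sum(q_lo <= a < q_hi for a in ages)

def est_count(q_lo, q_hi):
    # Assume tuples are spread uniformly inside each generalized range.
    est = 0.0
    for lo, hi, cnt in buckets:
        overlap = max(0, min(hi, q_hi) - max(lo, q_lo))
        est += cnt * overlap / (hi - lo)
    return est

errors = []
for _ in range(1000):                       # workload = 1000 queries
    q_lo, q_hi = sorted(random.sample(range(16, 95), 2))
    t = true_count(q_lo, q_hi)
    errors.append(abs(t - est_count(q_lo, q_hi)) / max(t, 1))
print(sum(errors) / len(errors))            # average relative error
```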

26 Experiments - Utility

27 Experiments - Efficiency

28 Conclusion
- Discussed most of the existing privacy principles in PPDP
- Identified the proximity attack and proposed (ε, m)-anonymity to prevent it
- Verified experimentally that the method is effective and efficient

29 Any Questions?

