
1 SAC’06, April 23-27, 2006, Dijon, France
On the Use of Spectral Filtering for Privacy Preserving Data Mining
Songtao Guo, UNC Charlotte
Xintao Wu, UNC Charlotte

2 Source: http://www.privacyinternational.org/issues/foia/foia-laws.jpg

3 Source: http://www.privacyinternational.org/survey/dpmap.jpg
 HIPAA for health care
 California State Bill 1386
 Gramm-Leach-Bliley Act for financial data
 COPPA for children’s online privacy
 PIPEDA 2000
 European Union (Directive 95/46/EC)

4 Mining vs. Privacy
 Data mining: the goal of data mining is to obtain summary results (e.g., classification, clusters, association rules) from the data (distribution).
 Individual privacy: individual values in the database must not be disclosed, or at least no close estimate should be derivable by attackers.
 Privacy Preserving Data Mining (PPDM): how to “perturb” data such that we can build a good data mining model (data utility) while preserving individuals’ privacy at the record level (privacy)?

5 Outline
 Additive Randomization
 Distribution Reconstruction
  Bayesian method (Agrawal & Srikant, SIGMOD 2000)
  EM method (Agrawal & Aggarwal, PODS 2001)
 Individual Value Reconstruction
  Spectral Filtering (Kargupta et al., ICDM 2003)
  PCA technique (Du et al., SIGMOD 2005)
 Error Bound Analysis for Spectral Filtering
  Upper bound
 Conclusion and Future Work

6 Additive Randomization
 To hide the sensitive data by randomly modifying the data values using some additive noise: publish Y = X + V, where X is the original data and V is random noise.
 Privacy preserving aims at making it hard for an attacker to derive the individual values x_i from the perturbed values y_i.
 Utility preserving aims at keeping the aggregate characteristics unchanged, or recoverable from the perturbed data.
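The model above can be sketched in a few lines of NumPy (an illustrative toy, not the authors' code; data sizes and noise scale are made up): each record is masked individually, yet zero-mean noise leaves aggregates such as the sample mean nearly unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original sensitive records X and additive random noise V (illustrative sizes).
X = rng.normal(loc=10.0, scale=2.0, size=10_000)
V = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Only the perturbed data Y = X + V is published.
Y = X + V

# Privacy: each y_i differs from x_i by an unknown noise draw.
# Utility: zero-mean noise leaves aggregate characteristics recoverable,
# e.g. the sample mean of Y stays close to that of X.
print(abs(Y.mean() - X.mean()))  # small compared to the data scale
```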

7 Distribution Reconstruction
 The original density distribution can be reconstructed effectively given the perturbed data and the noise’s distribution (Agrawal & Srikant, SIGMOD 2000): independent random noise with any known distribution.
  f_X^0 := uniform distribution
  j := 0 // iteration number
  repeat
   f_X^{j+1}(a) := (1/n) Σ_{i=1}^{n} f_V(w_i − a) f_X^j(a) / ∫ f_V(w_i − z) f_X^j(z) dz
   j := j + 1
  until (stopping criterion met)
 It cannot reconstruct individual values.
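The iteration above can be sketched on a discretized support (a toy NumPy version of the Agrawal–Srikant update; the grid size, noise level, and fixed iteration count are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 5000, 0.25
x = rng.uniform(0.0, 1.0, n)               # original values (hidden)
w = x + rng.normal(0.0, sigma, n)          # published perturbed values

grid = np.linspace(0.0, 1.0, 50)           # discretized support of X
dx = grid[1] - grid[0]
f = np.ones_like(grid)                     # f_X^0 := uniform distribution
f /= f.sum() * dx

def f_v(t):
    """Known noise density (Gaussian here)."""
    return np.exp(-t**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

for _ in range(20):                        # fixed count as the stopping rule
    k = f_v(w[:, None] - grid[None, :])    # f_V(w_i - a) for every i and a
    denom = (k * f).sum(axis=1) * dx       # ∫ f_V(w_i - z) f_X^j(z) dz
    f = f * (k / denom[:, None]).mean(0)   # Bayes update, averaged over records
    f /= f.sum() * dx                      # renormalize the density estimate

# f now approximates the density of X; the individual x_i remain unknown.
```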

8 Individual Value Reconstruction
 Spectral Filtering (Kargupta et al., ICDM 2003):
 1. Apply EVD to the covariance matrix of the perturbed data P = U + V.
 2. Using some published information about V, extract the first k components as the principal components; e_1, …, e_k are the corresponding eigenvectors and form an orthonormal basis of a subspace.
 3. Find the orthogonal projection of the perturbed data onto that subspace.
 4. Get the estimated data set from the projection.
 PCA technique (Huang, Du and Chen, SIGMOD 2005).
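A minimal NumPy sketch of the four steps (illustrative shapes; here the rank k is assumed known rather than estimated from published information about V):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 2000, 2

# Low-rank correlated data U plus i.i.d. noise V; only P is released.
U = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
V = rng.normal(scale=0.5, size=(m, n))
P = U + V

# 1. EVD of the covariance matrix of the perturbed data.
A_p = P @ P.T / n
lam, Q = np.linalg.eigh(A_p)
Q = Q[:, np.argsort(lam)[::-1]]        # eigenvectors, largest eigenvalue first

# 2. The first k eigenvectors form an orthonormal basis of the signal subspace.
Q_k = Q[:, :k]

# 3./4. Orthogonal projection of P onto that subspace gives the estimate.
U_hat = Q_k @ Q_k.T @ P

err_filtered = np.linalg.norm(U - U_hat) / np.linalg.norm(U)
err_raw = np.linalg.norm(U - P) / np.linalg.norm(U)
# Filtering should bring the estimate closer to U than the raw perturbed data.
```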

9 Motivation
 Previous work on individual value reconstruction was only empirical: the relationship between the estimation accuracy and the noise was not clear.
 Two questions:
  Attacker question: how close is the estimate obtained by Spectral Filtering to the original data?
  Data owner question: how much noise should be added to preserve privacy at a given tolerated level?

10 Our Work
 Investigate the explicit relationship between the estimation accuracy and the noise.
 Derive an upper bound on the estimation error in terms of the noise V.
  The upper bound determines how close the estimate achieved by attackers is to the original data.
  It poses a serious threat of privacy breaches.

11 Preliminary
 F-norm and 2-norm
 Some properties:
  ||A||_2 = sqrt(λ_max(A^T A)), the square root of the largest eigenvalue of A^T A.
  If A is symmetric, then ||A||_2 = |λ_max(A)|, the largest eigenvalue of A in absolute value.
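These norm properties can be checked numerically (a quick NumPy verification added here, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))

# ||A||_2 is the square root of the largest eigenvalue of A^T A.
two_norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
assert np.isclose(two_norm, np.linalg.norm(A, 2))

# If A is symmetric, ||A||_2 is the largest eigenvalue in absolute value.
S = (A + A.T) / 2
assert np.isclose(np.linalg.norm(S, 2), np.abs(np.linalg.eigvalsh(S)).max())
```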

12 Matrix Perturbation
 Traditional matrix perturbation theory: how the derived perturbation E affects the covariance matrix A, i.e., Ã = A + E.
 Our scenario: how the primary perturbation V affects the data matrix U.

13 Error Bound Analysis
 Prop 1. Let the covariance matrix of the perturbed data be Ã = A + E, where E is the derived perturbation.
 Prop 2. Each eigenvalue of Ã differs from the corresponding eigenvalue of A by at most ||E||_2 (the largest eigenvalue of E); the eigengap is δ = λ_k − λ_{k+1}.
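Prop 2's eigenvalue statement, as read here, is Weyl's perturbation inequality; a quick numerical check under that assumption (matrix sizes and the choice k = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m = 6
X = rng.normal(size=(m, m))
A = X @ X.T                        # a symmetric covariance-like matrix
F = rng.normal(size=(m, m))
E = (F + F.T) / 10                 # symmetric derived perturbation
A_tilde = A + E

lam = np.sort(np.linalg.eigvalsh(A))[::-1]
lam_t = np.sort(np.linalg.eigvalsh(A_tilde))[::-1]

# Weyl: each eigenvalue of A + E lies within ||E||_2 of the matching one of A.
assert np.all(np.abs(lam_t - lam) <= np.linalg.norm(E, 2) + 1e-10)

# Eigengap between the k-th and (k+1)-th eigenvalue, k = 2 for illustration.
k = 2
eigengap = lam[k - 1] - lam[k]
```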

14 Theorem
 Given a data set U and a noise set V, we have the perturbed data set P = U + V. Let Û be the estimate obtained from Spectral Filtering; then the estimation error is bounded above in terms of E, where E is the derived perturbation on the original covariance matrix A = UU^T.
 Proof is skipped. 

15 Special Cases
 When the noise matrix is generated by an i.i.d. Gaussian distribution with zero mean and known variance.
 When the noise is completely correlated with the data.

16 Experimental Results
 Artificial dataset:
  35 correlated variables
  30,000 tuples

17 Experimental Results
 Scenarios of noise addition:
  Case 1: i.i.d. Gaussian noise N(0, COV), where COV = diag(σ², …, σ²)
  Case 2: independent Gaussian noise N(0, COV), where COV = c · diag(σ_1², …, σ_n²)
  Case 3: correlated Gaussian noise N(0, COV), where COV = c · Σ_U (or c · A …)
 Measures:
  Absolute error
  Relative error
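The three noise scenarios can be generated as follows (a sketch with made-up dimensions and scaling factor c; Σ_U is taken here as the sample covariance of U):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, c = 4, 5000, 0.5

# Correlated original data U (m variables x n tuples).
L = np.tril(rng.normal(size=(m, m))) + np.eye(m)
U = L @ rng.normal(size=(m, n))
Sigma_U = np.cov(U)                  # sample covariance, rows = variables
var_per_attr = U.var(axis=1)
sigma2 = 0.5                         # common variance for case 1

zeros = np.zeros(m)
# Case 1: i.i.d. Gaussian noise, COV = diag(sigma^2, ..., sigma^2)
V1 = rng.multivariate_normal(zeros, sigma2 * np.eye(m), size=n).T
# Case 2: independent Gaussian noise, COV = c * diag(sigma_1^2, ..., sigma_n^2)
V2 = rng.multivariate_normal(zeros, c * np.diag(var_per_attr), size=n).T
# Case 3: correlated Gaussian noise, COV = c * Sigma_U
V3 = rng.multivariate_normal(zeros, c * Sigma_U, size=n).T

def relative_error(U, U_hat):
    """Relative error ||U - U_hat||_F / ||U||_F."""
    return np.linalg.norm(U - U_hat) / np.linalg.norm(U)
```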

18 Determining k
 Determine k in Spectral Filtering according to matrix perturbation theory.
 Our heuristic approach: check …
 K = …
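One way to read the heuristic (an assumption on our part, since the slide's formula did not survive the transcript): matrix perturbation theory says the noise shifts each covariance eigenvalue by at most ||E||_2, so keep only the components whose eigenvalue clearly exceeds the noise level.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, true_k, sigma2 = 8, 3000, 3, 0.25

# Rank-3 correlated data plus i.i.d. Gaussian noise of variance sigma^2.
U = rng.normal(size=(m, true_k)) @ rng.normal(size=(true_k, n))
P = U + rng.normal(scale=np.sqrt(sigma2), size=(m, n))

lam = np.sort(np.linalg.eigvalsh(P @ P.T / n))[::-1]

# For i.i.d. noise the noise covariance is sigma^2 * I, so every eigenvalue
# is inflated by roughly sigma^2; retain components that stand clearly above
# that floor (the factor 2 is an illustrative margin, not the slide's rule).
k = int(np.sum(lam > 2 * sigma2))
```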

19 Effect of varying k (case 1)
 N(0, COV), where COV = diag(σ², …, σ²); entries are relative errors (* marks the smallest error in each column)

||V||_F   229    323    561    725    1025
σ²        0.05   0.10   0.3    0.5    1.0
K=1       0.43   0.44   0.45   0.46   0.48
K=2       0.22   0.23   0.26   0.29   0.36
K=3       0.16   0.18   0.24   0.29  *0.31
K=4      *0.09  *0.12  *0.22  *0.28   0.40
K=5       0.10   0.14   0.25   0.32   0.45

20 Effect of varying k (case 2)
 N(0, COV), where COV = c · diag(σ_1², σ_2², …, σ_n²); entries are relative errors (* marks the smallest error in each column)

||V||_F   229    323    561    725    1025
c         0.07   0.15   0.44   0.74   1.45
K=1       0.44   0.44   0.45   0.46   0.49
K=2       0.22   0.23   0.27  *0.30  *0.36
K=3       0.16   0.18   0.24   0.33   0.44
K=4      *0.07  *0.11  *0.23   0.37   0.50
K=5       0.09   0.13   0.26   0.40   0.56

21 Effect of varying k (case 3)
 N(0, COV), where COV = c · Σ_U; entries are relative errors (* marks the smallest error in each column)

||V||_F   229    323    561    725    1025
c         0.07   0.15   0.44   0.74   1.45
K=1       0.50   0.55   0.73   0.88  *1.17
K=2       0.34   0.43   0.68   0.86   1.19
K=3       0.30   0.41   0.67   0.86   1.20
K=4      *0.27  *0.38  *0.65  *0.85   1.20
K=5       0.27   0.38   0.65   0.85   1.20

22 Effect of varying noise
[figure: reconstruction results for σ² = 0.1, 0.5, 1.0; ||V||_F / ||U||_F = 87.8%]

23 Effect of covariance matrix
[figure: reconstruction results for Cases 1, 2, and 3; ||V||_F / ||U||_F = 39.1%]

24 Conclusion
 Spectral-filtering-based techniques have been investigated as a major means of point-wise data reconstruction.
 We present an upper bound which enables attackers to determine how close the estimate they achieve is to the original data.

25 Future Work
 We are working on a lower bound, which represents the best estimate the attacker can achieve using Spectral Filtering; it can be used by data owners to determine how much noise should be added to preserve privacy.
 Bound analysis at the point-wise level.

26 Acknowledgement
 NSF grants CCR-0310974 and IIS-0546027
 Personnel: Xintao Wu, Songtao Guo, Ling Guo
 More info: http://www.cs.uncc.edu/~xwu/ xwu@uncc.edu

27 Questions? Thank you!

