Zhengli Huang and Wenliang (Kevin) Du


1 OptRR: Optimizing Randomized Response Schemes for Privacy-Preserving Data Mining
Zhengli Huang and Wenliang (Kevin) Du
Department of EECS, Syracuse University

2 Data Mining/Analysis
Data cannot be published directly because of privacy concerns.

3 Background: Randomized Response
[Figure: a respondent whose true answer to "Do you smoke?" is "Yes" flips a biased coin. Head: the reported answer is "Yes". Tail: the reported answer is "No".]
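The coin-flip scheme on this slide can be sketched in a few lines. This is a minimal illustration of the classic Warner-style randomized response, assuming each respondent reports the truth with probability p and the opposite otherwise; the function names, the value p = 0.8, and the simulated 30% smoking rate are illustrative assumptions, not taken from the slides.

```python
import random

def randomize(truth: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p, the opposite otherwise."""
    return truth if rng.random() < p else not truth

def estimate_yes_fraction(reports, p: float) -> float:
    """Invert P(report yes) = p*pi + (1 - p)*(1 - pi) to recover pi."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the reports independent of the truth")
    lam = sum(reports) / len(reports)  # observed fraction of "yes" reports
    return (lam + p - 1) / (2 * p - 1)

# Simulated population (assumed numbers): 30% true "yes", coin bias p = 0.8.
rng = random.Random(0)
truths = [rng.random() < 0.3 for _ in range(100_000)]
reports = [randomize(t, 0.8, rng) for t in truths]
estimate = estimate_yes_fraction(reports, 0.8)
print(round(estimate, 3))
```

No individual report reveals the respondent's true answer with certainty, yet the aggregate fraction can still be estimated; this is the privacy/utility trade-off the rest of the talk optimizes.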

4 RR for Categorical Data
[Figure: a true value Si is reported as Si, Si+1, Si+2, or Si+3 with probabilities q1, q2, q3, q4; these probabilities make up the RR matrix M.]
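For categorical data, the disguise-and-reconstruct step can be sketched as below. The sketch assumes the convention that M[j][i] is the probability of reporting category j when the true category is i (so each column sums to 1), and that the original distribution is recovered by solving M x = observed; the 3-category matrix and the distribution are made-up values for illustration only.

```python
import random

def disguise(true_index, M, rng):
    """Sample a reported category from column `true_index` of M."""
    column = [row[true_index] for row in M]
    return rng.choices(range(len(M)), weights=column, k=1)[0]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("RR matrix is singular; cannot reconstruct")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def estimate_distribution(reports, M):
    """Estimate the true category distribution from disguised reports."""
    n = len(M)
    observed = [reports.count(j) / len(reports) for j in range(n)]
    return solve(M, observed)

# Assumed example: keep the true value with probability 0.7.
M = [[0.70, 0.15, 0.15],
     [0.15, 0.70, 0.15],
     [0.15, 0.15, 0.70]]
true_dist = [0.5, 0.3, 0.2]

rng = random.Random(1)
truths = rng.choices(range(3), weights=true_dist, k=50_000)
reports = [disguise(t, M, rng) for t in truths]
estimated = estimate_distribution(reports, M)
print([round(x, 3) for x in estimated])
```

A matrix closer to the identity gives a more accurate estimate but discloses more about each individual, which is why the choice of M matters.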

5 A Generalization
Several RR matrices have been proposed: [Warner 65], [R. Agrawal et al. 05], [S. Agrawal et al. 05]. The RR matrix can be arbitrary. Can we find optimal RR matrices?

6 What is an optimal matrix?
Which of the following is better? Privacy: M2 is better. Utility: M1 is better. So, what is an optimal matrix?

7 Optimal RR Matrix
An RR matrix M is optimal if no other RR matrix is better than M in both privacy and utility (i.e., no other matrix dominates M).
Privacy Quantification and Utility Quantification: A number of privacy and utility metrics have been proposed. We use the following:
Privacy: how accurately one can estimate individual info.
Utility: how accurately we can estimate aggregate info.
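The dominance test behind this definition can be sketched as follows. The sketch assumes each matrix has already been scored as a pair (privacy_risk, utility_error) in which lower is better for both entries, and it uses the common Pareto convention (no worse in every objective, strictly better in at least one); the scores for m1, m2, and m3 are hypothetical numbers chosen so that m2 wins on privacy and m1 wins on utility.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumes every objective is a cost (lower is better).
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def is_optimal(candidate, others):
    """A candidate is optimal (non-dominated) if nothing in `others` dominates it."""
    return not any(dominates(o, candidate) for o in others)

# Hypothetical scores: (privacy_risk, utility_error), both lower-is-better.
m1 = (0.60, 0.02)  # better utility
m2 = (0.40, 0.05)  # better privacy
m3 = (0.70, 0.05)  # worse than m1 in both objectives

print(dominates(m1, m2), dominates(m2, m1))  # neither dominates the other
print(is_optimal(m1, [m2, m3]), is_optimal(m3, [m1, m2]))
```

Here m1 and m2 are both optimal because each wins on one objective, while m3 is not, since m1 beats it on both.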

8 Optimization Methods
Approach 1: Weighted sum: w1 Privacy + w2 Utility.
Approach 2: Fix Privacy, find M with the optimal Utility; or fix Utility, find M with the optimal Privacy. Challenge: it is difficult to generate M with a fixed privacy or utility.
Our Approach: Multi-Objective Optimization.

9 Evolutionary Multi-Objective Optimization (EMOO)
Genetic algorithms have difficulty dealing with multiple objectives. We use an EMOO algorithm, specifically SPEA2.

10 Our SPEA2-based algorithm

11 EMOO Evolution
Steps: Fitness Assignment (SPEA2), Crossover, Mutation.
Strength value S(M): the number of matrices dominated by M.
Raw fitness F'(M): the sum of the strengths of the RR matrices that dominate M. The lower the better.
Density d(M): discriminates between matrices with the same raw fitness.
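The fitness assignment described on this slide can be sketched as below. The strength and raw-fitness parts follow the slide's definitions; the density term uses the k-th nearest neighbour form 1 / (distance + 2) from the published SPEA2 description, which is an assumption about what the authors used. Objective pairs are assumed to be (privacy_risk, utility_error) with lower values better, and the four points are hypothetical.

```python
import math

def dominates(a, b):
    """Pareto dominance for cost objectives (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def spea2_fitness(objectives, k=1):
    """Return (strength, raw_fitness, fitness) lists for a set of objective vectors.

    Requires more than k points so that a k-th nearest neighbour exists.
    """
    n = len(objectives)
    # S(M): how many other matrices M dominates.
    strength = [sum(dominates(objectives[i], objectives[j]) for j in range(n))
                for i in range(n)]
    # F'(M): sum of the strengths of every matrix that dominates M.
    raw = [sum(strength[j] for j in range(n)
               if dominates(objectives[j], objectives[i]))
           for i in range(n)]
    # d(M): inverse distance to the k-th nearest neighbour; always below 0.5,
    # so it only breaks ties between matrices with equal raw fitness.
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(objectives[i], objectives[j])
                       for j in range(n) if j != i)
        fitness.append(raw[i] + 1.0 / (dists[k - 1] + 2.0))
    return strength, raw, fitness

# Hypothetical (privacy_risk, utility_error) scores for four matrices.
points = [(0.40, 0.05), (0.60, 0.02), (0.70, 0.05), (0.80, 0.08)]
strength, raw, fitness = spea2_fitness(points)
front = [i for i, r in enumerate(raw) if r == 0]
print(strength, raw, front)
```

Matrices with raw fitness 0 are exactly the non-dominated ones, so `front` lists the indices that would lie on the Pareto front for this population.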

12 Diversity
[Figure: matrices M1 through M5 plotted in the objective space; axes: Privacy and Utility, with directions marked "Better" and "Worse".]

13 The Output of Optimization
Pareto Fronts: The optimal set is often plotted in the objective space, and the plot is called the Pareto front.
[Figure: Pareto front; axes: Utility (error) and Privacy.]

14 Experiments For normal distribution with different δ

15 For the first attribute of the Adult data

16 For normal distribution (δ=0.75)

17 Summary
We use an evolutionary multi-objective optimization technique to search for optimal RR matrices. The evaluation shows that our scheme achieves better performance than the existing RR schemes.

