Transforming Data to Satisfy Privacy Constraints. Computer Education Major, 032CSE15, 최미희.


1 Transforming Data to Satisfy Privacy Constraints. Computer Education Major, 032CSE15, 최미희

2 Contents
1. Introduction
2. Usage based metrics
3. Genetic algorithm framework
4. Experiments
5. Conclusion

3 1. Introduction
◆ Importance of protecting individual data
- explicitly identifying data (social security number)
- potentially identifying data (date of birth, gender, zip code)
◆ How to protect
- replace any explicitly identifying information with randomized data
- but this is not sufficient, because the identity can easily be inferred
ex) social security number <= zip code, date of birth, gender on some data set

4 1. Introduction (cont'd)
◆ Approaches to solving the identity disclosure problem
- perturbing the data
- our approach: (1) generalization (2) suppression
# generalization: 1977.1.4 -> generalization -> 1977
◆ Goal: preserve the anonymity of individuals through generalizations and suppressions
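
The two operations named on this slide can be sketched as follows. This is only an illustration: the function names and the `*` placeholder for a suppressed value are assumptions, not something the slides specify.

```python
from datetime import date

SUPPRESSED = "*"  # assumed placeholder; the slides do not name one


def generalize_birth_date(d: date) -> str:
    """Generalize a full date of birth to its year, as in 1977.1.4 -> 1977."""
    return str(d.year)


def suppress(_value: object) -> str:
    """Suppress a value entirely by replacing it with the placeholder."""
    return SUPPRESSED


print(generalize_birth_date(date(1977, 1, 4)))
print(suppress("12345"))
```

Generalization keeps part of the information (the year), while suppression keeps none of it.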

5 2. Usage based metrics
(1) Background
◆ flexible generalization
1. Categorical information
- e.g., zip code, race, marital status
- a generalization A is a set of nodes S_A in the taxonomy tree; P is the node of S_A encountered on the path from leaf node Y to the root
- Y is generalized in A to P

6 2. Usage based metrics (cont'd)
(1) Background
◆ flexible generalization
2. Numeric information
- e.g., age, education in years
- values are discretized into a set of disjoint intervals
- alternatively, a single numeric value represents each interval (e.g., median)
ex) age: {[0,20),[20,40),[40,60),[60,80),[80,∞)}
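
As a small sketch of the discretization step, the age intervals from the slide can be looked up with a binary search over the cut points. The names `AGE_CUTS` and `age_interval` are hypothetical.

```python
from bisect import bisect_right

# Cut points taken from the slide's example:
# {[0,20),[20,40),[40,60),[60,80),[80,∞)}
AGE_CUTS = [20, 40, 60, 80]


def age_interval(age: int) -> str:
    """Map an age to the half-open interval that contains it."""
    if age < 0:
        raise ValueError("age must be non-negative")
    i = bisect_right(AGE_CUTS, age)
    lower = 0 if i == 0 else AGE_CUTS[i - 1]
    upper = "∞" if i == len(AGE_CUTS) else str(AGE_CUTS[i])
    return f"[{lower},{upper})"


print(age_interval(19), age_interval(20), age_interval(85))
```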

7 2. Usage based metrics (cont'd)
(2) Multiple uses
◆ Multiple usage, unknown usage
◆ assumption: all potentially identifying columns are equally important
◆ Consider "loss"
1. Categorical information (fig.1)
- M: the total number of leaf nodes
- M_P: the number of leaf nodes in the subtree rooted at node P
- loss: (M_P - 1) / (M - 1) -> 2/7
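
The categorical loss can be checked with a short sketch. Fig. 1 is not part of this transcript, so the taxonomy tree below is a made-up stand-in, chosen to have 8 leaves with 3 of them under node P so that it reproduces the slide's 2/7.

```python
from fractions import Fraction

# Hypothetical taxonomy tree (not the one in fig.1): parent -> children.
TREE = {
    "Any": ["P", "Q", "R"],
    "P": ["a", "b", "c"],
    "Q": ["d", "e"],
    "R": ["f", "g", "h"],
}


def leaf_count(node: str) -> int:
    """Number of leaf nodes in the subtree rooted at `node`."""
    children = TREE.get(node, [])
    if not children:
        return 1
    return sum(leaf_count(child) for child in children)


def categorical_loss(node: str, root: str = "Any") -> Fraction:
    """Loss (M_P - 1) / (M - 1) for a value generalized to `node`."""
    return Fraction(leaf_count(node) - 1, leaf_count(root) - 1)


print(categorical_loss("P"))
```

A value left at a leaf has loss 0, and a value generalized all the way to the root has loss 1.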

8 2. Usage based metrics (cont'd)
(2) Multiple uses
2. Numeric information
- an entry is generalized to an interval i
- L_i, U_i: lower and upper end points of interval i
- L, U: lower and upper bounds for values in this column
- loss: (U_i - L_i) / (U - L) -> 2/15
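
The numeric loss is a plain ratio of widths. In the sketch below, the interval [10, 12) and the column bounds 5 and 20 are invented values that happen to give the slide's 2/15; the slide does not say which numbers it used.

```python
from fractions import Fraction


def numeric_loss(l_i: int, u_i: int, l: int, u: int) -> Fraction:
    """Loss (U_i - L_i) / (U - L) for an entry generalized to [L_i, U_i)."""
    if not (l <= l_i <= u_i <= u) or l == u:
        raise ValueError("interval must lie inside a non-empty column range")
    return Fraction(u_i - l_i, u - l)


# Assumed example: interval [10, 12) in a column ranging over [5, 20].
print(numeric_loss(10, 12, 5, 20))
```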

9 2. Usage based metrics (cont'd)
(3) Predictive modeling use
◆ the transformed table is used to build predictive models for some attributes
ex) modeling the customers interested in a specific category of products
◆ data accuracy vs privacy protection
◆ data accuracy <- all the rows in G have the same class label
◆ definition of classification metric CM
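
The slide names the classification metric CM without reproducing its definition. The sketch below assumes one common reading: a row is penalized when it is suppressed or when its class label differs from the majority label of its group G, and CM is the total penalty divided by the number of rows. The row layout `(group, label, suppressed)` is hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def classification_metric(rows: list[tuple[str, str, bool]]) -> Fraction:
    """Assumed CM: share of rows that are suppressed or in the minority of their group."""
    labels_by_group: dict[str, Counter] = defaultdict(Counter)
    for group, label, suppressed in rows:
        if not suppressed:
            labels_by_group[group][label] += 1
    # Majority class label of each group of non-suppressed rows.
    majority = {g: c.most_common(1)[0][0] for g, c in labels_by_group.items()}
    penalty = sum(
        1 for group, label, suppressed in rows
        if suppressed or label != majority[group]
    )
    return Fraction(penalty, len(rows))


rows = [
    ("g1", "yes", False),
    ("g1", "yes", False),
    ("g1", "no", False),   # minority label in g1: penalized
    ("g2", "no", False),
    ("*", "yes", True),    # suppressed row: penalized
]
print(classification_metric(rows))
```

Under this reading, CM is 0 exactly when no row is suppressed and every group is pure in its class label, which matches the "all the rows in G have the same class label" condition on the slide.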

10 3. Genetic algorithm framework
Solving optimization problem
◆ Applies the theory of natural evolution
◆ Each rule is represented as a bit string (chromosome) -> generalization
◆ Attributes A1 and A2, classes C1 and C2
"IF A1 AND NOT A2 THEN C2" -> "100"
"IF NOT A1 AND NOT A2 THEN C1" -> "001"
◆ Generating new rules: crossover, mutation
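
The bit-string encoding and the two genetic operators can be sketched as below. The encoding (one bit each for A1 and A2, and a last bit that is 1 for class C1) is inferred from the two examples on the slide. Single-point crossover and single-bit mutation are assumed here as the simplest variants; the slide does not say which variants were used.

```python
def encode(a1: bool, a2: bool, is_c1: bool) -> str:
    """Encode a rule as bits: A1, A2, then the class (1 = C1, 0 = C2)."""
    return "".join("1" if bit else "0" for bit in (a1, a2, is_c1))


def crossover(parent1: str, parent2: str, point: int) -> tuple[str, str]:
    """Single-point crossover: swap the tails of two chromosomes at `point`."""
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]


def mutate(chromosome: str, index: int) -> str:
    """Flip the bit at `index`."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


rule1 = encode(True, False, False)   # IF A1 AND NOT A2 THEN C2
rule2 = encode(False, False, True)   # IF NOT A1 AND NOT A2 THEN C1
print(rule1, rule2)
print(crossover(rule1, rule2, 1))
print(mutate(rule1, 1))
```

In a full genetic algorithm the crossover point and the mutated bit would be drawn at random; they are explicit arguments here so the behavior is easy to follow.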

11 4. Experiments
◆ 30162 records from the adult benchmark in the UCI repository
◆ 8 attributes: age, work class, education, marital status, occupation, race, gender, native country
[Experiment 1]
1. CM: little degradation as k = 10~500
2. low value (around 0.18): good transformation
3. the solution for k=250 generalizes away all the information in the attrs..
4. LM: the algorithm didn't optimize the LM metric
5. Solutions targeted at one usage
* higher value of k -> stricter privacy constraints
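
The slides use k without defining it. Assuming it is the usual k-anonymity constraint (every combination of potentially identifying values in the transformed table is shared by at least k rows), a check can be sketched as follows. The function name and the sample rows are hypothetical.

```python
from collections import Counter


def satisfies_k(rows: list[tuple[str, ...]], k: int) -> bool:
    """True if every combination of generalized values occurs in at least k rows."""
    counts = Counter(rows)
    return all(count >= k for count in counts.values())


# Invented generalized rows: (age interval, gender).
table = [
    ("[20,40)", "F"),
    ("[20,40)", "F"),
    ("[40,60)", "M"),
    ("[40,60)", "M"),
    ("[40,60)", "M"),
]
print(satisfies_k(table, 2), satisfies_k(table, 3))
```

The same table passes for k=2 and fails for k=3, which illustrates why a higher k is a stricter constraint.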

12 4. Experiments (cont'd)
[Experiment 2]
1. LM values 0.21~0.49 as k = 10~500
2. tradeoff: level of privacy (by k) vs loss of information (LM)
3. CM values fall in the range from 0.3 to 0.4
[By experiments 1, 2]
- the value is tailored to the purpose for which the data is disseminated
- it is difficult to produce a truly multi-purpose data set

13 5. Conclusion
- Data transformation is done by generalizing and suppressing identifying content
- We considered the information loss caused by transformation by using metrics
- Dual goals: usefulness vs privacy
Future works
◆ wider data set: potentially identifying attr ↑, disclosure risk ↑
◆ sensitive attribute: finding an adequate way of handling sensitive attr
◆ additive noise, swapping: approach to inferential disclosure of sensitive attr
◆ non-identifying attr: both consideration -> better solution

