PRIVACY AND SECURITY ISSUES IN DATA MINING. Ph.D. Candidate: Anna Monreale. Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti. University of Pisa.


1 PRIVACY AND SECURITY ISSUES IN DATA MINING
Ph.D. Candidate: Anna Monreale
Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti
University of Pisa, Department of Computer Science

2 Privacy-Preserving Data Mining
- New privacy-preserving data mining techniques:
  - For individual privacy: personal data are private
  - For corporate privacy: the knowledge extracted is private
- Goal: to develop algorithms for modifying the original data, so that
  - private data are protected
  - private knowledge remains private even after the mining tasks
  - analysis results are still useful
- Natural trade-off between privacy quantification and data utility

3 Secure Outsourcing of Data Mining
- All encrypted transactions in D* and the items they contain are secure
- Given any mining query, the server can compute the encrypted result
- Encrypted mining and analysis results are secure
- The owner can decrypt the results and so reconstruct the exact result
- The space and time incurred by the owner in the process must be minimal
- The server has access to the owner's data
- The data owner owns
  - the data
  - the knowledge extracted from the data

4 A Solution for Pattern Mining: K-anonymity
- Attack Model: the attacker knows the set of plain items and their exact true supports in D, and has access to the encrypted database D*
  - Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
  - Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)
- Encryption:
  - Replacing each plain item in D with a cipher item through a 1-1 substitution cipher
  - Adding fake transactions
- K-Anonymity: for each cipher item e there are at least k-1 other cipher items indistinguishable from e
- Decryption: a synopsis allows computing the actual support of every pattern
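The encryption steps on this slide can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the scheme from the presentation: "indistinguishable" is taken to mean "same support in D*", fake transactions are single-item transactions, and the synopsis is just a per-item count of the fakes added. The names encrypt, add_fake_transactions, is_k_anonymous and noise are hypothetical.

```python
from collections import Counter
import random

def item_supports(db):
    """Support of each item: number of transactions that contain it."""
    return Counter(item for t in db for item in t)

def encrypt(db, seed=0):
    """Replace each plain item with a cipher item via a 1-1 substitution."""
    items = sorted({item for t in db for item in t})
    random.Random(seed).shuffle(items)
    cipher = {item: f"e{idx}" for idx, item in enumerate(items)}
    return [frozenset(cipher[i] for i in t) for t in db], cipher

def is_k_anonymous(db, k):
    """Assumption: items are indistinguishable when their supports are equal."""
    supports = item_supports(db)
    group_sizes = Counter(supports.values())
    return all(group_sizes[s] >= k for s in supports.values())

def add_fake_transactions(db, k):
    """Pad supports with single-item fakes so groups of >= k items match."""
    supports = item_supports(db)
    ranked = sorted(supports, key=lambda i: (-supports[i], i))
    if len(ranked) < k:
        raise ValueError("need at least k distinct items")
    groups = [ranked[i:i + k] for i in range(0, len(ranked), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # merge an undersized last group
    fakes, noise = [], Counter()
    for group in groups:
        target = supports[group[0]]  # highest support in the group
        for item in group:
            noise[item] = target - supports[item]
            fakes.extend(frozenset([item]) for _ in range(noise[item]))
    # noise plays the role of a (toy) synopsis kept by the data owner
    return db + fakes, noise

plain = [{"a", "b"}, {"a", "c"}, {"a"}, {"b"}, {"d"}]
encrypted, cipher = encrypt(plain)
padded, noise = add_fake_transactions(encrypted, k=2)
```

The owner can recover a true item support as item_supports(padded)[e] - noise[e]. Because the fakes in this sketch contain a single item each, the support of any itemset with two or more items is unchanged; a realistic scheme would not have this property.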

5 Privacy-Preserving DT Framework
- GOAL: publishing and sharing various forms of data without disclosing sensitive personal information while preserving mining results
  - Sequence data
  - Query-log data
  - …
- Problem: anonymizing sequence data while preserving sequential pattern mining results
- Attack Model: Sequence Linking Attack
  - The attacker knows part of a sequence and wants to guess the whole correct sequence
- Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences
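A minimal Python sketch of what "k-infrequent" means for the sequence linking attack, assuming that the attacker's partial knowledge is a subsequence and that a pattern is risky when it matches at least one but fewer than k sequences. The function names and the toy dataset are hypothetical and are not taken from the presentation.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def support(pattern, dataset):
    """Number of sequences in the dataset that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in dataset)

def is_k_infrequent(pattern, dataset, k):
    """A pattern matching 1..k-1 sequences narrows the attacker's guess
    to fewer than k candidates, so it would have to be hidden."""
    return 0 < support(pattern, dataset) < k

# Toy dataset chosen for illustration only
toy = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
```

With k = 2, an attacker who knows the part B C E can link it to a single sequence in this toy data, whereas knowing only B C leaves five candidates.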

6 Running example: k = 2
[Figure: anonymization pipeline with the steps Prefix Tree Construction, Tree Pruning, Tree Reconstruction, Generation of D']
- Dataset D: B C A B C D B C E B C D
- Dataset D': B C A B C D B C A B C D
- Prefix tree: Root, B:3, C:3, E:1, A:2, B:2, C:2, D:2, D:1
- L cut: B C E : 1, B C D : 1
- LCS: 1. B C, 2. B C D
- Tree after pruning: Root, B:1, C:1, A:2, B:2, C:2, D:2
- Tree after reconstruction: Root, B:2, C:2, A:3, B:3, C:3, D:3
- Final tree: Root, B:2, C:2, A:2, B:2, C:2, D:2
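The four step names in the figure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm from the presentation: the prefix tree is stored as a counter over prefixes, a sequence is cut when its own tree node has support below k, and each cut sequence donates its count to the kept sequence with the longest common subsequence (ties go to the shorter kept sequence). The function names and the toy dataset are hypothetical.

```python
from collections import Counter

def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def anonymize(dataset, k):
    sequences = Counter(tuple(s) for s in dataset)

    # 1. Prefix tree construction: one node per prefix, with its support.
    node_support = Counter()
    for seq, n in sequences.items():
        for end in range(1, len(seq) + 1):
            node_support[seq[:end]] += n

    # 2. Tree pruning: cut sequences whose final node has support < k.
    kept = Counter({s: n for s, n in sequences.items() if node_support[s] >= k})
    cut = Counter({s: n for s, n in sequences.items() if node_support[s] < k})

    # 3. Tree reconstruction: reattach each cut sequence to the kept
    #    sequence sharing the longest common subsequence (assumed rule).
    for seq, n in cut.items():
        if not kept:
            break
        best = max(kept, key=lambda t: (lcs_length(seq, t), -len(t)))
        if lcs_length(seq, best) > 0:
            kept[best] += n

    # 4. Generation of D': emit each kept sequence as often as its count.
    return [list(s) for s, n in kept.items() for _ in range(n)]

# Toy dataset chosen for illustration only
toy = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
anonymized = anonymize(toy, k=2)
```

The sketch does not guarantee that every output sequence reaches support k in general; it only shows how pruning and LCS-based reattachment interact on a small input.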

