Published by Grant Victor Gaines. Modified over 9 years ago.
1
PRIVACY AND SECURITY ISSUES IN DATA MINING
Ph.D. Candidate: Anna Monreale
Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti
University of Pisa, Department of Computer Science
2
Privacy-Preserving Data Mining
New privacy-preserving data mining techniques:
  For individual privacy: personal data are private
  For corporate privacy: the knowledge extracted is private
Goal: to develop algorithms for modifying the original data, so that
  private data are protected
  private knowledge remains private even after the mining tasks
  analysis results are still useful
Natural trade-off between privacy quantification and data utility
3
Secure Outsourcing of Data Mining
The data owner owns the data and the knowledge extracted from the data.
The server has access to the data of the owner.
  All encrypted transactions in D* and the items they contain are secure
  Given any mining query, the server can compute the encrypted result
  Encrypted mining and analysis results are secure
  The owner can decrypt the results and so reconstruct the exact result
  The space and time incurred by the owner in the process must be minimal
4
A Solution for Pattern Mining: K-anonymity
Attack Model: the attacker knows the set of plain items and their true supports in D exactly, and has access to the encrypted database D*
  Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
  Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)
Encryption:
  Replacing each plain item in D by a 1-1 substitution cipher
  Adding fake transactions
K-Anonymity: for each cipher item e there are at least k-1 other cipher items indistinguishable from e
Decryption: a synopsis allows computing the actual support of every pattern
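The encryption steps on this slide can be illustrated with a minimal Python sketch. It is not the authors' scheme: it assumes that "indistinguishable" means "having the same support", and it pads supports with naive single-item fake transactions, which a real attacker could spot easily. The names `encrypt_items`, `add_fake_transactions`, `is_k_anonymous_by_support` and the per-item `noise` table (standing in for the synopsis) are all hypothetical.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Number of transactions containing each item."""
    return Counter(item for t in transactions for item in set(t))


def is_k_anonymous_by_support(transactions, k):
    """Assumed criterion: every item shares its support with >= k-1 others."""
    supports = item_supports(transactions)
    group_sizes = Counter(supports.values())
    return all(size >= k for size in group_sizes.values())


def add_fake_transactions(transactions, k):
    """Naive padding (an assumption, not the slide's method).

    Items are ranked by support and grouped k at a time; each item receives
    single-item fake transactions until it reaches its group's top support.
    Returns the padded database and the per-item fake support ("noise").
    """
    supports = item_supports(transactions)
    ranked = sorted(supports, key=lambda item: (-supports[item], item))
    groups = [ranked[p:p + k] for p in range(0, len(ranked), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()          # fold a short last group into the previous one
        groups[-1].extend(tail)
    fakes, noise = [], {}
    for group in groups:
        target = max(supports[item] for item in group)
        for item in group:
            noise[item] = target - supports[item]
            fakes.extend([frozenset([item])] * noise[item])
    return list(transactions) + fakes, noise


def encrypt_items(transactions, seed=0):
    """Replace each plain item by a cipher label through a 1-1 mapping."""
    items = sorted({item for t in transactions for item in t})
    codes = [f"e{n}" for n in range(len(items))]
    random.Random(seed).shuffle(codes)
    cipher = dict(zip(items, codes))
    encrypted = [frozenset(cipher[item] for item in t) for t in transactions]
    return encrypted, cipher
```

In this sketch the owner keeps `cipher` and `noise` private. Subtracting `noise[item]` from the support the server reports for the matching cipher item recovers the true support of that item; because the fake transactions here contain a single item, the supports of larger itemsets are unchanged.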
5
Privacy-Preserving DT Framework
GOAL: publishing and sharing various forms of data without disclosing sensitive personal information while preserving mining results
  Sequence data
  Query-Log data
  …
Problem: anonymizing sequence data while preserving sequential pattern mining results
Attack Model: Sequence Linking Attack. The attacker knows part of a sequence and wants to guess the whole correct sequence
Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences
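A small Python sketch may help make the sequence linking attack concrete. It assumes that "part of a sequence" means a (not necessarily contiguous) subsequence and that a k-infrequent sequence is one supported by at least one but fewer than k sequences of the dataset; both readings are assumptions, and the function names and the toy dataset are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)


def support(pattern, dataset):
    """Number of sequences in the dataset that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)


def is_k_infrequent(pattern, dataset, k):
    """Assumed definition: the pattern matches between 1 and k-1 sequences."""
    return 0 < support(pattern, dataset) < k


# Illustrative dataset (an assumption, not taken from the slides).
dataset = [
    ("B", "C"),
    ("A", "B", "C", "D"),
    ("B", "C", "E"),
    ("B", "C", "D"),
]
```

With k = 2, an attacker who knows that a victim's sequence contains `E` can link it to a single sequence in this dataset, so `("E",)` is 2-infrequent and would have to be hidden. The pattern `("B", "C", "D")` matches two sequences and therefore leaves the attacker with at least k candidates.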
6
Running example: k = 2
Steps: Prefix Tree Construction, Tree Pruning, Tree Reconstruction, Generation of D'
Dataset D: B C A B C D B C E B C D
Dataset D': B C A B C D B C A B C D
Prefix tree (node:support labels): Root B:3 C:3 E:1 A:2 B:2 C:2 D:2 D:1
L cut: B C E : 1, B C D : 1
Tree after pruning: Root B:1 C:1 A:2 B:2 C:2 D:2
LCS: 1. B C 2. B C D
Tree diagrams from the remaining panels: Root B:2 C:2 A:3 B:3 C:3 D:3; Root B:2 C:2 A:2 B:2 C:2 D:2
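The first two steps of the running example, plus the LCS computation used during reconstruction, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the five-sequence dataset is a guess chosen so that the tree reproduces the node supports shown on the slide, pruning is a single top-down pass, and the reconstruction and generation steps are not implemented. All names are hypothetical.

```python
class Node:
    def __init__(self):
        self.count = 0       # number of sequences passing through this node
        self.children = {}   # symbol -> Node


def build_prefix_tree(dataset):
    """Insert every sequence, counting how many share each prefix."""
    root = Node()
    for seq in dataset:
        node = root
        for symbol in seq:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def prune(root, k):
    """Cut every subtree whose root has support below k (one pass).

    Returns the cut prefixes with their supports and subtracts those
    supports from the counts of the nodes above the cut.
    """
    cut = []

    def visit(node, path, ancestors):
        for symbol, child in list(node.children.items()):
            if child.count < k:
                del node.children[symbol]
                cut.append((path + (symbol,), child.count))
                for ancestor in ancestors:
                    ancestor.count -= child.count
            else:
                visit(child, path + (symbol,), ancestors + [child])

    visit(root, (), [])
    return cut


def tree_counts(root):
    """Flatten the tree into a {prefix: support} dictionary."""
    counts = {}

    def visit(node, path):
        for symbol, child in node.children.items():
            counts[path + (symbol,)] = child.count
            visit(child, path + (symbol,))

    visit(root, ())
    return counts


def lcs(a, b):
    """Longest common subsequence of two sequences (dynamic programming)."""
    table = [[()] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                table[i + 1][j + 1] = table[i][j] + (x,)
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j], key=len)
    return table[len(a)][len(b)]


# Assumed dataset: picked to match the supports B:3, C:3, E:1, D:1, A:2 on the slide.
D = [tuple("BC"), tuple("ABCD"), tuple("BCE"), tuple("BCD"), tuple("ABCD")]
```

Building the tree from `D` and calling `prune(tree, 2)` cuts the prefixes B C E and B C D, each with support 1, and leaves the B C branch with support 1, as in the pruned tree on the slide. `lcs` then gives B C for the pair (B C E, B C) and B C D for the pair (B C D, A B C D), which are the two LCS values listed in the example.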