PRIVACY AND SECURITY ISSUES IN DATA MINING Ph.D. Candidate: Anna Monreale Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti University of Pisa.

Similar presentations
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.

Publishing Set-Valued Data via Differential Privacy Rui Chen, Concordia University Noman Mohammed, Concordia University Benjamin C. M. Fung, Concordia.
Data Mining (Apriori Algorithm)DCS 802, Spring DCS 802 Data Mining Apriori Algorithm Spring of 2002 Prof. Sung-Hyuk Cha School of Computer Science.
Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.
Presenter: Nguyen Ba Anh HCMC University of Technology Information System Security Course.
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.
Reconstruction-Based Association Rule Hiding Author: Yuhong Guo (MS-Ph.D. Candidate, Peking Univ., China) Advisor: Prof. Shiwei Tang Co-Advisors:
FP-Growth algorithm Vasiljevic Vladica,
Data Mining Association Analysis: Basic Concepts and Algorithms
UTEPComputer Science Dept.1 University of Texas at El Paso Privacy in Statistical Databases Dr. Luc Longpré Computer Science Department Spring 2006.
Mar 12, 2002Mårten Trolin1 This lecture Diffie-Hellman key agreement Authentication Certificates Certificate Authorities SSL/TLS.
An architecture for Privacy Preserving Mining of Client Information Jaideep Vaidya Purdue University This is joint work with Murat.
Anatomy: Simple and Effective Privacy Preservation Israel Chernyak DB Seminar (winter 2009)
Privacy-Preserving Computation and Verification of Aggregate Queries on Outsourced Databases Brian Thompson 1, Stuart Haber 2, William G. Horne 2, Tomas.
Privacy Preserving Data Mining: An Overview and Examination of Euclidean Distance Preserving Data Transformation Chris Giannella cgiannel AT acm DOT org.
Mining Sequences. Examples of Sequence Web sequence:  {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation}
PRIVACY CRITERIA. Roadmap Privacy in Data mining Mobile privacy (k-e) – anonymity (c-k) – safety Privacy skyline.
Preserving Privacy in Clickstreams Isabelle Stanton.
Privacy-preserving Anonymization of Set Value Data Manolis Terrovitis, Nikos Mamoulis University of Hong Kong Panos Kalnis National University of Singapore.
D ATABASE S ECURITY Proposed by Abdulrahman Aldekhelallah University of Scranton – CS521 Spring2015.
Database Laboratory Regular Seminar TaeHoon Kim.
History and Background Part 1: Basic Concepts and Monoalphabetic Substitution CSCI 5857: Encoding and Encryption.
Task 1: Privacy Preserving Genomic Data Sharing Presented by Noman Mohammed School of Computer Science McGill University 24 March 2014.
Differentially Private Transit Data Publication: A Case Study on the Montreal Transportation System Rui Chen, Concordia University Benjamin C. M. Fung,
Privacy Preserving Query Processing in Cloud Computing Wen Jie
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
Mining Frequent Itemsets with Constraints Takeaki Uno Takeaki Uno National Institute of Informatics, JAPAN Nov/2005 FJWCP.
林俊宏 Parallel Association Rule Mining based on FI-Growth Algorithm Bundit Manaskasemsak, Nunnapus Benjamas, Arnon Rungsawang.
APPLYING EPSILON-DIFFERENTIAL PRIVATE QUERY LOG RELEASING SCHEME TO DOCUMENT RETRIEVAL Sicong Zhang, Hui Yang, Lisa Singh Georgetown University August.
Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:
A Privacy-Preserving Interdomain Audit Framework Adam J. Lee Parisa Tabriz Nikita Borisov University of Illinois, Urbana-Champaign WPES 2006.
Sequential PAttern Mining using A Bitmap Representation
Secure Incremental Maintenance of Distributed Association Rules.
Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa) www-kdd.isti.cnr.it Anna Monreale Fabio Pinelli Roberto Trasarti Fosca Giannotti A. Monreale,
AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.
Differentially Private Data Release for Data Mining Noman Mohammed*, Rui Chen*, Benjamin C. M. Fung*, Philip S. Yu + *Concordia University, Montreal, Canada.
Tools for Privacy Preserving Distributed Data Mining
Trajectory Pattern Mining Fosca Giannotti, Mirco Nanni, Dino Pedreschi, Fabio Pinelli KDD Lab (ISTI-CNR & Univ. Pisa) Presented by: Qiming Zou.
Mining High Utility Itemset in Big Data
Secure Sensor Data/Information Management and Mining Bhavani Thuraisingham The University of Texas at Dallas October 2005.
SECURED OUTSOURCING OF FREQUENT ITEMSET MINING Hana Chih-Hua Tai Dept. of CSIE, National Taipei University.
HIDING EMERGING PATTERNS WITH LOCAL RECODING GENERALIZATION Presented by: Michael Cheng Supervisor: Dr. William Cheung Co-Supervisor: Dr. Byron Choi.
Collusion-Resistant Anonymous Data Collection Method Mafruz Zaman Ashrafi See-Kiong Ng Institute for Infocomm Research Singapore.
Randomization in Privacy Preserving Data Mining Agrawal, R., and Srikant, R. Privacy-Preserving Data Mining, ACM SIGMOD’00 the following slides include.
Outline Introduction – Frequent patterns and the Rare Item Problem – Multiple Minimum Support Framework – Issues with Multiple Minimum Support Framework.
Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation.
Privacy vs. Utility Xintao Wu University of North Carolina at Charlotte Nov 10, 2008.
Privacy-preserving data publishing
BZUPAGES.COM Cryptography Cryptography is the technique of converting a message into unintelligible or non-understandable form such that even if some unauthorized.
Security in Outsourced Association Rule Mining. Agenda  Introduction  Approximate randomized technique  Encryption  Summary and future work.
Introduction to Data Mining by Yen-Hsien Lee Department of Information Management College of Management National Sun Yat-Sen University March 4, 2003.
Privacy Protection in Social Networks Instructor: Assoc. Prof. Dr. DANG Tran Khanh Present : Bui Tien Duc Lam Van Dai Nguyen Viet Dang.
Presented By Amarjit Datta
Location Privacy Protection for Location-based Services CS587x Lecture Department of Computer Science Iowa State University.
1 Privacy Preserving Data Mining Introduction August 2 nd, 2013 Shaibal Chakrabarty.
Graph Data Management Lab, School of Computer Science Personalized Privacy Protection in Social Networks (VLDB2011)
Secure Data Outsourcing
Keyword search on encrypted data. Keyword search problem  Linux utility: grep  Information retrieval Basic operation Advanced operations – relevance.
Personalized Privacy Preservation: beyond k-anonymity and ℓ-diversity SIGMOD 2006 Presented By Hongwei Tian.
CRYPTOGRAPHY Cryptography is art or science of transforming intelligible message to unintelligible and again transforming that message back to the original.
Center for E-Business Technology Seoul National University Seoul, Korea Private Queries in Location Based Services: Anonymizers are not Necessary Gabriel.
Security in Outsourcing of Association Rule Mining
ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION International Journal on Uncertainty, Fuzziness and Knowledge-based Systems,
Privacy Preserving Similarity Evaluation of Time Series Data
A Privacy-Preserving Index for Range Queries
Chapter 3:Cryptography (16M)
Privacy Preserving Data Mining
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
Privacy preserving cloud computing
Presentation transcript:

PRIVACY AND SECURITY ISSUES IN DATA MINING Ph.D. Candidate: Anna Monreale Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti University of Pisa, Department of Computer Science

Privacy-Preserving Data Mining
 New privacy-preserving data mining techniques:
   For individual privacy: personal data are private
   For corporate privacy: the extracted knowledge is private
 Goal: to develop algorithms for modifying the original data so that
   private data are protected
   private knowledge remains private even after the mining tasks
   analysis results are still useful
 Natural trade-off between privacy protection and data utility

Secure Outsourcing of Data Mining
 All encrypted transactions in D* and the items contained in them are secure
 Given any mining query, the server can compute the encrypted result
 The encrypted mining and analysis results are secure
 The owner can decrypt the results and so reconstruct the exact result
 The space and time spent by the owner in the process must be minimal
 The server has access to the owner's data
 The data owner retains ownership of
   the data
   the knowledge extracted from the data
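To make these requirements concrete, here is a minimal end-to-end sketch of the outsourcing flow, assuming a toy 1-1 substitution encryption and a naive server-side miner; the function names (build_cipher, encrypt_db, mine_frequent_itemsets, decrypt_results) are illustrative, not the scheme used in the thesis.

```python
from itertools import combinations
from collections import Counter
import secrets

def build_cipher(items):
    """Random 1-1 substitution: plain item -> cipher label."""
    labels = [f"e{i}" for i in range(len(items))]
    secrets.SystemRandom().shuffle(labels)
    return dict(zip(sorted(items), labels))

def encrypt_db(db, cipher):
    """Owner side: replace every plain item by its cipher label."""
    return [frozenset(cipher[i] for i in t) for t in db]

def mine_frequent_itemsets(db, min_sup):
    """Server side: naive frequent-itemset miner over cipher items
    (exponential enumeration, for illustration only)."""
    counts = Counter()
    for t in db:
        for size in range(1, len(t) + 1):
            for iset in combinations(sorted(t), size):
                counts[frozenset(iset)] += 1
    return {iset: c for iset, c in counts.items() if c >= min_sup}

def decrypt_results(results, cipher):
    """Owner side: map cipher itemsets back to plain itemsets."""
    inverse = {v: k for k, v in cipher.items()}
    return {frozenset(inverse[e] for e in iset): sup
            for iset, sup in results.items()}

# Usage: owner encrypts, server mines, owner decrypts.
D = [{"beer", "chips"}, {"beer", "diapers"}, {"beer", "chips", "diapers"}]
cipher = build_cipher({item for t in D for item in t})
D_star = encrypt_db(D, cipher)                      # shipped to the server
enc_patterns = mine_frequent_itemsets(D_star, 2)    # computed by the server
print(decrypt_results(enc_patterns, cipher))        # reconstructed by the owner
```

The server only ever sees cipher labels and their (possibly distorted) supports, while the owner's work is limited to the cheap substitution and the final mapping back.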

A Solution for Pattern Mining: K-anonymity
 Attack model: the attacker knows the set of plain items and their exact supports in D, and has access to the encrypted database D*
   Item-based attack: guessing the plain item corresponding to a cipher item e, with probability prob(e)
   Itemset-based attack: guessing the plain itemset corresponding to a cipher itemset E, with probability prob(E)
 Encryption:
   replacing each plain item in D via a 1-1 substitution cipher
   adding fake transactions
 K-anonymity: for each cipher item e there are at least k-1 other cipher items indistinguishable from it
 Decryption: a synopsis allows the owner to compute the actual support of every pattern
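As an illustration of how the fake transactions and the synopsis interact, the sketch below pads cipher-item supports in groups of at least k items using singleton fake transactions, and records them in a synopsis that the owner uses to recover exact supports. This is a simplified stand-in for the actual encryption/decryption scheme; the helper names and the padding strategy are assumptions, not the thesis's construction.

```python
from collections import Counter

def k_anonymize_supports(enc_db, k):
    """Add fake (singleton) transactions so that every cipher item shares
    its support with at least k-1 other cipher items; return the published
    database and the synopsis of fake occurrences (assumes >= k items)."""
    support = Counter(item for t in enc_db for item in t)
    items = sorted(support, key=support.get, reverse=True)
    blocks = [items[i:i + k] for i in range(0, len(items), k)]
    if len(blocks) > 1 and len(blocks[-1]) < k:
        blocks[-2].extend(blocks.pop())        # merge an undersized tail block
    fake_transactions, synopsis = [], Counter()
    for block in blocks:
        target = max(support[i] for i in block)
        for item in block:
            for _ in range(target - support[item]):
                fake_transactions.append(frozenset([item]))
                synopsis[item] += 1
    return enc_db + fake_transactions, synopsis

def true_support(itemset, published_db, synopsis):
    """Owner side: subtract the fake occurrences recorded in the synopsis.
    Fakes are singletons here, so only 1-itemsets need correcting."""
    observed = sum(1 for t in published_db if itemset <= t)
    fake = synopsis[next(iter(itemset))] if len(itemset) == 1 else 0
    return observed - fake

# Usage on a toy cipher database (e.g. the D_star of the previous sketch).
D_star = [frozenset({"e1", "e2"}), frozenset({"e1", "e3"}),
          frozenset({"e1", "e2", "e3"})]
published, synopsis = k_anonymize_supports(D_star, k=2)
print(true_support(frozenset({"e2"}), published, synopsis))  # 2, the exact support
```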

Privacy-Preserving DT Framework
 Goal: publishing and sharing various forms of data without disclosing sensitive personal information, while preserving mining results
   Sequence data
   Query-log data
   …
 Problem: anonymizing sequence data while preserving sequential pattern mining results
 Attack model: Sequence Linking Attack
   the attacker knows part of a sequence and wants to guess the whole correct sequence
 Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences
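A small sketch of the sequence linking attack the framework defends against: an adversary who knows a fragment of a victim's sequence tries to single it out in the published data, and the publication resists the attack when every such fragment matches at least k published sequences. The code uses the anonymized dataset of the running example that follows; the helper names are illustrative.

```python
def contains_subsequence(seq, fragment):
    """True if `fragment` occurs in `seq` as an order-preserving subsequence."""
    it = iter(seq)
    return all(item in it for item in fragment)

def linkage_candidates(published, fragment):
    """All published sequences the attacker cannot rule out."""
    return [seq for seq in published if contains_subsequence(seq, fragment)]

D_prime = [list("BCA"), list("BCD"), list("BCA"), list("BCD")]
known = list("BA")        # attacker's background knowledge about the victim
matches = linkage_candidates(D_prime, known)
# The fragment is safe if it cannot narrow the victim down to fewer than
# k candidate sequences (k = 2 in the running example).
print(len(matches) >= 2)  # True
```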

Running example: k = 2
Dataset D: B C A, B C D, B C E, B C D
Dataset D': B C A, B C D, B C A, B C D
Pipeline: Prefix Tree Construction → Tree Pruning → Tree Reconstruction → Generation of D'
L-cut (pruned branches): B C E : 1, B C D : 1
Preserved frequent sequences (LCS): 1. B C  2. B C D
[Figure: prefix trees with per-node counts at each step of the pipeline]
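Reading the flattened figure above as D = {B C A, B C D, B C E, B C D}, the sketch below illustrates the same pipeline in a deliberately simplified form: count prefixes, keep those with support at least k, and truncate each sequence to its longest surviving prefix. The actual algorithm reconstructs the pruned tree rather than truncating, which is why the D' in the figure still contains full sequences such as B C A; this sketch only demonstrates the idea, while still preserving the frequent patterns B C and B C D.

```python
from collections import defaultdict

def prefix_counts(sequences):
    """Support of every prefix, i.e. the node counts of the prefix tree."""
    counts = defaultdict(int)
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    return counts

def anonymize(sequences, k):
    """Prune prefixes with support < k, then map each sequence to its
    longest surviving prefix (a simplification of the generation step)."""
    frequent = {p for p, c in prefix_counts(sequences).items() if c >= k}
    anonymized = []
    for seq in sequences:
        best = max((p for p in frequent if tuple(seq[:len(p)]) == p),
                   key=len, default=())
        anonymized.append(list(best))
    return anonymized

D = [list("BCA"), list("BCD"), list("BCE"), list("BCD")]
print(anonymize(D, k=2))
# [['B','C'], ['B','C','D'], ['B','C'], ['B','C','D']] -- every output
# sequence occurs at least k times, and the patterns B C and B C D survive.
```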