1 Privacy Preserving Data Mining: Introduction. August 2nd, 2013. Shaibal Chakrabarty.


2 Motivation
Inherent tension in mining sensitive databases: we want to release aggregate information about the data without leaking individual information about participants.
− Aggregate info: the number of A students in a school district.
− Individual info: whether a particular student is an A student.
Problem: exact aggregate info may leak individual info. E.g., releasing both the number of A students in the district and the number of A students in the district not named Dan Waymel reveals whether Dan Waymel is an A student.
Goal: a method that protects individual info while still releasing aggregate info.
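The leak described on this slide can be reproduced in a few lines. The following is only an illustrative sketch: apart from the name used on the slide, the records and the helper `count_a_students` are hypothetical and not part of the original presentation.

```python
# Hypothetical toy records: (student name, is an A student).
students = [
    ("Alice Example", True),
    ("Bob Example", False),
    ("Dan Waymel", True),
    ("Eve Example", False),
]


def count_a_students(records, exclude_name=None):
    """Exact aggregate query: number of A students, optionally excluding one name."""
    return sum(1 for name, is_a in records if is_a and name != exclude_name)


total = count_a_students(students)
total_without_dan = count_a_students(students, exclude_name="Dan Waymel")

# Subtracting two exact aggregates isolates one individual's record.
dan_is_a_student = (total - total_without_dan) == 1
print(total, total_without_dan, dan_is_a_student)  # prints: 2 1 True
```

Neither query names an individual result on its own, yet their difference does, which is why exact aggregates alone are not a sufficient privacy guarantee.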

3 A growing number of data mining applications must deal with data sources that are distributed, possibly proprietary, and privacy sensitive. Financial transactions, health-care records, and network communication traffic are a few examples. Privacy is also becoming increasingly important in data mining applications for counter-terrorism and homeland defense, which may require creating profiles, constructing social network models, and detecting terrorist communications from distributed, privacy-sensitive, multi-party data. Combining such diverse data sets belonging to different parties may violate privacy laws. We therefore need algorithms that can mine the data while guaranteeing that its privacy is not compromised. This need has led to the development of several privacy-preserving data mining techniques. Many of them use randomization to perturb the data, preserving privacy while keeping the underlying patterns invariant.

4 Goal: Distort the data while still preserving some of its properties for data mining purposes.
− Additive based
− Multiplicative based
− Condensation based
− Decomposition
− Data swapping
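As one illustration of the families listed above, data swapping can be sketched as permuting the values of a sensitive attribute across records. This is a minimal sketch under assumptions: the records, the column choice, and the helper `swap_column` are hypothetical, and practical swapping schemes typically constrain which records may exchange values.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical records: (zip code, salary). Not from the slides.
records = [
    ("75001", 52000),
    ("75002", 61000),
    ("75003", 48000),
    ("75004", 75000),
    ("75005", 58000),
]


def swap_column(rows, col, rng):
    """Return new rows in which the values of column `col` are permuted across records."""
    values = [row[col] for row in rows]
    rng.shuffle(values)
    return [row[:col] + (v,) + row[col + 1:] for row, v in zip(rows, values)]


swapped = swap_column(records, 1, rng)

# The multiset of salaries (the column's marginal distribution) is unchanged,
# but a salary is no longer tied to the record it came from.
same_marginal = Counter(r[1] for r in records) == Counter(r[1] for r in swapped)
print(same_marginal)  # prints: True
```

The preserved property here is the marginal distribution of the swapped column; statistics that depend on the link between columns, such as correlations, are generally not preserved by an unconstrained swap.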

5 Randomization approach
− Hide the original data by randomly modifying the data values with additive noise, while preserving the patterns of the original data (its underlying probabilistic properties).
− Reconstruct the distribution of the original data values from the perturbed data. The individual original values cannot be reconstructed.
− Build a decision tree classifier from the perturbed data using this reconstructed distribution.
− Privacy breaches
Cryptographic approach
− Party X owns database D1; party Y owns database D2.
− Build a decision tree on D1 and D2 without revealing information about D1 to party Y or about D2 to party X, except what might be revealed by the decision tree itself.
− Horizontally partitioned data: records (entities) are split across parties.
− Vertically partitioned data: attributes are split across parties.
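The additive-noise step of the randomization approach can be sketched as follows. This is a simplified illustration, not the reconstruction procedure from the literature: it recovers only the mean and variance of the original data, whereas the randomization approach reconstructs the full distribution. The data, the Gaussian noise model, and `NOISE_STD` are assumptions made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical original sensitive values (for example, ages). Not from the slides.
original = [random.gauss(40.0, 10.0) for _ in range(20000)]

NOISE_STD = 15.0  # assumed noise standard deviation, known to the data miner

# Each value is released as x + r, with r drawn independently from N(0, NOISE_STD^2).
perturbed = [x + random.gauss(0.0, NOISE_STD) for x in original]

# The noise has zero mean and is independent of the data, so:
#   mean(perturbed) estimates mean(original)
#   var(perturbed)  estimates var(original) + NOISE_STD^2
est_mean = statistics.fmean(perturbed)
est_var = statistics.pvariance(perturbed) - NOISE_STD ** 2

print("mean:     original", round(statistics.fmean(original), 2),
      "estimated", round(est_mean, 2))
print("variance: original", round(statistics.pvariance(original), 2),
      "estimated", round(est_var, 2))
```

The miner sees only `perturbed` and the noise distribution, yet can estimate aggregate properties of `original`; any single released value, by contrast, may be far from the value it was derived from.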

7 References
− R. Agrawal and R. Srikant, Privacy-Preserving Data Mining, ACM SIGMOD Conference, 2000.
− Hillol Kargupta, Souptik Datta, Qi Wang, and Krishnamoorthy Sivakumar, Random Data Perturbation Techniques and Privacy Preserving Data Mining.
− C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Zhu, Tools for Privacy Preserving Distributed Data Mining, ACM SIGKDD Explorations 4(2), January 2003.
− Wenliang Du and Mikhail J. Atallah, Privacy Preserving Cooperative Statistical Analysis.
− Chris Clifton, Murat Kantarcioglu, and Jaideep Vaidya, Defining Privacy for Data Mining.
− Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques.

8 Privacy is a personal choice, so privacy protection should be adaptable to the individual (Liu, Kantarcioglu and Thuraisingham, ICDM'06).
