Privacy Enhancing Technologies

Privacy Enhancing Technologies, Lecture 3: Differential Privacy. Elaine Shi. Some slides adapted from Adam Smith's lecture and other talk slides.

Roadmap: defining differential privacy; techniques for achieving DP (output perturbation, input perturbation, perturbation of intermediate values, sample and aggregate).

General Setting: a curator holds sensitive data (medical data, query logs, social network data, …) and wants to support data mining and statistical queries over it, either interactively or by publishing a sanitized version.

How can you allow meaningful usage of such datasets while preserving individual privacy?

Blatant Non-Privacy: leaking individual records; letting released data be linked with public databases to re-identify individuals; allowing an adversary to reconstruct the database with significant probability (see the sketch below).
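
The reconstruction threat can be made concrete. Below is a minimal, illustrative Dinur-Nissim-style attack (all names and parameter choices are hypothetical, picked for the demo): if a mechanism answers many random subset-sum queries with too little noise, a simple linear program recovers almost all records.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 64                                    # database of n secret bits
true_db = rng.integers(0, 2, size=n)

# A (badly designed) mechanism: answer m random subset-sum queries,
# each with noise bounded by E.
m, E = 8 * n, 2
queries = rng.integers(0, 2, size=(m, n))
answers = queries @ true_db + rng.integers(-E, E + 1, size=m)

# Attack: find any fractional database consistent with every answer to
# within E (constraints -E <= q.c - a <= E), then round to bits.
A_ub = np.vstack([queries, -queries])
b_ub = np.concatenate([answers + E, E - answers])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
guess = (res.x > 0.5).astype(int)
print("fraction of bits recovered:", np.mean(guess == true_db))
```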

Attempt 1: Crypto-ish Definitions I am releasing some useful statistic f(D), and nothing more will be revealed. What kind of statistics are safe to publish?

How do you define privacy?

Attempt 2: "I am releasing research findings showing that people who smoke are very likely to get cancer." "You cannot do that, since it will break my privacy: my insurance company happens to know that I am a smoker…"

Attempt 2: Absolute Disclosure Prevention “If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place.” [Dalenius]

An Impossibility Result [informal]: it is not possible to design any non-trivial mechanism that satisfies such a strong notion of privacy [Dwork-Naor].

Attempt 3: "Blending into the Crowd", or k-Anonymity. K people purchased A and B, and all of them also purchased C. "I know that Elaine bought A and B…" So anyone who knows that Elaine bought A and B learns that she also bought C.

Attempt 4: Differential Privacy. Whether or not your record is in the database, the released statistics look almost the same: from them, it is hard to tell which case it is.

Attempt 4: Differential Privacy. For all neighboring databases x and x' (differing in a single record), and for all subsets S of transcripts: Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x') ∈ S].
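For intuition: with ε = 0.1, e^ε ≈ 1.105, so the probability of any set S of outputs can change by at most about 10% when one record is added or removed.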

Attempt 4: Differential Privacy. "Please don't blame me if your insurance company knows that you are a smoker, since I am doing society a favor: I am releasing research findings showing that people who smoke are very likely to get cancer. Oh, and by the way, please feel safe to participate in my survey, since you have nothing more to lose: my mechanism is DP, so your privacy loss is roughly the same whether or not you participate!"

Notable Properties of DP: the adversary may know arbitrary auxiliary information; no linkage attacks; oblivious to the data distribution; the sanitizer need not know the adversary's prior distribution on the DB.

DP Techniques

Techniques for Achieving DP: output perturbation, input perturbation, perturbation of intermediate values, sample and aggregate.

Method 1: Output Perturbation. For neighboring databases x, x', the global sensitivity of f is GS_f = max over neighbors x, x' of ||f(x) - f(x')||_1.

Method 1: Output Perturbation. Theorem: A(x) = f(x) + Lap(GS_f / ε) is ε-DP. Intuition: add more noise when the function is more sensitive.
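
A minimal Python sketch of the Laplace mechanism (the function name is illustrative; the analyst must supply the query's true global sensitivity):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    # Calibrate the noise scale to (global sensitivity) / epsilon.
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query has global sensitivity 1, since adding or
# removing one record changes the count by at most 1.
noisy_count = laplace_mechanism(true_answer=1234, sensitivity=1.0, epsilon=0.1)
```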

Examples of Low Global Sensitivity: average; histograms and contingency tables (see the sketch below); covariance matrix [BDMN]. Many data-mining algorithms can be implemented through a sequence of low-sensitivity queries: perceptron, some EM algorithms, SQ learning algorithms.
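
For instance, a histogram release under the Laplace mechanism might look like this (a sketch; bin boundaries are assumed to be data-independent):

```python
import numpy as np

def noisy_histogram(values, bins, epsilon):
    # Adding or removing one record changes exactly one bin by 1, so the
    # whole histogram has L1 global sensitivity 1: a single epsilon pays
    # for every bin at once.
    counts, edges = np.histogram(values, bins=bins)
    return counts + np.random.laplace(0.0, 1.0 / epsilon, size=counts.shape), edges

noisy_counts, bin_edges = noisy_histogram(np.random.randint(0, 100, 10_000),
                                          bins=10, epsilon=0.5)
```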

Examples of High Global Sensitivity: order statistics, clustering. Example: on a database with half the records 0 and half 1, changing a single record can move the median across the entire data range, so noise calibrated to global sensitivity destroys the answer.

PINQ

PINQ: a language for writing differentially private data analyses, built as an extension to LINQ in the .NET framework; provides a SQL-like interface for querying data. Goal: hopefully, non-privacy-experts can perform privacy-preserving data analytics.

Scenario: the data analyst queries the data through the PINQ interface; a trusted curator holds the raw data.

Example 1

Example 2: K-Means

Example 3: K-Means with Partition Operation

Partition [diagram]: the database is partitioned into disjoint parts P1, P2, …, Pk, and each operation O1, O2, …, Ok runs on its own part.

Composition and privacy budget. Sequential composition: for queries asked against the same data, the ε's add up. Parallel composition: for queries asked against disjoint subsets of the data, the total cost is the maximum ε. (A toy budget accountant is sketched below.)
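
A toy accountant illustrating the two rules (a sketch only; real systems such as PINQ track budgets per dataset and per partition):

```python
class PrivacyBudget:
    """Sequential composition: epsilons of queries on the same data add up."""
    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.3)  # first query
budget.spend(0.3)  # second query on the same data
# Parallel composition: queries on disjoint partitions of the data would
# each be charged only the maximum epsilon, not the sum.
```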

K-Means: Privacy Budget Allocation. With T iterations and total budget ε, sequential composition leaves each iteration roughly ε/T to spend on its noisy cluster updates.

Privacy Budget Allocation: allocation between users/computation providers (an auction?); allocation between tasks; in-task allocation (between iterations, between multiple statistics). An optimization problem, with no satisfactory solution yet!

What happens when the privacy budget is exhausted?

Transformations: Where, Select, GroupBy, Join.

Method 2: Input Perturbation. Randomized response [Warner65]: each respondent perturbs their own answer before it ever reaches the curator. Please analyze this method in the homework (a sketch follows).
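
A sketch of the classic binary randomized response (parameters chosen for illustration; this particular variant is ln(3)-DP):

```python
import random

def randomized_response(true_bit):
    # With probability 1/2 answer truthfully, otherwise answer a uniform
    # random bit. A true 1 is reported as 1 with probability 3/4 and a
    # true 0 with probability 1/4; the ratio of 3 makes this ln(3)-DP.
    if random.random() < 0.5:
        return true_bit
    return random.randint(0, 1)

def estimate_fraction(reports):
    # E[report] = p/2 + 1/4 when a fraction p of true bits are 1,
    # so invert that relation to debias the observed mean.
    return 2.0 * (sum(reports) / len(reports)) - 0.5
```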

Method 3: Perturb Intermediate Results

Continual Setting: statistics are re-released as data arrive over time, e.g., a running count over T time steps.

Perturbation of Outputs, Inputs, and Intermediate Results

Comparison (continual counting over T time steps):
Output perturbation: error O(T/ε), since each of the T released counts is paid for out of the same budget.
Input perturbation: error O(√T/ε), since the noise added to T inputs accumulates in the running sum.
Perturbation of intermediate results (binary tree): error O((log T)^1.5/ε).

Binary Tree Technique [diagram]: a complete binary tree over time steps 1-8; each node holds a noisy partial sum for its dyadic interval (the root covers [1, 8], its children [1, 4] and [5, 8], down through [1, 2], …, to the leaves 1, 2, 3, 4, 5, 6, 7, 8).

Key Observation: each output is the sum of O(log T) partial sums, and each input appears in O(log T) partial sums. So adding Lap(O(log T)/ε) noise at every node keeps the whole release ε-DP while each answer's error stays polylogarithmic in T (see the sketch below).
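
A sketch of the binary tree (partial sums) technique for continual counting, under the simplifying assumptions that T is known in advance and the stream is 0/1 (function names are illustrative):

```python
import numpy as np

def tree_partial_sums(stream, epsilon):
    # Each item falls in at most one node per level, i.e., O(log T) nodes,
    # so giving every node Laplace(levels / epsilon) noise keeps the whole
    # tree epsilon-DP by composition across levels.
    T = len(stream)
    levels = int(np.ceil(np.log2(T))) + 1
    noisy = {}
    for level in range(levels):
        width = 2 ** level
        for start in range(0, T, width):
            block = stream[start:start + width]
            noisy[(start, start + len(block))] = sum(block) + \
                np.random.laplace(0.0, levels / epsilon)
    return noisy

def running_count(noisy, t):
    # A prefix [0, t) decomposes into O(log T) dyadic blocks; summing
    # their noisy partial sums keeps each answer's error polylog in T.
    total, start = 0.0, 0
    while start < t:
        width = 1
        while start % (2 * width) == 0 and start + 2 * width <= t:
            width *= 2
        total += noisy[(start, start + width)]
        start += width
    return total

tree = tree_partial_sums([1, 0, 1, 1, 0, 1, 0, 1], epsilon=1.0)
print(running_count(tree, 5))  # noisy count of the first 5 items
```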

Method 4: Sample and Aggregate. Data-dependent techniques.

Sample and Aggregate [NRS07, Smith11]

Sample and Aggregate. Theorem: the sample-and-aggregate algorithm preserves ε-DP, and converges to the "true value" when the statistic f is asymptotically normal on a database consisting of i.i.d. values. (A simplified sketch follows.)
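
A simplified sketch (it uses a clamped noisy mean as the aggregator rather than the more robust aggregators of [NRS07, Smith11]; the output range [lo, hi] is assumed to be public knowledge):

```python
import numpy as np

def sample_and_aggregate(data, statistic, epsilon, k, lo, hi):
    # Split the data into k disjoint blocks and run the statistic on each.
    blocks = np.array_split(np.random.permutation(data), k)
    estimates = np.clip([statistic(b) for b in blocks], lo, hi)
    # One record affects exactly one block, so the clamped mean of the k
    # estimates moves by at most (hi - lo) / k: that is the sensitivity.
    sensitivity = (hi - lo) / k
    return float(np.mean(estimates) + np.random.laplace(0.0, sensitivity / epsilon))

# Example: a DP estimate of the median of ages known to lie in [0, 120].
ages = np.random.randint(0, 100, size=10_000)
dp_median = sample_and_aggregate(ages, np.median, epsilon=0.5, k=100, lo=0, hi=120)
```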

"Asymptotically Normal": by the CLT, sums of h(X_i) where h(X_i) has finite expectation and variance; common maximum-likelihood estimators; estimators for common regression problems; …

DP Pros, Cons, and Challenges? Utility vs. privacy. Privacy budget management and depletion. Can non-experts use it? Many non-trivial DP algorithms require really large datasets to be practically useful. What privacy budget is reasonable for a given dataset? An implicit independence assumption? (Consider replicating a DB k times: a person's data now appears in k correlated records, and the effective guarantee degrades to kε.)

Other Notions: noiseless privacy, crowd-blending privacy.

Homework
1. If I randomly sample one record from a large database consisting of many records and publish that record, would this be differentially private? Prove or disprove this. (If you cannot give a formal proof, say why or why not.)
2. Suppose I have a very large database (e.g., containing the ages of all people living in Maryland), and I publish the average age of all people in the database. Intuitively, do you think this preserves users' privacy? Is it differentially private? Prove or disprove this. (If you cannot give a formal proof, say why or why not.)
3. What do you think are the pros and cons of differential privacy?
4. Analyze input perturbation (the second technique for achieving DP).

Reading list
Cynthia Dwork's video tutorial on DP
[Cynthia 06] Differential Privacy (invited talk at ICALP 2006)
[Frank 09] Privacy Integrated Queries
[Mohan et al. 12] GUPT: Privacy Preserving Data Analysis Made Easy
[Cynthia Dwork 09] The Differential Privacy Frontier