Differential Privacy (1)

Outline  Background  Definition

Background  Interactive database query A classical research problem for statistical databases Prevent query inferences – malicious users submit multiple queries to infer private information about some person Has been studied since decades ago  Non-interactive publishing statistics then destroy data micro-data publishing

Background: Database Privacy
[Figure: individuals (you, Bob, Alice) contribute records, which are collected and "sanitized" before release to users (government, researchers, marketers, …).]
The "census problem" poses two conflicting goals
 Utility: users can extract "global" statistics
 Privacy: individual information stays hidden
 How can these goals be formalized?

Database Privacy
Variations on this model have been studied in
 Statistics
 Data mining
 Theoretical CS
 Cryptography
with different traditions for what "privacy" means.

Two types of privacy protection methods
 Data sanitization
 Anonymization

Sanitization approaches
 Input perturbation
  - Add noise to the data
  - Generalize the data
 Summary statistics
  - Means, variances
  - Marginal totals
  - Model parameters
 Output perturbation
  - Add noise to summary statistics, as in the sketch below
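The distinction between input and output perturbation is easy to see in code. Below is a minimal sketch, assuming a toy age dataset and arbitrary, uncalibrated noise scales chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(20, 80, size=1000).astype(float)  # toy micro-data

# Input perturbation: noise each record, then compute the statistic.
noisy_ages = ages + rng.normal(0.0, 5.0, size=ages.shape)
print("input-perturbed mean:", noisy_ages.mean())

# Output perturbation: compute the exact statistic, then noise the answer.
print("output-perturbed mean:", ages.mean() + rng.normal(0.0, 0.5))
```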

Blending/hiding in a crowd
 k-anonymity based approaches
 An adversary may hold various kinds of background knowledge that can breach privacy
 These privacy models often assume "the adversary's background knowledge is given"

Classic intuition for privacy
 Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database
  - A very strong definition
  - Proposed by T. Dalenius, 1977
 Analogous to the security of encryption: anything about the plaintext that can be learned from a ciphertext can be learned without the ciphertext

Impossibility result
The Dalenius definition cannot be achieved in the presence of auxiliary information. Example: if I know that Alice's height is 2 inches above the average American's height, then by querying the census database for the average I can compute Alice's exact height, and her privacy is breached even if her record is not in the database. We need to revise the privacy definition…

Differential Privacy
The risk to my privacy should not substantially increase as a result of participating in a statistical database: with or without my record in the database, my privacy risk should not change much. (In contrast, the Dalenius definition requires that using the database not increase my privacy risk at all, even when the database does not contain my record.)

Definition
A randomized mechanism K gives ε-differential privacy if, for all databases x and x' differing in at most one record and for every set S of outputs,
P(K(x) ∈ S) ≤ exp(ε) · P(K(x') ∈ S)
Mechanism: K(x) = f(x) + D, where f is the query function and D is some noise. This is an output perturbation method.

Sensitivity function
 The (global) sensitivity of a query f is Δf = max |f(x) − f(x')|, taken over all pairs of databases x, x' differing in at most one record
 It captures how great a difference in the output the additive noise must hide
 How should the noise D be designed? Its scale is linked back to the function f(x) through Δf
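As a concrete example (assumed, not from the slides), the sensitivity of a counting query can be checked by brute force over the databases that drop one record; the helper names below are hypothetical:

```python
def count_over_60(db):
    """Query f: how many ages in the database exceed 60?"""
    return sum(1 for age in db if age > 60)

db = (25, 47, 61, 70, 33)
# Neighboring databases that drop exactly one record from db.
neighbors = [db[:i] + db[i + 1:] for i in range(len(db))]
delta_f = max(abs(count_over_60(db) - count_over_60(n)) for n in neighbors)
print("empirical sensitivity:", delta_f)  # 1: a count changes by at most 1
```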

Laplace distribution noise
Use the Laplace distribution to generate the noise: Lap(b) has density p(z) = (1/(2b)) · exp(−|z|/b), with scale b = Δf/ε.
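Sampling such noise is a one-liner with numpy; the Δf and ε values below are assumed for illustration:

```python
import numpy as np

delta_f, epsilon = 1.0, 0.1          # assumed example values
b = delta_f / epsilon                # Laplace scale
rng = np.random.default_rng(0)
print(rng.laplace(loc=0.0, scale=b, size=5))  # typical magnitude ~ b
```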

The shape is similar to Gaussian noise, but the Laplace density has a sharper peak and heavier tails.

Adding Laplace noise
K(x) = f(x) + Lap(Δf/ε). Why does this work?
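Putting the pieces together, a minimal sketch of the Laplace mechanism (assumed code, with a hypothetical count query of sensitivity 1):

```python
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(f, db, delta_f, epsilon):
    """Answer query f on db with Laplace noise of scale delta_f / epsilon."""
    return f(db) + rng.laplace(loc=0.0, scale=delta_f / epsilon)

db = [25, 47, 61, 70, 33]
count_over_60 = lambda d: sum(1 for age in d if age > 60)
print(laplace_mechanism(count_over_60, db, delta_f=1.0, epsilon=0.1))
```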

Proof sketch
Let K(x) = f(x) + D = r. Then r − f(x) has a Laplace distribution with scale Δf/ε. Similarly, K(x') = f(x') + D = r, and r − f(x') has the same distribution.
P(K(x) = r) ∝ exp(−|f(x) − r| · ε/Δf)
P(K(x') = r) ∝ exp(−|f(x') − r| · ε/Δf)
P(K(x) = r) / P(K(x') = r) = exp((|f(x') − r| − |f(x) − r|) · ε/Δf)
 ≤ exp(|f(x') − f(x)| · ε/Δf)   (triangle inequality)
 ≤ exp(ε)   (since |f(x') − f(x)| ≤ Δf)
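The bound can also be checked numerically; the sketch below (assumed values) evaluates the two Laplace densities directly:

```python
import numpy as np

delta_f, epsilon = 1.0, 0.5
b = delta_f / epsilon

def lap_pdf(r, mean):
    """Density at r of Laplace noise centered at the true answer `mean`."""
    return np.exp(-abs(r - mean) / b) / (2 * b)

f_x, f_x2 = 42.0, 43.0               # neighboring answers: |f(x) - f(x')| <= Δf
for r in [40.0, 42.5, 45.0]:
    ratio = lap_pdf(r, f_x) / lap_pdf(r, f_x2)
    print(r, ratio, ratio <= np.exp(epsilon))  # bound holds at every r
```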

Noise samples
[Plots: Laplace noise samples for Δf = 1 with ε = 0.01, 0.1, 1, 2, 10, and for Δf = 2, 3, and 10000 with ε varying. Since the scale is b = Δf/ε, the noise shrinks as ε grows and grows with Δf.]
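Plots of this kind can be reproduced with a short script; a sketch, assuming matplotlib and the parameter grid above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
delta_f = 1.0
for eps in [0.01, 0.1, 1, 2, 10]:
    samples = rng.laplace(0.0, delta_f / eps, size=10_000)
    plt.hist(samples, bins=100, histtype="step", label=f"eps={eps}")
plt.legend()
plt.xlabel("noise value")
plt.show()
```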

Composition (in the PINQ paper)
 Sequential composition: for a sequence of analyses over the same data, the ε values add up
 Parallel composition: for analyses over disjoint sets, the ultimate privacy guarantee depends only on the worst of the guarantees of each analysis, not the sum
A sketch of both rules follows.
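A minimal sketch (assumed code, not from PINQ itself) of the two rules, using noisy counts of sensitivity 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(records, epsilon):
    # Laplace mechanism for a count query (sensitivity 1).
    return len(records) + rng.laplace(0.0, 1.0 / epsilon)

ages = [25, 47, 61, 70, 33, 58, 64]

# Sequential composition: two queries whose inputs overlap, so the
# privacy costs add: total cost = 0.5 + 0.5 = 1.0.
a = noisy_count([x for x in ages if x > 60], epsilon=0.5)
b = noisy_count([x for x in ages if x > 50], epsilon=0.5)

# Parallel composition: counts over DISJOINT buckets; each record affects
# exactly one count, so the total cost is max(1.0, 1.0) = 1.0.
buckets = {"<=50": [x for x in ages if x <= 50],
           ">50":  [x for x in ages if x > 50]}
noisy = {k: noisy_count(v, epsilon=1.0) for k, v in buckets.items()}
print(a, b, noisy)
```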