Defining and Achieving Differential Privacy
Cynthia Dwork, Microsoft

Meaningful Privacy Guarantees
- Statistical databases
  - Medical
  - Government agency
  - Social science
  - Searching / click stream
- Learn non-trivial trends while protecting privacy of individuals and fine-grained structure

Linkage Attacks
- Using "innocuous" data in one dataset to identify a record in a different dataset containing both innocuous and sensitive data
- At the heart of the voluminous research on hiding small cell counts in tabular data

The Netflix Prize
- Netflix recommends movies to its subscribers
  - Offers $1,000,000 for a 10% improvement in its recommendation system
  - Not concerned here with how this is measured
- Publishes training data
  - Nearly 500,000 records, 18,000 movie titles
  - "The ratings are on a scale from 1 to 5 (integral) stars. To protect customer privacy, all personal information identifying individual customers has been removed and all customer ids have been replaced by randomly-assigned ids. The date of each rating and the title and year of release for each movie are provided."
- Some ratings are not sensitive, some may be sensitive
  - OK for Netflix to know, not OK for the public to know

A Publicly Available Set of Movie Rankings
- Internet Movie Database (IMDb)
- Individuals may register for an account and rate movies
  - Need not be anonymous
- Visible material includes ratings, dates, comments
- By definition, these ratings are not sensitive

The Fiction of Non-PII (Narayanan & Shmatikov 2006)
- Movie ratings and dates are PII
  - "With 8 movie ratings (of which we allow 2 to be completely wrong) and dates that may have a 3-day error, 96% of Netflix subscribers whose records have been released can be uniquely identified in the dataset."
- Linkage attack prosecuted using the IMDb
  - Link ratings in IMDb to (non-sensitive) ratings in Netflix, revealing sensitive ratings in Netflix
- Narayanan & Shmatikov draw conclusions about the user
  - May be wrong, may be right; the user is harmed either way

What Went Wrong?
- What is "Personally Identifiable Information"?
  - Typically syntactic, not semantic
  - E.g., a genome sequence is not considered PII??
- Suppressing "PII" doesn't rule out linkage attacks
  - Famously observed by Sweeney, circa 1998
  - AOL debacle
- Need a more semantic approach to privacy

Semantic Security Against an Eavesdropper (Goldwasser & Micali 1982)
- Vocabulary
  - Plaintext: the message to be transmitted
  - Ciphertext: the encryption of the plaintext
  - Auxiliary information: anything else known to the attacker
- The ciphertext leaks no information about the plaintext.
- Formalization: compare the ability of someone seeing aux and ciphertext to guess (anything about) the plaintext, to the ability of someone seeing only aux to do the same thing. The difference should be "tiny".

Semantic Security for Statistical Databases?
- Dalenius, 1977: anything that can be learned about a respondent from the statistical database can be learned without access to the database
  - An ad omnia guarantee
- Happily, formalizes to semantic security
  - Recall: anything about the plaintext that can be learned from the ciphertext can be learned without the ciphertext
- Popular intuition: prior and posterior views about an individual shouldn't change "too much"
  - Clearly silly: my (incorrect) prior is that everyone has 2 left feet
  - Very popular in the literature nevertheless
  - Definitional awkwardness even when used correctly

Semantic Security for Statistical Databases?
- Unhappily, unachievable
  - Can't achieve cryptographically small levels of "tiny"
  - Intuition: the (adversarial) user is supposed to learn unpredictable things about the DB; this translates to learning more than a cryptographically tiny amount about a respondent
- Relax "tiny"?

Relaxed Semantic Security for Statistical Databases?
- Relaxing tininess doesn't help (Dwork & Naor 2006)
  - Database teaches average heights of population subgroups
  - "Terry Gross is two inches shorter than the average Lithuanian woman"
  - Access to the DB teaches Terry's height
  - Terry's height is learnable from the DB, not learnable otherwise
  - Formal proof extends to essentially any notion of privacy compromise; uses extracted randomness from the SDB as a one-time pad
  - Bad news for k-, l-, m-, etc.
- Attack works even if Terry is not in the DB!
- Suggests a new notion of privacy: the risk incurred by joining the DB
  - "Differential Privacy"
  - Privacy, when existence of the DB is stipulated
  - Before/after interacting vs. risk when in / not in the DB

Differential Privacy
K gives ε-differential privacy if for all values of DB and Me and all transcripts t:

  Pr[K(DB - Me) = t] / Pr[K(DB + Me) = t] ≤ e^ε ≈ 1 ± ε
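For a concrete sense of the guarantee (an illustrative value, not from the slides): with ε = 0.1, e^ε ≈ 1.105, so adding or removing Me's data makes no transcript more than about 10% more (or less) likely.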

Differential Privacy is an Ad Omnia Guarantee
- No perceptible risk is incurred by joining the DB.
- Anything the adversary can do to me, it could do without Me (my data).
(Figure: distribution of Pr[response], with bad responses marked X)

An Interactive Sanitizer K
Dwork, McSherry, Nissim, Smith 2006
- Query f: DB → R
- K(f, DB) = f(DB) + noise
- E.g., Count(P, DB) = # rows in DB with property P, answered with noise added
(Figure: analyst poses query f to the sanitizer K, which holds DB)
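As an illustration (not part of the slides), here is a minimal Python sketch of the interface K(f, DB) = f(DB) + noise for a counting query. The toy database, the property, and the noise scale are hypothetical; how to calibrate the noise is the subject of the next slides.

```python
import numpy as np

def count(db, has_property):
    """Count(P, DB): the number of rows in DB with property P."""
    return sum(1 for row in db if has_property(row))

def K(f, db, noise):
    """K(f, DB) = f(DB) + noise, where `noise` is a sampler chosen by the curator."""
    return f(db) + noise()

# Hypothetical toy database: one row per respondent.
db = [{"smoker": True}, {"smoker": False}, {"smoker": True}]

# The noise distribution is left generic here; the next slides calibrate it.
noisy = K(lambda d: count(d, lambda r: r["smoker"]), db,
          noise=lambda: np.random.laplace(scale=1.0))
print(f"noisy count of smokers: {noisy:.2f}")
```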

Sensitivity of a Function f
- How much can f(DB + Me) exceed f(DB - Me)?
- Recall: K(f, DB) = f(DB) + noise. The question asks: what difference must the noise obscure?

  Δf = max_{DB, Me} |f(DB + Me) - f(DB - Me)|

- E.g., ΔCount = 1

Calibrate Noise to Sensitivity

  Δf = max_{DB, Me} |f(DB + Me) - f(DB - Me)|

Theorem: To achieve ε-differential privacy, use scaled symmetric (Laplace) noise with Pr[x] proportional to exp(-|x|/R), where R = Δf/ε.
- Increasing R flattens the curve: more privacy
- Noise depends on f and ε, not on the database
(Figure: Laplace density centered at 0, horizontal axis marked at 0, ±R, ±2R, …, ±5R)

Calibrate Noise to Sensitivity

  Δf = max_{DB, Me} |f(DB + Me) - f(DB - Me)|

  Pr[K(f, DB - Me) = t] / Pr[K(f, DB + Me) = t]
    = exp(-(|t - f(DB - Me)| - |t - f(DB + Me)|)/R) ≤ exp(Δf/R) = e^ε

Theorem: To achieve ε-differential privacy, use scaled symmetric (Laplace) noise with Pr[x] proportional to exp(-|x|/R), where R = Δf/ε.
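A minimal sketch of the calibrated mechanism in Python, continuing the toy counting example. The database, the ε = 0.1 value, and the function names are illustrative choices, not part of the slides.

```python
import numpy as np

def laplace_mechanism(f, db, sensitivity, epsilon):
    """Release f(DB) + noise with Pr[x] proportional to exp(-|x|/R), R = Δf/ε."""
    R = sensitivity / epsilon
    return f(db) + np.random.laplace(loc=0.0, scale=R)

def count_smokers(db):
    return sum(1 for row in db if row["smoker"])

# ΔCount = 1: adding or removing one person's row changes the count by at most 1.
db = [{"smoker": True}, {"smoker": False}, {"smoker": True}]
print(laplace_mechanism(count_smokers, db, sensitivity=1, epsilon=0.1))
```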

Multiple Queries
- For a query sequence f_1, …, f_d, ε-privacy is achieved for each response with noise generation parameter R_i = Δf_i/ε. Can sometimes do better.
- Noise must increase with the sensitivity of the query sequence; naively, more queries mean noisier answers.
- Dinur and Nissim 2003, et sequelae: speaks to the non-interactive setting
  - Any non-interactive solution permitting "too accurate" answers to "too many" questions is vulnerable to attack.
  - The privacy mechanism is at an even greater disadvantage than in the interactive case; this can be exploited.
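The slide leaves the accounting implicit; one standard reading (basic sequential composition, not stated on the slide) is that answering d queries under a single overall budget ε forces each answer to use ε/d, inflating every noise scale by a factor of d. A minimal sketch, with hypothetical queries and budget:

```python
import numpy as np

def answer_sequence(queries, sensitivities, db, epsilon_total):
    """Naive budget split: each of the d queries gets ε/d of the budget,
    so each noise scale becomes d * Δf_i / ε and answers get noisier as d grows."""
    eps_each = epsilon_total / len(queries)
    return [f(db) + np.random.laplace(scale=s / eps_each)
            for f, s in zip(queries, sensitivities)]

db = [{"smoker": True, "age": 40}, {"smoker": False, "age": 25}, {"smoker": True, "age": 63}]
queries = [lambda d: sum(r["smoker"] for r in d),          # ΔCount = 1
           lambda d: sum(1 for r in d if r["age"] > 30)]   # ΔCount = 1
print(answer_sequence(queries, [1, 1], db, epsilon_total=0.2))
```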

Future Work
- Investigate techniques from robust statistics
  - Area of statistics devoted to coping with small amounts of wild data-entry errors, rounding errors, and limited dependence among samples
  - Problem: the statistical setting makes strong assumptions about the existence and nature of an underlying distribution
- Differential privacy for social networks, graphs
  - What are the utility questions of interest?
- Definitional and algorithmic work for other settings
  - The "differential" approach is more broadly useful
  - Several results discussed in the next few hours
- Porous boundary between "inside" and "outside"?
  - Outsourcing, bug reporting, combating DDoS attacks and terror

Privacy is a natural resource. It’s non-renewable, and it’s not yours. Conserve it.