
Differential Privacy: Theoretical & Practical Challenges. Salil Vadhan, Center for Research on Computation & Society, John A. Paulson School of Engineering & Applied Sciences, Harvard University.


1 Differential Privacy: Theoretical & Practical Challenges. Salil Vadhan, Center for Research on Computation & Society, John A. Paulson School of Engineering & Applied Sciences, Harvard University; on sabbatical at the Shing-Tung Yau Center, Department of Applied Mathematics, National Chiao-Tung University. Lecture at the Institute of Information Science, Academia Sinica, November 9, 2015.

2 Data Privacy: The Problem. Given a dataset with sensitive information, such as: census data, health records, social network activity, telecommunications data. How can we enable "desirable uses" of the data (academic research, informing policy, identifying subjects for drug trials, searching for terrorists, market analysis, …) while protecting the "privacy" of the data subjects?

3 Approach 1: Encrypt the Data. Problems?
Name   Sex  Blood  HIV?
Chen   F    B      Y
Jones  M    A      N
Smith  M    O      N
Ross   M    O      Y
Lu     F    A      N
Shah   M    B      Y
(On the slide, the table is shown encrypted as random-looking bit strings.)

4 Approach 2: Anonymize the Data. (Same Name/Sex/Blood/HIV? table as before, with identifying fields stripped.) Problems? "Re-identification" is often easy [Sweeney `97].

5 Approach 3: Mediate Access. A trusted "curator" C holds the table (same Name/Sex/Blood/HIV? table as before) and answers queries q1, q2, q3, … from data analysts with answers a1, a2, a3, …. Problems? Even simple "aggregate" statistics can reveal individual info. [Dinur-Nissim `03, Homer et al. `08, Mukatran et al. `11, Dwork et al. `15]

6 Privacy Models from Theoretical CS.
Model | Utility | Privacy | Who Holds Data?
Differential Privacy | statistical analysis of dataset | individual-specific info | trusted curator
Secure Function Evaluation | any query desired | everything other than result of query | original users (or semi-trusted delegates)
Fully Homomorphic (or Functional) Encryption | any query desired | everything (except possibly result of query) | untrusted server

7 Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]. A curator C holds the table (Sex, Blood, HIV?, names removed) and answers analysts' queries q1, q2, q3 with a1, a2, a3. Requirement: the effect of each individual should be "hidden".

8 Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]. Same mediated setup, but the data analyst is now an adversary.

9 Requirement: an adversary shouldn't be able to tell if any one person's data were changed arbitrarily.

10 (Animation: one person's row, M A N, is removed from the table.)

11 (Animation: that row is replaced by F A Y; the adversary's view should be essentially unchanged.)
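The requirement on these slides has a standard formalization (from the cited [Dwork-McSherry-Nissim-Smith '06] line of work): a randomized curator M is ε-differentially private if, for every pair of datasets D, D' differing in one person's row and every set S of possible outputs,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
```

Small ε means the two output distributions are nearly indistinguishable, so the adversary cannot tell whether any one row was changed.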

12 Simple approach: random noise. "What fraction of people are type B and HIV positive?" The mechanism M answers with the true fraction plus random noise. (Table: Sex, Blood, HIV? as before.)
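The noisy-answer idea can be sketched with the standard Laplace mechanism; the function names, the choice of epsilon, and the use of Laplace noise specifically are illustrative assumptions, not taken from the slides.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_fraction(records, predicate, epsilon):
    """Answer 'what fraction of records satisfy predicate?' with epsilon-DP.

    Changing one of the n records moves the fraction by at most 1/n
    (the query's sensitivity), so Laplace noise of scale 1/(n*epsilon)
    suffices for epsilon-differential privacy.
    """
    n = len(records)
    true_answer = sum(1 for r in records if predicate(r)) / n
    return true_answer + laplace_noise(1.0 / (n * epsilon))

# The six-person (Sex, Blood, HIV?) table from the slides:
data = [("F", "B", "Y"), ("M", "A", "N"), ("M", "O", "N"),
        ("M", "O", "Y"), ("F", "A", "N"), ("M", "B", "Y")]

# "What fraction of people are type B and HIV positive?"
ans = dp_fraction(data, lambda r: r[1] == "B" and r[2] == "Y", epsilon=0.5)
```

Each run returns a different noisy answer near the true fraction 2/6; smaller epsilon means more noise and stronger privacy.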

13 Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]. The curator C is now randomized: it answers queries q1, q2, q3 with noisy answers a1, a2, a3, so the adversary's view is essentially unchanged when one row changes.


17 Answering multiple queries. The curator C answers a sequence of queries like "What fraction of people are type B and HIV positive?", adding fresh random noise to each answer.
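Each answer consumes privacy budget, and by basic composition the epsilons add up. A common pattern, sketched below with illustrative names and parameters (not the specific mechanism on the slide), is to split a total budget evenly across the k queries:

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def answer_queries(records, predicates, total_epsilon):
    """Answer k fraction-queries under a total privacy budget.

    Running each query with budget total_epsilon/k gives
    total_epsilon-DP overall by basic composition; note the noise
    per answer therefore grows linearly in the number of queries.
    """
    n, k = len(records), len(predicates)
    eps_each = total_epsilon / k
    return [sum(1 for r in records if p(r)) / n
            + laplace_noise(1.0 / (n * eps_each))
            for p in predicates]

# Same six-person (Sex, Blood, HIV?) table as on the slides:
data = [("F", "B", "Y"), ("M", "A", "N"), ("M", "O", "N"),
        ("M", "O", "Y"), ("F", "A", "N"), ("M", "B", "Y")]

answers = answer_queries(data,
                         [lambda r: r[0] == "F",          # fraction female
                          lambda r: r[2] == "Y"],         # fraction HIV positive
                         total_epsilon=1.0)
```

This linear blow-up in noise is exactly why answering many queries accurately is the hard case the later slides address.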


19 Some Differentially Private Algorithms:
histograms [DMNS06]
contingency tables [BCDKMT07, GHRU11, TUV12, DNT14]
machine learning [BDMN05, KLNRS08]
regression & statistical estimation [CMS11, S11, KST11, ST12, JT13]
clustering [BDMN05, NRS07]
social network analysis [HLMJ09, GRU11, KRSY11, KNRS13, BBDS13]
approximation algorithms [GLMRT10]
singular value decomposition [HR12, HR13, KT13, DTTZ14]
streaming algorithms [DNRY10, DNPR10, MMNW11]
mechanism design [MT07, NST10, X11, NOS12, CCKMV12, HK12, KPRU12]
…
See the Simons Institute Workshop on Big Data & Differential Privacy, 12/13.

20 Differential Privacy: Interpretations. Whatever an adversary learns about me, it could have learned from everyone else's data. The mechanism cannot leak "individual-specific" information. These interpretations hold regardless of the adversary's auxiliary information, and differential privacy composes gracefully (k repetitions ⇒ kε-differentially private). But: there is no protection for information that is not localized to a few rows, and no guarantee that subjects won't be "harmed" by the results of an analysis.
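The graceful-composition claim on this slide is the basic composition theorem: if each mechanism M_1, …, M_k is ε-differentially private, then releasing all k outputs together satisfies

```latex
(M_1, M_2, \dots, M_k) \text{ is } k\varepsilon\text{-differentially private.}
```

More generally, if M_i is ε_i-DP, the composition is (Σ_i ε_i)-DP, which is why practitioners speak of spending a "privacy budget".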


22 Amazing possibility: synthetic data [Blum-Ligett-Roth '08, Hardt-Rothblum `10]. Instead of answering queries, the curator C releases a table of "fake" people (e.g., M B N; F B Y; M O Y; F A N; F O N). Utility: preserves the fraction of people with every set of attributes!
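One naive way to produce differentially private synthetic data, workable only for a few attributes, is a noisy histogram over all attribute combinations, then sampling "fake people" from it. This is a generic sketch under assumed names and parameters, not the [Blum-Ligett-Roth '08] or [Hardt-Rothblum '10] algorithms, which achieve far better utility (and whose computational cost is exactly what the next slide is about).

```python
import itertools
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_synthetic(records, domains, epsilon, m):
    """Release m synthetic rows via an epsilon-DP histogram.

    The histogram of counts has sensitivity 1 (one person changes one
    cell by 1), so Laplace(1/epsilon) noise per cell gives epsilon-DP.
    Negative noisy counts are clipped to 0 before sampling.
    """
    cells = list(itertools.product(*domains))
    noisy = {c: laplace_noise(1.0 / epsilon) for c in cells}
    for r in records:
        noisy[tuple(r)] += 1
    weights = [max(noisy[c], 0.0) for c in cells]
    return random.choices(cells, weights=weights, k=m)

# Six-person (Sex, Blood, HIV?) table and its attribute domains:
data = [("F", "B", "Y"), ("M", "A", "N"), ("M", "O", "N"),
        ("M", "O", "Y"), ("F", "A", "N"), ("M", "B", "Y")]
domains = [("F", "M"), ("A", "B", "O"), ("Y", "N")]

fake_people = dp_synthetic(data, domains, epsilon=0.5, m=5)
```

The catch: the histogram has one cell per attribute combination, so this approach is exponential in the number of attributes, foreshadowing the hardness result on the next slide.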

23 Our result: synthetic data is hard [Dwork-Naor-Reingold-Rothblum-V. `09, Ullman-V. `11]. Theorem: any such curator C producing "fake people" requires exponential computing time (else we could break all of cryptography).

24 Our result: alternative summaries [Thaler-Ullman-V. `12, Chandrasekaran-Thaler-Ullman-Wan `13]. The curator C instead outputs a rich summary that contains the same statistical information as synthetic data, but can be computed in sub-exponential time! Open: is there a polynomial-time algorithm?

25 Amazing Possibility II: Statistical Inference & Machine Learning. (Table: Sex, Blood, HIV? as before.)

26 DP Theory & Practice. Theory: differential privacy research has many intriguing theoretical challenges and rich connections with other parts of CS theory & mathematics, e.g. cryptography, learning theory, game theory & mechanism design, convex geometry, pseudorandomness, optimization, approximability, communication complexity, statistics, … Practice: there is interest from many communities in seeing whether DP can be brought to practice, e.g. statistics, databases, medical informatics, privacy law, social science, computer security, programming languages, …

27 Challenges for DP in Practice

28 Some Efforts to Bring DP to Practice:
CMU-Cornell-PennState "Integrating Statistical and Computational Approaches to Privacy" (see http://onthemap.ces.census.gov/)
Google "RAPPOR"
UCSD "Integrating Data for Analysis, Anonymization, and Sharing" (iDASH)
UT Austin "Airavat: Security & Privacy for MapReduce"
UPenn "Putting Differential Privacy to Work"
Stanford-Berkeley-Microsoft "Towards Practicing Privacy"
Duke-NISS "Triangle Census Research Network"
Harvard "Privacy Tools for Sharing Research Data"
MIT/CSAIL/ALFA "MoocDB: Privacy Tools for Sharing MOOC Data"
…

29 Privacy Tools for Sharing Research Data (http://privacytools.seas.harvard.edu/), a SaTC Frontier project spanning Computer Science, Law, Social Science, and Statistics. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the funders of the work.

30 Target: Data Repositories

31 Datasets are restricted due to privacy concerns. Goal: enable wider sharing while protecting privacy.

32 Challenges for Sharing Sensitive Data.
Complexity of law: thousands of privacy laws in the US alone, at the federal, state, and local levels, usually context-specific: HIPAA, FERPA, CIPSEA, Privacy Act, PPRA, ESRA, ….
Difficulty of deidentification: stripping "PII" usually provides weak protection and/or poor utility [Sweeney `97].
Inefficient process for obtaining restricted data: can involve months of negotiation between institutions and the original researchers.
Goal: make sharing easier for researchers without expertise in privacy law/CS/statistics.

33 Vision: Integrated Privacy Tools (tools to be developed during the project). Components: risk assessment and de-identification; differential privacy; customized & machine-actionable terms of use; a data tag generator; a database of privacy laws & regulations; policy proposals and best practices. Workflow for a data set: consent from subjects, IRB proposal & review, deposit in repository, then open access to a sanitized data set, query access, or restricted access.

34 Many statistical analyses ("Zelig methods") can already be run through the Dataverse interface, without downloading the data.

35 A new interactive data exploration & analysis tool in Dataverse 4.0. Plan: use differential privacy to enable access to currently restricted datasets.

36 Goals for our DP Tools.
General-purpose: applicable to most datasets uploaded to Dataverse.
Automated: no differential privacy expert optimizing algorithms for a particular dataset or application.
Tiered access: a DP interface for wide access to rough statistical information, helping users decide whether to apply for access to the raw data (cf. Census PUMS vs. RDCs).
A (limited) prototype is on the project website: http://privacytools.seas.harvard.edu/

37 Differential Privacy: Summary

38 Privacy Models from Theoretical CS.
Model | Utility | Privacy | Who Holds Data?
Differential Privacy | statistical analysis of dataset | individual-specific info | trusted curator
Secure Function Evaluation | any query desired | everything other than result of query | original users (or semi-trusted delegates)
Fully Homomorphic (or Functional) Encryption | any query desired | everything (except possibly result of query) | untrusted server
See Shafi Goldwasser's talk at the White House-MIT Big Data Privacy Workshop, 3/3/14.

