Current Developments in Differential Privacy

Presentation transcript:

Current Developments in Differential Privacy
Salil Vadhan, Center for Research on Computation & Society, School of Engineering & Applied Sciences, Harvard University

Differential Privacy: Recap

[Figure: data analysts pose queries q1, q2, q3 to a curator C holding an n-row database D; C returns answers a1, a2, a3.]

Def [DMNS06]: A randomized algorithm C is (ε,δ)-differentially private iff for all databases D, D′ that differ on one row, all query sequences q1,…,qt, and all sets T ⊆ ℝᵗ:

Pr[C(D,q1,…,qt) ∈ T] ≤ e^ε · Pr[C(D′,q1,…,qt) ∈ T] + δ ≈ (1+ε) · Pr[C(D′,q1,…,qt) ∈ T] + δ

ε a small constant, e.g. ε = .01; δ cryptographically small, e.g. δ = 2⁻⁶⁰

Distribution of C(D,q1,…,qt) ≈_ε Distribution of C(D′,q1,…,qt)

"My data has little influence on what the analysts see"

Differential Privacy: Recap

[Figure: same curator setup as above.]

Def [DMNS06]: A randomized algorithm C is ε-differentially private iff for all databases D, D′ that differ on one row, all query sequences q1,…,qt, and all sets T ⊆ ℝᵗ:

Pr[C(D,q1,…,qt) ∈ T] ≲ (1+ε) · Pr[C(D′,q1,…,qt) ∈ T]

ε a small constant, e.g. ε = .01
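To see the definition in action, here is a minimal Python sketch (an illustration added here, not part of the talk) that empirically checks the ε-DP inequality for randomized response, perhaps the simplest differentially private algorithm: each individual reports their true bit with probability e^ε/(1+e^ε) and the opposite bit otherwise.

```python
# Empirical check of the epsilon-DP definition for randomized response
# on neighboring one-row databases D = [0] and D' = [1].
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

epsilon, trials = 1.0, 200_000
p_D  = sum(randomized_response(0, epsilon) == 0 for _ in range(trials)) / trials
p_D2 = sum(randomized_response(1, epsilon) == 0 for _ in range(trials)) / trials

# The definition requires Pr[output in T | D] <= e^eps * Pr[output in T | D'];
# for randomized response the ratio is exactly e^eps, which the estimate reflects.
print(p_D / p_D2, "vs. e^eps =", math.exp(epsilon))
```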

Differential Privacy: Key Points

[Figure: same curator setup as above.]

Distribution of C(D,q1,…,qt) ≈_ε Distribution of C(D′,q1,…,qt)

Idea: inject random noise to obscure the effect of each individual (not necessarily by adding noise to the answer!).

Good for big data: more utility and more privacy as 𝑛→∞.
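To make the noise-injection idea concrete, here is a minimal Python sketch of the classic Laplace mechanism for a counting query. The function name and example predicate are illustrative assumptions; the calibration (noise of scale 1/ε for a sensitivity-1 query) is the standard one from [DMNS06].

```python
# A minimal sketch of the Laplace mechanism for a counting query.
# Changing one row of the database changes the true count by at most 1
# (sensitivity 1), so Laplace noise of scale 1/epsilon yields
# epsilon-differential privacy.
import numpy as np

def private_count(database, predicate, epsilon):
    """Return a noisy count of the rows satisfying `predicate`."""
    true_count = sum(1 for row in database if predicate(row))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Example (hypothetical schema): fraction of smokers in an n-row database.
# The absolute noise is O(1/epsilon) independent of n, so the noisy fraction
# approaches the true fraction as n grows: more utility AND more privacy
# as n -> infinity.
# noisy_fraction = private_count(db, lambda row: row["smoke"] == 1, 0.01) / len(db)
```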

Differential Privacy: Key Points

[Figure: same curator setup as above.]

Distribution of C(D,q1,…,qt) ≈_ε Distribution of C(D′,q1,…,qt)

Strong guarantees: hold for all databases, regardless of the adversary's auxiliary knowledge.

Scalable: no privacy expert needed in the loop for each database or release.

Some Differentially Private Algorithms

- histograms [DMNS06]
- contingency tables [BCDKMT07, GHRU11, TUV12, DNT14]
- machine learning [BDMN05, KLNRS08]
- regression & statistical estimation [CMS11, S11, KST11, ST12, JT13]
- clustering [BDMN05, NRS07]
- social network analysis [HLMJ09, GRU11, KRSY11, KNRS13, BBDS13]
- approximation algorithms [GLMRT10]
- singular value decomposition [HR12, HR13, KT13, DTTZ14]
- streaming algorithms [DNRY10, DNPR10, MMNW11]
- mechanism design [MT07, NST10, X11, NOS12, CCKMV12, HK12, KPRU12]
- …

See the Simons Institute Workshop on Big Data & Differential Privacy, 12/13.

Amazing Possibility I: Synthetic Data

[Figure: C transforms a real n-row dataset with d attributes (≥40? Smoke? Cancer?) into a synthetic dataset of "fake" people with the same attributes.]

Theorem [BLR08, HR10]: If 𝑛≫𝑑, one can generate differentially private synthetic data preserving exponentially many statistical properties of the dataset (e.g. the fraction of people with each set of attributes).

Computational complexity is a challenge [DNRRV09, UV11, U13]. Practical implementations in [HLM12, GGHRW14].

Amazing Possibility I: Synthetic Data

[Figure: same real-to-synthetic setup as above.]

Q: How could this be possible?

It would be easy if we could compromise the privacy of "just a few" people: a few random rows preserve many statistical properties. But differential privacy doesn't allow this.

Amazing Possibility I: Synthetic Data

[Figure: same real-to-synthetic setup as above.]

Q: How could this be possible?

Construct a "smooth" distribution on synthetic datasets (via [MT07]):
- Put higher probability on synthetic datasets that agree more with the real dataset on the statistics of interest.
- Ensure that (probability of each inaccurate synthetic dataset) × (# of synthetic datasets) is very small.

A minimal sketch of this recipe follows.
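As one concrete illustration, here is a minimal Python sketch of the exponential mechanism of [MT07] applied to a small explicit set of candidate synthetic datasets. The function names, the candidate set, and the statistics are illustrative assumptions; the actual constructions in [BLR08, HR10] are far more involved.

```python
# A toy sketch of the exponential mechanism [MT07] for synthetic data:
# sample a candidate synthetic dataset with probability that grows
# exponentially with how well it matches the real data's statistics.
import math
import random

def exponential_mechanism(real_data, candidates, statistics, epsilon):
    """Each statistic is assumed to be an average over rows, so changing
    one row of real_data moves each statistic (and hence the score) by
    at most 1/n; that bound is the sensitivity used below."""
    n = len(real_data)

    def score(cand):
        # Higher score = better agreement with the real dataset.
        return -max(abs(stat(cand) - stat(real_data)) for stat in statistics)

    sensitivity = 1.0 / n
    scores = [score(c) for c in candidates]
    shift = max(scores)  # subtract the max for numerical stability (distribution unchanged)
    weights = [math.exp(epsilon * (s - shift) / (2 * sensitivity)) for s in scores]
    total = sum(weights)
    r = random.random() * total
    for cand, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return cand
    return candidates[-1]
```

An inaccurate candidate's probability here is exponentially small in ε·n times its error, so even multiplied by the huge number of candidate synthetic datasets it stays negligible, which is exactly the balancing act described on the slide.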

Amazing Possibility II: Statistical Inference & Machine Learning

[Figure: C takes an n-row dataset with d attributes (≥40? Smoke? Cancer?) and outputs a hypothesis h or model θ about the world, e.g. a rule for predicting disease from attributes.]

Theorem [KLNRS08, S11]: Differential privacy is achievable for a vast array of machine learning and statistical estimation problems with little loss in convergence rate as 𝑛→∞.

Optimizations & practical implementations for logistic regression, ERM, LASSO, and SVMs in [RBHT09, CMS11, ST13, JT14]. A sketch of the noisy-gradient approach follows.
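Here is a minimal, hedged Python sketch of differentially private logistic regression via noisy gradient descent, in the spirit of the private ERM line of work cited above. It is not the algorithm of any single cited paper; the clipping bound, step size, and the naive even split of the privacy budget across steps are simplifying assumptions.

```python
# A toy sketch of DP logistic regression via noisy gradient descent.
import numpy as np

def dp_logistic_regression(X, y, epsilon, delta, steps=100, lr=0.1, clip=1.0):
    """X: (n, d) feature matrix; y: labels in {-1, +1}.
    Per-example gradients are clipped to L2 norm <= clip, so the average
    gradient has L2 sensitivity 2*clip/n per step; Gaussian noise is
    calibrated to an (epsilon/steps, delta/steps) budget per step."""
    n, d = X.shape
    eps_step, delta_step = epsilon / steps, delta / steps
    sigma = (2 * clip / n) * np.sqrt(2 * np.log(1.25 / delta_step)) / eps_step
    rng = np.random.default_rng()
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        grads = (-y / (1 + np.exp(margins)))[:, None] * X  # per-example logistic gradients
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)      # clip each gradient to norm <= clip
        avg_grad = grads.mean(axis=0)
        w -= lr * (avg_grad + rng.normal(0.0, sigma, size=d))  # noisy gradient step
    return w
```

Note how the noise scale shrinks like 1/n, which is the source of the "little loss in convergence rate as 𝑛→∞" in the theorem above.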

Challenges for DP in Practice

- Accuracy for "small data" (moderate values of 𝑛).
- Modelling & managing privacy loss over time, especially over many different analysts & datasets.
- Analysts are used to working with raw data. One approach: "tiered access", with DP for wide access and raw data only by approval under strict terms of use (cf. Census PUMS vs. RDCs).
- Cases where privacy concerns are not "local" (e.g. privacy for large groups) or utility is not "global" (e.g. targeting).
- Matching guarantees with privacy law & regulation.
- …

Some Efforts to Bring DP to Practice

- CMU-Cornell-PennState "Integrating Statistical and Computational Approaches to Privacy"; see http://onthemap.ces.census.gov/
- UCSD "Integrating Data for Analysis, Anonymization, and Sharing" (iDASH)
- UT Austin "Airavat: Security & Privacy for MapReduce"
- UPenn "Putting Differential Privacy to Work"
- Stanford-Berkeley-Microsoft "Towards Practicing Privacy"
- Duke-NISS "Triangle Census Research Network"
- Harvard "Privacy Tools for Sharing Research Data"
- …

A project at Harvard: "Privacy Tools for Sharing Research Data"

http://privacytools.seas.harvard.edu/

Computer Science, Social Science, Law, Statistics.

Goal: develop technological, legal, and policy tools for sharing personal data for research in social science and other fields.

Supported by an NSF Secure & Trustworthy Cyberspace "Frontier" grant and seed funding from Google.

Goal: use differential privacy to widen access

For non-restricted datasets, users can run many statistical analyses ("Zelig methods") through the Dataverse interface, without downloading the data.

Conclusions

Differential privacy offers:
- Strong, scalable privacy guarantees
- Compatibility with many types of "big data" analyses
- Amazing possibilities for what can be achieved in principle

There are some challenges, but reasons for optimism:
- Intensive research effort from many communities
- Some successful uses in practice already
- Differential privacy gets easier as 𝑛→∞