An Introduction to Differential Privacy and its Applications
Ali Bagherzandi, Ph.D. Candidate, University of California at Irvine

Presentation transcript:

An Introduction to Differential Privacy and its Applications [1]
Ali Bagherzandi, Ph.D. Candidate, University of California at Irvine
[1] Most slides in this presentation are from Cynthia Dwork's talk.

Goal of Differential Privacy: Privacy-Preserving Statistical Analysis of Confidential Data
Finding statistical correlations
- Analyzing medical data to learn genotype/phenotype associations
- Correlating a cough outbreak with a chemical plant malfunction
- Can't be done with HIPAA safe-harbor sanitized data
Noticing events
- Detecting a spike in ER admissions for asthma
Official statistics
- Census contingency table release (statistics for small sets of attributes)
- Model fitting, regression analyses, etc.
Standard data-mining tasks
- Clustering; learning association rules, decision trees, separators; principal component analysis

Two Models: Non-Interactive. The data are sanitized and released.
(Diagram: Database → K → Sanitized Database.)

Two Models: Interactive. Multiple queries, adaptively chosen.
(Diagram: Database → K → sanitized answers, one query at a time.)

Achievements for Statistical Databases
- Defined differential privacy.
- Proved the classical goal (Dalenius, 1977) unachievable.
- An "ad omnia" definition, independent of linkage information.
- A general approach with rigorous proof: relates the degree of distortion to the (mathematical) sensitivity of the computation needed for the analysis. "How much" can the data of one person affect the outcome?
- A cottage industry: redesigning algorithms to be insensitive.
- Assorted extensions: when noise makes no sense; when actual sensitivity is much less than worst-case; when the database is distributed; ...
- Lower bounds on distortion, initiated by Dinur and Nissim.

Semantic Security for Confidentiality [Goldwasser-Micali '82]
Vocabulary:
- Plaintext: the message to be transmitted.
- Ciphertext: the encryption of the plaintext.
- Auxiliary information: anything else known to the attacker.
The ciphertext leaks no information about the plaintext.
Formalization: compare the ability of someone seeing aux and ciphertext to guess (anything about) the plaintext with the ability of someone seeing only aux to do the same thing. The difference should be "tiny".

Semantic Security for Privacy?
Dalenius, 1977: anything that can be learned about a respondent from the statistical database can be learned without access to the database.
Happily, this formalizes to semantic security. Unhappily, it is unachievable [Dwork and Naor 2006], both for non-serious and for serious reasons.

Semantic Security for Privacy? The Serious Reason (told as a parable)
- The database teaches the average heights of population subgroups.
- Auxiliary information: "Terry Tao is four inches taller than the average Swedish ♀."
- Access to the DB teaches Terry's height: it is learnable from the DB and not learnable without it. The proof extends to "any" notion of privacy breach.
- The attack works even if Terry is not in the DB!
- This suggests a new notion of privacy: the risk incurred by joining the DB; compare before/after interacting rather than in/not in the DB.

Differential Privacy: Formal Definition
K gives ε-differential privacy if for all values of DB, DB' differing in a single element, and for all subsets S of Range(K):
Pr[K(DB) in S] ≤ e^ε · Pr[K(DB') in S],
i.e. the ratio Pr[K(DB) in S] / Pr[K(DB') in S] is bounded by e^ε ≈ 1 + ε for small ε.
(Figure: two overlapping output distributions whose pointwise probability ratio at every t is bounded.)
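For reference, the same definition typeset (a restatement only, in the notation of the later sensitivity slides, where H is Hamming distance):

```latex
\[
\forall\, DB, DB' \text{ with } H(DB, DB') = 1,\ \ \forall\, S \subseteq \mathrm{Range}(K):\quad
\Pr[K(DB) \in S] \le e^{\varepsilon}\, \Pr[K(DB') \in S],
\]
\[
\text{equivalently}\quad
\frac{\Pr[K(DB) \in S]}{\Pr[K(DB') \in S]} \le e^{\varepsilon} \approx 1 + \varepsilon
\quad \text{for small } \varepsilon.
\]
```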

Achieving Differential Privacy: f + noise
K(f, DB) = f(DB) + [Noise]^d, where f: DB → R^d.
E.g., Count(P, DB) = # rows in DB with property P.
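To make the f-plus-noise pattern concrete, here is a minimal Python sketch of a noisy count (my own illustration, not code from the talk; the toy database and the `asthma` field are made up). A count changes by at most 1 when one row is added or removed, so Laplace noise with scale 1/ε suffices, anticipating the sensitivity discussion on the next slides:

```python
import numpy as np

rng = np.random.default_rng()

def noisy_count(db, predicate, epsilon):
    """Answer Count(P, DB) with Laplace noise.

    A count has sensitivity 1 (adding or removing one row changes it by
    at most 1), so Lap(1/epsilon) suffices for epsilon-DP.
    """
    true_count = sum(1 for row in db if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical toy database: one dict per person.
db = [{"age": 34, "asthma": True}, {"age": 51, "asthma": False},
      {"age": 29, "asthma": True}, {"age": 62, "asthma": True}]

print(noisy_count(db, lambda r: r["asthma"], epsilon=1.0))
```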

Sensitivity of a Function
How much can f(DB + Me) exceed f(DB - Me)? Recall: K(f, DB) = f(DB) + noise. The question asks: what difference must the noise obscure?
Δf = max over H(DB, DB') = 1 of ||f(DB) - f(DB')||_1
E.g., ΔCount = 1.
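A small brute-force sketch of this definition (my own illustration; the tiny domain and the two functions are made up). It uses the "change one row" notion of neighboring databases, which for these two functions gives the same answer as the slide's add/remove-one phrasing: ΔCount = 1, while a sum of values bounded by 10 has sensitivity 10:

```python
from itertools import product

def global_sensitivity(f, domain, n):
    """Brute-force Delta f = max |f(D) - f(D')| over databases of size n
    (rows drawn from `domain`) that differ in exactly one row."""
    best = 0.0
    for db in product(domain, repeat=n):
        for i in range(n):              # replace row i with every other value
            for v in domain:
                if v == db[i]:
                    continue
                neighbor = db[:i] + (v,) + db[i + 1:]
                best = max(best, abs(f(db) - f(neighbor)))
    return best

domain = range(0, 11)                   # each row is a value in 0..10
print(global_sensitivity(lambda d: sum(1 for x in d if x > 5), domain, 3))  # Count: 1
print(global_sensitivity(sum, domain, 3))                                   # Bounded sum: 10
```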

Calibrate Noise to Sensitivity
Δf = max over H(DB, DB') = 1 of ||f(DB) - f(DB')||_1
Lap(R) has density p(x) = (1/2R) exp(-|x|/R).
(Figure: the Laplace density over the axis -4R, ..., 0, ..., 5R; increasing R flattens the curve, giving more privacy.)
Theorem: To achieve ε-differential privacy, use scaled symmetric noise [Lap(R)]^d with R = Δf/ε.
The noise depends on f and ε, not on the database.
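A minimal sketch of the theorem as code (my own illustration; the function name and example data are not from the talk): add independent Lap(Δf/ε) noise to each of the d coordinates of f(DB). The caller must supply the correct worst-case sensitivity:

```python
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(f, db, sensitivity, epsilon):
    """Release f(db) + Lap(sensitivity/epsilon)^d.

    `sensitivity` must be the L1 sensitivity Delta f of `f`; the caller
    is responsible for supplying the correct worst-case value.
    """
    true_answer = np.atleast_1d(np.asarray(f(db), dtype=float))
    scale = sensitivity / epsilon                      # R = Delta f / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=true_answer.shape)
    return true_answer + noise

# Example: a single count (sensitivity 1) over hypothetical ages.
ages = np.array([34, 51, 29, 62, 45])
print(laplace_mechanism(lambda d: np.sum(d > 50), ages, sensitivity=1, epsilon=0.5))
```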

Why Does It Work? (d = 1)
Δf = max over DB, DB-Me of |f(DB) - f(DB-Me)|. Theorem: to achieve ε-differential privacy, use scaled symmetric noise Lap(R) with R = Δf/ε.
For any output t:
Pr[K(f, DB) = t] / Pr[K(f, DB') = t] = exp((|t - f(DB')| - |t - f(DB)|)/R) ≤ exp(|f(DB) - f(DB')|/R) ≤ exp(Δf/R) = e^ε.
(Figure: two Laplace curves centered at f(DB) and f(DB') over the axis -4R, ..., 5R.)
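The same calculation typeset, with the triangle-inequality step made explicit (a restatement of the slide, not new material):

```latex
\[
\frac{\Pr[K(f,DB)=t]}{\Pr[K(f,DB')=t]}
= \frac{\tfrac{1}{2R}\,e^{-|t-f(DB)|/R}}{\tfrac{1}{2R}\,e^{-|t-f(DB')|/R}}
= e^{\left(|t-f(DB')|-|t-f(DB)|\right)/R}
\le e^{|f(DB)-f(DB')|/R}
\le e^{\Delta f / R} = e^{\varepsilon},
\]
where the first inequality is the triangle inequality
$|t-f(DB')| \le |t-f(DB)| + |f(DB)-f(DB')|$ and the last step uses $R = \Delta f/\varepsilon$.
```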

Differential Privacy: Example
(Table: insurance claims in thousands of dollars, one row per ID; the values are omitted from the transcript.)
Query: average claim. Auxiliary input: a fraction C of the entries.
Without noise, an adversary can learn my claim with probability C.
What privacy level is enough for me? I feel safe if I am hidden among n people, i.e. Pr[identify] = 1/n. With n = 100 and C = 0.01 in this example, ε = 1.
In most applications the same ε gives an even stronger sense of privacy, e.g. because the adversary's identification is itself probabilistic. Typical choices: ε = 0.1, 1.0, ln 2.
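One way to read these ε values (my gloss, not from the slide): ε bounds the factor by which the released answer can shift any adversary's odds about one person. A two-line sketch tabulating that factor for the ε values listed above:

```python
import math

# Multiplicative bound e^eps on how much one release can shift the
# adversary's odds about any single person, for the slide's epsilons.
for eps in (0.1, math.log(2), 1.0):
    print(f"eps = {eps:.3f}  ->  odds can grow by at most e^eps = {math.exp(eps):.3f}")
```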

Differential Privacy: Example (continued)
Query: average claim. Auxiliary input: 12 entries (the claims table is omitted from the transcript). ε = 1: hide me among 1200 people.
Δf = max over DB, DB-Me of |f(DB) - f(DB-Me)| = max over DB, DB-Me of |Avg(DB) - Avg(DB-Me)| = 0.81
Add noise: Lap(Δf/ε) = Lap(0.81/1) = Lap(0.81), which has variance 2R² ≈ 1.31.
Roughly: respond with the true value ± 1.31. The average is not a sensitive function!
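A sketch of this calculation in Python (my own illustration; the claim values and the bound of 9 thousand dollars are hypothetical, since the slide's table is omitted from the transcript). For an average of n values clamped to [lo, hi], removing one entry changes the average by at most (hi - lo)/(n - 1):

```python
import numpy as np

rng = np.random.default_rng()

def avg_sensitivity(n, lo, hi):
    """Worst-case |Avg(DB) - Avg(DB - Me)| when DB has n entries in [lo, hi]."""
    return (hi - lo) / (n - 1)

def noisy_average(claims, lo, hi, epsilon):
    """epsilon-DP average, assuming every claim is clamped to [lo, hi]."""
    claims = np.clip(np.asarray(claims, dtype=float), lo, hi)
    scale = avg_sensitivity(len(claims), lo, hi) / epsilon
    return claims.mean() + rng.laplace(scale=scale)

# Hypothetical claims in thousands of dollars.
claims = [1.2, 3.5, 0.8, 7.4, 2.1, 4.9, 6.3, 0.5, 3.3, 5.0, 2.7, 8.9]
print(noisy_average(claims, lo=0.0, hi=9.0, epsilon=1.0))
# With n = 12 and values in [0, 9], the noise scale is 9/11 ≈ 0.82, close
# to the slide's Lap(0.81), so the noisy answer is still useful.
```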

Differential Privacy: Example (continued)
Query: max claim. Auxiliary input: 12 entries. ε = 1: hide me among 1200 people.
Δf = max over DB, DB-Me of |f(DB) - f(DB-Me)| = max over DB, DB-Me of |Max(DB) - Max(DB-Me)| = 7.46
Add noise: Lap(Δf/ε) = Lap(7.46/1) = Lap(7.46), which has variance 2R² ≈ 111.
Roughly: respond with the true value plus noise on that scale. The max is a very sensitive function! The answer is useless: the error exceeds the sampling error.
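Continuing the sketch above with the same assumed range, the worst-case sensitivity of the max is the whole range hi - lo (remove the unique largest claim and let every other claim be lo), so the noise scale dwarfs the answer:

```python
# Continuing the sketch above: compare the worst-case noise scales of the
# average and the max over the same assumed range [0, 9].
def max_sensitivity(lo, hi):
    """Worst case |Max(DB) - Max(DB - Me)|: the unique largest claim (hi)
    is removed and every remaining claim is lo."""
    return hi - lo

n, lo, hi, eps = 12, 0.0, 9.0, 1.0
print("avg: noise scale", avg_sensitivity(n, lo, hi) / eps)   # ~0.82
print("max: noise scale", max_sensitivity(lo, hi) / eps)      # 9.0, larger than most claims
```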

A Natural Relaxation: (ε, δ)-Differential Privacy
For all DB, DB' differing in at most one element, and for all subsets S of Range(K):
Pr[K(DB) in S] ≤ e^ε · Pr[K(DB') in S] + δ,
where δ = δ(n) is negligible.
Cf.: ε-differential privacy is unconditional, independent of n.

Feasibility Results [Dwork, Nissim '04] (here: a revisionist version)
- Database D = {d_1, ..., d_n}, rows in {0,1}^k.
- T "counting queries" q = (S, g), where S is a subset of [1..n] and g: rows → {0,1}: "How many rows in S satisfy g?", i.e. q(D) = Σ_{i in S} g(d_i).
- Privacy mechanism: add binomial noise, r = q(D) + B(f(n), ½).
- SuLQ Theorem: (δ, T)-differential privacy when T(n) = O(n^c) with 0 < c < 1, δ = 1/O(n^{c'}) with 0 < c' < c/2, and f(n) = (T(n)/δ²) log⁶(n).
- If √T(n)/δ < O(√n), then |noise| < |sampling error|!
- E.g., T = n^{3/4} and δ = 1/10 give √T/δ = 10·n^{3/8} << √n.
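A sketch of this mechanism (my own illustration; the database, the query, and the choice to center the binomial noise at zero are my assumptions, and the |noise| < |sampling error| comparison is asymptotic):

```python
import numpy as np

rng = np.random.default_rng()

def noisy_counting_query(D, S, g, f_n):
    """Answer one counting query q = (S, g) with binomial noise,
    following the slide's mechanism r = q(D) + B(f(n), 1/2).

    The noise is centered (B(f_n, 1/2) - f_n/2) so the answer is
    unbiased; that centering is my assumption, not on the slide."""
    true_answer = sum(g(D[i]) for i in S)
    return true_answer + rng.binomial(f_n, 0.5) - f_n / 2

# Hypothetical database of 0/1 rows and one query.
n = 1000
D = rng.integers(0, 2, size=(n, 4))
S = range(0, n, 2)                       # even-indexed rows
g = lambda row: int(row[0] == 1 and row[2] == 0)

# The slide sets f(n) = (T(n)/delta^2) log^6 n; the noise has standard
# deviation sqrt(f(n))/2, and the |noise| < |sampling error| claim is
# asymptotic (the log^6 n factor dominates at a small n like this one).
T, delta = int(n ** 0.75), 0.1
f_n = int((T / delta**2) * np.log(n) ** 6)
print(noisy_counting_query(D, S, g, f_n))
```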

Impossibility Results: a Line of Research Initiated by Dinur and Nissim '03
Blatant non-privacy: the adversary correctly guesses all but o(n) of the rows.
Blatant non-privacy holds even against a restricted adversary:
- with n queries and poly(n) computation, if all answers are within o(√n) of the true answer [Y];
- with cn queries and poly(n) computation, if all but a small constant fraction of the answers are within o(√n) of the true answer [DMT];
- with c'n queries and exp(n) computation, if a (1/2 + η) fraction of the answers are within o(√n) of the true answer [DMT].
These results are independent of how the noise is distributed; a variant model permits poly(n) computation in the final case [DY].
With 2^n queries, noise of magnitude E permits correctly guessing n - 4E rows [DiNi], with depressing implications for the non-interactive case.
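A toy illustration of the exponential-adversary argument behind the last line (my own code, for a tiny database): ask all 2^n subset-sum queries, receive answers with error at most E, and search for any candidate database consistent with them; the Dinur-Nissim argument guarantees the candidate disagrees with the truth on at most 4E rows:

```python
import itertools, random

def reconstruction_attack(noisy_answers, n, E):
    """Exponential-time adversary: return any candidate database whose
    subset sums are all within E of the reported answers. Such a
    candidate agrees with the true database on all but at most 4E rows."""
    subsets = list(itertools.product([0, 1], repeat=n))     # all 2^n queries
    for cand in itertools.product([0, 1], repeat=n):
        if all(abs(sum(c for c, s in zip(cand, sel) if s) - a) <= E
               for sel, a in zip(subsets, noisy_answers)):
            return cand
    return None

random.seed(0)
n, E = 8, 1                                   # tiny database, noise bound E
db = [random.randint(0, 1) for _ in range(n)]

# Curator answers every subset-sum query with error at most E.
queries = list(itertools.product([0, 1], repeat=n))
answers = [sum(d for d, s in zip(db, sel) if s) + random.randint(-E, E)
           for sel in queries]

guess = reconstruction_attack(answers, n, E)
wrong = sum(g != d for g, d in zip(guess, db))
print(f"rows guessed wrong: {wrong} (bound: 4E = {4 * E})")
```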

Thank You!