The Sparse Vector Technique
CompSci 590.03, Lecture 12, Fall 12
Instructor: Ashwin Machanavajjhala

Announcement
The project proposal submission deadline is Fri, Oct 12, noon.
How to write the proposal?
- Just like any paper: Abstract, Introduction, Notation, Problem Statement, Related Work.
- Instead of algorithms and results sections, you will have a section describing how you plan to solve the problem.

Recap: Laplace Mechanism
Theorem: If the sensitivity of the query is S, then adding Laplace noise with parameter λ = S/ε guarantees ε-differential privacy.
Sensitivity: the smallest number S(q) such that for any d, d' differing in one entry, ||q(d) - q(d')|| ≤ S(q).
Histogram query: sensitivity = 2. Variance / error on each entry = 2λ² = 2·(2/ε)² = 8/ε².
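As a quick illustration of this recap, here is a minimal Python sketch of the Laplace mechanism (the function name, the numpy calls, and the example numbers are my own, not from the slides):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Release a numeric query answer with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: a histogram query has sensitivity 2, so each cell gets Lap(2/eps) noise.
eps = 0.1
noisy_cell = laplace_mechanism(true_answer=412, sensitivity=2, epsilon=eps)
```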

Cohort Size Estimation Problem
Population of medical patients. Example queries:
- Are there at least 200 individuals who are male cancer survivors, between 20 and 30, who were admitted for surgery?
- Are there at least 200 male cancer survivors between the ages of 20 and 30?
- Are there at least 200 individuals who are male cancer survivors and were admitted for surgery?

Cohort Size Estimation Problem
A set of queries {Q1, Q2, Q3, ..., Qn}.
Each query Qi asks: is the number of tuples satisfying property pi greater than τ?
- If the answer is yes, return the number of tuples satisfying that property (and the researcher performs additional analysis).
- If the answer is no, return NULL.
Sensitivity of each Qi = 1 (adding or removing one tuple changes each count by at most one).
How do we answer these queries under differential privacy?

Cohort Size Estimation Problem: Laplace mechanism
The sensitivity of the full sequence of n queries is n.
For each query:
- qi' = Qi(D) + Lap(n/ε)
- Return qi' if qi' > τ; otherwise return φ.
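A minimal Python sketch of this baseline (illustrative only; `counts` stands for the exact answers Qi(D), `None` plays the role of φ, and the names and numbers are mine):

```python
import numpy as np

def laplace_baseline(counts, tau, epsilon):
    """Answer n threshold queries by adding Lap(n/eps) noise to every count,
    since the sequence of n counting queries has overall sensitivity n."""
    n = len(counts)
    scale = n / epsilon
    answers = []
    for true_count in counts:
        noisy = true_count + np.random.laplace(scale=scale)
        answers.append(noisy if noisy > tau else None)  # None plays the role of phi
    return answers

# Example: 1000 exploratory queries, only a handful of which exceed the threshold.
counts = [50] * 995 + [400] * 5
answers = laplace_baseline(counts, tau=200, epsilon=0.5)
```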

Accuracy
We say that an algorithm is (α, β)-accurate for a sequence of queries Q1, Q2, ..., Qn if, with probability > 1 - β, the following holds for every i:
- |qi' - Qi(D)| < α if qi' ≠ φ
- Qi(D) < τ + α if qi' = φ

Accuracy of Laplace Mechanism
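A sketch of the standard bound for this setting, derived from the Laplace tail bound (the constants here are mine, not copied from the slide):

```latex
% Each answer is q_i' = Q_i(D) + \mathrm{Lap}(n/\varepsilon), and
% \Pr[|\mathrm{Lap}(b)| > t] = e^{-t/b}.
% A union bound over the n queries gives
\[
  \Pr\Big[\max_{1 \le i \le n} |q_i' - Q_i(D)| \ge \alpha\Big]
  \;\le\; n\, e^{-\alpha\varepsilon/n} \;=\; \beta
  \qquad\text{for}\qquad
  \alpha \;=\; \frac{n}{\varepsilon}\ln\frac{n}{\beta},
\]
% so the Laplace baseline is (\alpha,\beta)-accurate with
% \alpha = O\big(\tfrac{n}{\varepsilon}\log\tfrac{n}{\beta}\big).
```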

Cohort Estimation Problem
In many exploratory situations, only a small number c of the queries actually have a count > τ.
However, the accuracy of the Laplace baseline depends on the total number of queries, not just on the queries that cross the threshold, even though we return no answer for the others.
Is there a mechanism where you only need to pay for the queries whose count is > τ?

Sparse Vector Technique
Set count = 0.
Set τ' = τ + Lap(2/ε)   (use a noisy threshold).
For each query:
- qi' = Qi(D) + Lap(2c/ε)   (instead of Lap(n/ε))
- If qi' ≥ τ' and count < c: count++, return qi'
- Else if qi' < τ': return φ
- Else (count ≥ c): abort
Answer at most c queries positively.
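A direct transcription of this pseudocode into Python (a sketch; the variable names and example workload are mine):

```python
import numpy as np

def sparse_vector(counts, tau, epsilon, c):
    """Sparse Vector Technique as described above (sketch): noisy threshold
    tau' = tau + Lap(2/eps), per-query noise Lap(2c/eps), answer at most c
    queries positively, then abort."""
    noisy_tau = tau + np.random.laplace(scale=2.0 / epsilon)
    answers = []
    count = 0
    for true_count in counts:
        noisy = true_count + np.random.laplace(scale=2.0 * c / epsilon)
        if noisy >= noisy_tau and count < c:
            count += 1
            answers.append(noisy)   # positive answer: return the noisy count
        elif noisy < noisy_tau:
            answers.append(None)    # None plays the role of phi
        else:                       # count has reached c
            break                   # abort: stop answering further queries
    return answers

# Example: the same exploratory workload as before, expecting at most c = 5 positive answers.
counts = [50] * 995 + [400] * 5
answers = sparse_vector(counts, tau=200, epsilon=0.5, c=5)
```

Note that, exactly as in the pseudocode, the per-query noise scale 2c/ε depends on c but not on the total number of queries.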

Sparse Vector Technique: Privacy
The analysis must account for the previous answers (the current answer is not independent of the previous answers).

Sparse Vector Technique: Privacy
At most c queries are answered positively; the privacy cost is independent of the number of queries answered with NULL.

Sparse Vector Technique: Privacy
Let A_Z(D) be the set of noise values { vi = qi' - Qi(D) } that result in the observed φ answers when τ' = Z.
If we change D to D' (recall each Qi has sensitivity 1):
- If Qi(D) + vi < Z, then Qi(D') + vi ≤ Qi(D) + 1 + vi < Z + 1.
- If Qi(D') + vi < Z - 1, then Qi(D) + vi ≤ Qi(D') + 1 + vi < Z.

Sparse Vector Technique: Privacy
Let A_Z(D) be the set of noise values { vi = qi' - Qi(D) } that result in the observed φ answers when τ' = Z.
If we change D to D', the inequalities on the previous slide show that the same noise values still yield φ answers once the threshold is shifted by one, i.e., A_Z(D) ⊆ A_{Z+1}(D').
Also, from the Laplace mechanism, shifting the noisy threshold by one changes its density by at most a factor of e^{ε/2} (sketched below).
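A sketch of the two facts being appealed to here (my reconstruction, not the slide's original formulas):

```latex
% 1. Containment of noise sets, from the inequalities on the previous slide:
\[
  A_Z(D) \;\subseteq\; A_{Z+1}(D').
\]
% 2. The noisy threshold \tau' = \tau + \mathrm{Lap}(2/\varepsilon) has density
%    f_{\tau'}(z) = \tfrac{\varepsilon}{4} e^{-\varepsilon|z-\tau|/2}, so shifting by one costs
\[
  f_{\tau'}(z) \;\le\; e^{\varepsilon/2}\, f_{\tau'}(z+1).
\]
% Each positively answered query adds \mathrm{Lap}(2c/\varepsilon) noise to a sensitivity-1 count,
% costing a factor of at most e^{\varepsilon/(2c)} per positive answer, i.e. at most e^{\varepsilon/2}
% over the at most c positive answers.
```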

Sparse Vector Technique: Privacy
You pay c · ε1 (= ε/2) of the privacy budget for the queries whose noisy counts exceed the noisy threshold, i.e., ε1 = ε/(2c) per positive answer.
You pay ε2 (= ε/2) for adding noise to the threshold.
All the queries whose noisy counts fall below the threshold are answered for free!

Sparse Vector Technique: Accuracy
Theorem: For any queries Q1, Q2, ..., Qk such that |{ i : Qi(D) > τ - α }| ≤ c, the sparse vector technique:
1. does not abort, and
2. is (α, β)-accurate, with the α sketched below.
Recall: the Laplace mechanism is (α, β)-accurate only with an α that grows with the total number of queries k (also sketched below).
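Up to constants (which depend on how the failure probability β is split between the threshold and the queries), the standard comparison for these noise scales is as follows; this is my sketch, not the slide's exact bound:

```latex
% Sparse Vector Technique, with \mathrm{Lap}(2/\varepsilon) on the threshold and
% \mathrm{Lap}(2c/\varepsilon) on each query:
\[
  \alpha_{\mathrm{SVT}} \;=\; O\!\Big(\frac{c\,\log(k/\beta)}{\varepsilon}\Big).
\]
% Laplace baseline, with \mathrm{Lap}(k/\varepsilon) on every query:
\[
  \alpha_{\mathrm{Lap}} \;=\; O\!\Big(\frac{k\,\log(k/\beta)}{\varepsilon}\Big).
\]
% When only c << k queries can cross the threshold, the error drops from linear in k to linear in c.
```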

Accuracy
We say that an algorithm is (α, β)-accurate for a sequence of queries Q1, Q2, ..., Qn if, with probability > 1 - β, the algorithm does not abort and the following holds for every i:
- |qi' - Qi(D)| < α if qi' ≠ φ (equivalently, qi' ≥ τ')
- Qi(D) < τ + α if qi' = φ (equivalently, qi' < τ')

Sparse Vector Technique: Accuracy
Proof sketch: suppose every Laplace noise value (on the threshold and on each query) is sufficiently small in magnitude.
- When qi' ≠ φ: the bound on the query noise gives |qi' - Qi(D)| < α.
- When qi' = φ: then qi' < τ', which together with the noise bounds gives Qi(D) < τ + α.
- And the algorithm does not abort: at most c queries have Qi(D) > τ - α, so at most c noisy counts can exceed the noisy threshold.
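One standard way to fill in the "suppose" step with explicit tail bounds (my reconstruction; the constants are illustrative, not taken from the slide):

```latex
% With probability at least 1 - \beta (splitting \beta between the threshold and the k queries):
\[
  |\tau' - \tau| \;\le\; \frac{2}{\varepsilon}\ln\frac{2}{\beta},
  \qquad
  \max_{1\le i\le k} |v_i| \;\le\; \frac{2c}{\varepsilon}\ln\frac{2k}{\beta}.
\]
% On this event, every returned q_i' is within
% \alpha = \frac{2}{\varepsilon}\ln\frac{2}{\beta} + \frac{2c}{\varepsilon}\ln\frac{2k}{\beta}
% of the true count, every \varphi answer has Q_i(D) < \tau + \alpha, and only queries with
% Q_i(D) > \tau - \alpha can have a noisy count above the noisy threshold, so at most c queries
% are answered positively and the algorithm does not abort.
```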

Summary of Sparse Vector Technique
When you have many low-sensitivity queries and expect only a few of them to be useful, the sparse vector technique lets you pay (in privacy budget) only for the positively answered queries.
This gives much smaller error than the Laplace mechanism.

Next Class
Multiplicative Weights Algorithms
- General paradigm for algorithm design
- Application to privately answering queries
- Application to privately publishing a dataset