
1 Foundations of Privacy Lecture 6 Lecturer: Moni Naor

2 Recap of last week's lecture: counting queries – the BLR algorithm, an efficient algorithm, and hardness results.

3 Synthetic DB: output is a DB. The sanitizer receives the database and queries (query 1, query 2, ...) and returns answers; in the synthetic-DB model the output is itself a DB (of entries from the same universe X), and the user reconstructs answers by evaluating each query on the output DB. This is software- and people-compatible and yields consistent answers.

4 Counting Queries (non-interactive; queries with low sensitivity). C is a set of predicates c: U → {0,1}. Query: how many participants in the database D of size n satisfy c? Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is inherent in statistical analysis anyway. Assume all queries are given in advance.
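To make the definition concrete, here is a minimal sketch of evaluating a counting query exactly and with α-relaxed accuracy; the function names and the uniform-noise stand-in are ours (real sanitizers calibrate Laplace noise to sensitivity, as shown later):

```python
import random

def counting_query(db, c):
    """Exact answer: how many rows of db satisfy the predicate c."""
    return sum(1 for row in db if c(row))

def relaxed_answer(db, c, alpha):
    """Toy 'alpha additive error' answer: exact count perturbed by
    noise of magnitude at most alpha (illustration only)."""
    return counting_query(db, c) + random.uniform(-alpha, alpha)

# Example: universe U = integers (ages), predicate "age >= 30".
db = [23, 35, 41, 29, 52]
print(counting_query(db, lambda age: age >= 30))    # 3
print(relaxed_answer(db, lambda age: age >= 30, 2)) # 3 plus/minus 2
```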

5 And Now… Bad News. Runtime cannot be subpolynomial in |C| or in |U|, whether the output is a synthetic DB (as in the positive result) or a general output: the exponential mechanism cannot be implemented efficiently. Want hardness… got crypto?

6 The Bad News. For large C and U we can't get efficient sanitizers! This holds both when the output is a synthetic DB (as in the positive result) and for general output; in particular, the exponential mechanism cannot be implemented efficiently. Want hardness… got crypto?

7 Showing (Cryptographic) Hardness. We have to come up with a universe U and a concept class C, together with a distribution on databases and concepts that is hard to sanitize. The distribution may use cryptographic primitives.

8 Digital Signatures. A signature scheme has a signing key sk and a verification key vk, and can be built from any one-way function [NaYu, Ro]. Given valid signature pairs (m_1, sig(m_1)), (m_2, sig(m_2)), ..., (m_n, sig(m_n)) under vk, it is hard to forge a new valid pair (m', sig(m')).
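A sketch of this sign/verify interface using Ed25519 from the pyca/cryptography package; the choice of scheme and library is ours, not the lecture's, and any EUF-CMA-secure signature scheme would do:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

sk = Ed25519PrivateKey.generate()   # signing key sk
vk = sk.public_key()                # verification key vk

msgs = [b"m1", b"m2", b"m3"]
sigs = [sk.sign(m) for m in msgs]   # valid pairs (m_i, sig(m_i))

def is_valid(vk, m, s):
    """The predicate 'is s a valid signature of m under vk?' --
    exactly the hard query c_vk of the next slide."""
    try:
        vk.verify(s, m)
        return True
    except InvalidSignature:
        return False

assert all(is_valid(vk, m, s) for m, s in zip(msgs, sigs))
# Unforgeability: producing a valid (m', s') for a fresh m' is infeasible.
assert not is_valid(vk, b"m'", b"\x00" * 64)
```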

9 Signatures ⇒ No Synthetic DB. Universe: pairs (m, s) of message and signature. Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk. The input database consists of valid pairs (m_1, sig(m_1)), ..., (m_n, sig(m_n)). The sanitizer outputs (m'_1, s_1), ..., (m'_k, s_k); for utility, most of these must be valid signatures under the same vk. By unforgeability the sanitizer cannot create new valid pairs, so input pairs appear in the output: no privacy!

10 Can We Output a Synthetic DB Efficiently? [The slide shows a 2×2 table indexed by |C| and |U|, each either subpoly or poly; the cells involving a poly-size side are marked "?".]

11 Where Is the Hardness Coming From? In the signature example it is hard to satisfy one given query, yet easy to maintain utility for all queries but that one. More natural: every individual query is easy to satisfy, but it is hard to maintain utility for most queries simultaneously.

12 Hardness on Average. Universe: triples (vk, m, s) of key, message, and signature. Queries: c_i(vk, m, s) = the i-th bit of ECC(vk), where ECC is an error-correcting code; c_v(vk, m, s) = 1 iff s is a valid signature of m under vk. The input consists of valid pairs under a single key: (vk, m_1, sig(m_1)), ..., (vk, m_n, sig(m_n)). The sanitizer outputs (vk'_1, m'_1, s_1), ..., (vk'_k, m'_k, s_k). Are these keys related to vk? Yes! At least one is vk!

13 Hardness on Average (continued). Samples: (vk, m, s). Utility on the queries c_i means: for every i, at least 3/4 of the vk'_j agree with ECC(vk) in coordinate i. By averaging, there exists a vk'_j such that ECC(vk'_j) and ECC(vk) are 3/4-close, and the distance of the error-correcting code then forces vk'_j = vk. By unforgeability, the corresponding m'_j appears in the input: no privacy!
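A toy illustration of the decoding step, with a repetition code standing in for the ECC (our simplification; the proof needs a code with constant relative distance, and it uses an averaging argument rather than plain majority, but the recovery phenomenon is the same):

```python
def ecc(bits, r=3):
    """Toy repetition-code encoder: repeat each bit r times."""
    return [b for b in bits for _ in range(r)]

def majority_decode(codewords):
    """Coordinate-wise majority over the sanitizer's output keys: if at
    least 3/4 of the vk'_j agree with ECC(vk) in every coordinate, the
    majority recovers ECC(vk) exactly."""
    n = len(codewords[0])
    return [1 if 2 * sum(cw[i] for cw in codewords) > len(codewords) else 0
            for i in range(n)]

vk = [1, 0, 1, 1]
outputs = [ecc(vk), ecc(vk), ecc(vk), ecc([1, 0, 0, 1])]  # one deviating key
assert majority_decode(outputs) == ecc(vk)                # vk is recovered
```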

14 Where Is the Hardness Coming From? In the signature example it is hard to satisfy one given query, yet easy to maintain utility for all queries but that one. More natural: every individual query is easy to satisfy, but maintaining utility for most queries is hard. Ullman-Vadhan: even 2-way marginals are hard.

15 Can We Output a Synthetic DB Efficiently? [The 2×2 table of slide 10 revisited, now annotated with the hardness results: hardness via signatures, hardness on average, and hardness using PRFs.]

16 Hardness with PRFs. Let F = {f_s | s a seed of length k} be a family of pseudorandom functions f_s: [ℓ] → [ℓ]: efficiently computable functions such that a random function from the family is indistinguishable (via black-box access) from a truly random function. Data universe: U = {(a, b) : a, b ∈ [ℓ]}. Concepts: C = {c_s | s seed}, a class of polynomial size, where c_s((a, b)) = 1 iff f_s(a) = b.

17 The Hard-to-Sanitize Distribution. The distribution D on samples: generate a seed s ∈ {0,1}^k and n distinct elements a_1, ..., a_n ∈ [ℓ]; the i-th entry of the database X is x_i = (a_i, f_s(a_i)). Claim: no efficient differentially private sanitizer A can be better than 1/3 correct.
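A minimal end-to-end sketch of this construction, with HMAC-SHA256 standing in for the PRF family (a standard PRF assumption, but our choice; the lecture only needs some PRF f_s: [ℓ] → [ℓ], and the constants ELL and K are illustrative):

```python
import hmac, hashlib, secrets, random

ELL = 2**16   # domain/range size l (illustrative)
K = 16        # seed length in bytes (illustrative)

def f(s: bytes, a: int) -> int:
    """f_s(a): PRF keyed by seed s, truncated into [l]."""
    mac = hmac.new(s, a.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big") % ELL

def c(s: bytes, row) -> bool:
    """Concept c_s((a, b)) = 1 iff f_s(a) = b."""
    a, b = row
    return f(s, a) == b

def hard_db(n: int):
    """Sample from the hard-to-sanitize distribution: random seed s,
    n distinct a_i, database entries x_i = (a_i, f_s(a_i))."""
    s = secrets.token_bytes(K)
    a_vals = random.sample(range(ELL), n)
    return s, [(a, f(s, a)) for a in a_vals]

s, X = hard_db(100)
assert all(c(s, x) for x in X)   # every input entry satisfies c_s
```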

18 The function f_s is pseudorandom, so with overwhelming probability over the choice of seed s, for any a ∈ [ℓ] that does not appear among a_1, ..., a_n, a sanitizer A cannot predict f_s(a) any better than it could predict a random function. Expect: no more than a (1/ℓ + negl())-fraction of the a's in A(X) that are not in X appear with the correct b. Suppose this event does not occur, i.e. A produces correct pairs on fresh a's with probability noticeably greater than 1/ℓ: this would contradict pseudorandomness. Since all items of the input X satisfy the concept c_s, a useful output must therefore reuse pairs from X itself.

19 General-Output Sanitizers. Theorem: traitor-tracing schemes exist if and only if sanitizing is hard. There is a tight connection between the sizes |U| and |C| that are hard to sanitize and the key and ciphertext sizes of the traitor-tracing scheme. The separation between efficient and inefficient sanitizers uses the [BoSaWa] scheme.

20 Traitor Tracing: The Problem. A center transmits E(Content) to a large group of users. Some users leak their keys (say K_1, K_3, K_8) to pirates, who construct a clone: an unauthorized decryption device (a pirate box) that recovers the content. Given a pirate box, we want to find who leaked the keys; the traitors' ``privacy'' is violated!

21 Traitor Tracing ⇒ Hard Sanitizing. A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt, and Trace. Setup generates a key bk for the broadcaster and N subscriber keys k_1, ..., k_N. Encrypt, given a bit b, generates a ciphertext using the broadcaster's key bk (semantic security is needed!). Decrypt takes a ciphertext and retrieves the original bit using any one of the subscriber keys. The tracing algorithm gets bk and oracle access to a pirate decryption box, and outputs an index i ∈ {1, ..., N} of a key k_i used to create the pirate box.

22 A Simple Example of Traitor Tracing. Let E_K(m) be a semantically secure shared-key encryption scheme. Key generation: generate N independent keys, bk = (k_1, ..., k_N). Encrypt: for a bit b, generate independent ciphertexts E_{k_1}(b), E_{k_2}(b), ..., E_{k_N}(b). Decrypt using k_i: decrypt the i-th ciphertext. Tracing: a hybrid argument (sketched below). Properties: ciphertext length N, key length 1.
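A toy sketch of the hybrid tracing argument for this N-component scheme. The one-bit XOR "encryption" is deliberately trivial and insecure, and the single-leaked-key pirate is our own simplification; the point is only how the hybrids pinpoint the traitor:

```python
import random

N = 8
keys = [random.getrandbits(32) for _ in range(N)]   # subscriber keys

def enc_bit(key, b):
    """Toy 'encryption' of a bit: XOR with the key's low bit."""
    return (key & 1) ^ b

def encrypt(b, flip_first=0):
    """Hybrid ciphertext: the first flip_first components encrypt 1-b,
    the rest encrypt b (flip_first=0 is an honest ciphertext)."""
    return [enc_bit(k, 1 - b if i < flip_first else b)
            for i, k in enumerate(keys)]

def pirate(ct, leaked=3):
    """A pirate box built from the single leaked key k_3."""
    return ct[leaked] ^ (keys[leaked] & 1)

def trace():
    """Hybrid argument: the pirate's answer flips exactly at the hybrid
    where the component it actually decrypts switches from b to 1-b."""
    prev = pirate(encrypt(0, flip_first=0))
    for i in range(1, N + 1):
        cur = pirate(encrypt(0, flip_first=i))
        if cur != prev:
            return i - 1          # key i-1 was used by the pirate box
        prev = cur

print(trace())  # 3
```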

23 Equivalence of TT and Hardness of Sanitizing. The correspondence: ciphertext ↔ query; key ↔ database entry; TT pirate ↔ sanitizer for a distribution of (a collection of) DBs.

24 Traitor Tracing ⇒ Hard Sanitizing. Theorem: if there exists a TT scheme with ciphertext length c(n) and key length k(n), we can construct: 1. a query set C of size ≈ 2^{c(n)}; 2. a data universe U of size ≈ 2^{k(n)}; 3. a distribution D on n-user databases with entries from U that is "hard to sanitize": there exists a tracer that can extract an entry of the input database from any sanitizer's output, violating its privacy! The separation between efficient and inefficient sanitizers uses the [BoSaWa06] scheme.

25 Traitor Tracing ⇒ Hard Sanitizing (repeat of slide 21).

26 Collusion. An important parameter of a traitor-tracing scheme is its collusion resistance: a scheme is t-resilient if tracing is guaranteed to work as long as no more than t keys were used to create the pirate decoder; when t = N the scheme is fully resilient. Other parameters are the ciphertext and private-key lengths c(n) and k(n). A one-time t-resilient TT scheme guarantees semantic security only against adversaries given a single ciphertext; this weaker notion is all we need here.

27 Data universe: all possible keys, U = {0,1}^{k(n)}. Concept class C: a concept for every possible ciphertext, i.e. for every m ∈ {0,1}^{c(n)} the concept c_m, on input a key string K, outputs the decryption of m under the key K. Hard-to-sanitize distribution: run Setup to generate n decryption keys for the users; these keys form the database X.

28 Any sanitizer that maintains utility can be viewed as an adversary that outputs an "object" that correctly decrypts encryptions of 0 and 1. We can therefore run the traitor-tracing algorithm on such a sanitizer's output to trace one of the keys in the sanitizer's input.

29 From Hard-to-Sanitize to Tracing Traitors. Given hard-to-sanitize distributions, we can create a weak TT scheme. Key generation: generate a database of individuals; each key is a separate subset of it. Ciphertexts correspond to queries: knowing one's individuals allows approximating the query on the database. We need coordination between the different parties, since their approximations may differ.

30 Interactive Model. The sanitizer sits between the data and the analyst; multiple queries (query 1, query 2, ...) are chosen adaptively.

31 Counting Queries: Answering Queries Interactively. C is a set of predicates c: U → {0,1}. Query: how many participants in the database D of size n satisfy c? Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is inherent in statistical analysis anyway. The queries arrive one by one and must be answered as they arrive.

32 Can we answer queries when they are not known in advance? We can always answer with independent noise, but then we are limited to a number of queries smaller than the database size. We do not know the future, but we do know the past: we can answer based on past answers!
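The "independent noise" baseline is the Laplace mechanism; a minimal sketch for counting queries (which have sensitivity 1), with the epsilon-splitting comment indicating why accuracy limits the number of queries:

```python
import math, random

def lap(b):
    """Sample from the Laplace(0, b) distribution via the inverse CDF."""
    u = random.random() - 0.5
    return -b * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def laplace_answer(db, c, eps):
    """eps-DP answer to one counting query: true count + Lap(1/eps)."""
    return sum(1 for row in db if c(row)) + lap(1.0 / eps)

# Under basic composition, answering k queries with total budget eps
# means per-query noise of scale k/eps, so useful accuracy forces k
# to stay well below the database size n.
```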

33 Idea: Maintain a List of Possible Databases. Start with D_0 = the list of all databases of size m. In each round j: if the list D_{j-1} is representative, answer according to the average database in the list; otherwise, prune the list to maintain consistency, obtaining D_j.

34 Initialize D_0 = {all databases of size m over U}. In each round, D_{j-1} = {x_1, x_2, ...}, where each x_i has size m. For each query c_1, c_2, ..., c_k in turn: let A_j ← Average_{i ∈ D_{j-1}} min{d(x*, x_i), √n}; this quantity has low sensitivity! If A_j is small (below a noisy threshold): answer according to the median database in D_{j-1}, and set D_j ← D_{j-1}. If A_j is large: remove all databases that are far away to obtain D_j, and give the true answer plus noise.
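A non-private skeleton of this update rule, with all noise omitted and fractional answers used so that size-m candidates are comparable to the size-n input (the real mechanism adds noise to both the threshold test and the released answers, and the analysis guarantees the list never empties; names and the 1.0 cap, which mirrors the min{·, √n} truncation, are ours):

```python
from statistics import median

def frac(db, c):
    """Fractional counting-query answer on db."""
    return sum(1 for row in db if c(row)) / len(db)

def interactive_sanitizer(true_db, candidates, queries, threshold):
    """Answer from the candidate list while it is representative;
    a 'large' round prunes the list and reveals a (noisy) true answer."""
    for c in queries:
        t = frac(true_db, c)
        scores = [min(abs(frac(x, c) - t), 1.0) for x in candidates]
        a_j = sum(scores) / len(scores)   # low sensitivity in true_db
        if a_j <= threshold:              # small round: list is good
            yield median(frac(x, c) for x in candidates)
        else:                             # large round: prune far DBs
            candidates = [x for x in candidates
                          if abs(frac(x, c) - t) <= threshold]
            yield t                       # real version: t + noise
```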

35 Need to show. Accuracy and functionality: the result is accurate; whenever A_j is large, many of the x_i ∈ D_{j-1} are removed; and D_j is never empty. Privacy: there are not many rounds with large A_j, so we can afford to release which rounds are large, and we can release the noisy answers.

36 Why can we release when large rounds occur? We do not expect more than O(m) large rounds, and we make the threshold noisy. For every pair of neighboring databases D and D', consider the vector of threshold comparisons: in rounds far from the threshold the outcome can be made identical for both databases, and rounds close to the threshold can be corrected at some privacy cost, which cannot occur too frequently.
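This noisy-threshold trick is the sparse-vector idea; a minimal sketch of the comparison step (the noise scales are an illustrative parameter split, and the real mechanism caps the number of large rounds at O(m) and refreshes the threshold noise after each one, which this sketch omits):

```python
import math, random

def lap(b):
    """Sample from the Laplace(0, b) distribution via the inverse CDF."""
    u = random.random() - 0.5
    return -b * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def above_noisy_threshold(scores, threshold, eps):
    """Add noise once to the threshold and separately to each score,
    and report only which rounds look 'large'. Rounds far from the
    threshold give the same outcome on neighboring databases, which is
    what makes releasing the large-round indicators cheap."""
    noisy_t = threshold + lap(2.0 / eps)
    return [s + lap(4.0 / eps) > noisy_t for s in scores]
```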

37 Why Is There a Good x_i? Recall the setting: C is a set of predicates c: U → {0,1}; a query asks how many participants in the database D of size n satisfy c, and we must answer within α additive error w.h.p. Claim: a sample F of size m approximates D on all the given queries c.

38 There exists an F of size m = Õ((n/α)² · log|C|) such that max_{c_j} dist(F, D) ≤ α. For α = Õ(n^{2/3} log|C|), m is Õ(n^{2/3} log|C|).
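The existence claim follows from a Hoeffding-plus-union-bound argument over a random subsample F; one way to write the key step (in fractional terms, so that c(F)/m estimates c(D)/n):

```latex
\Pr_F\!\Big[\exists\, c\in C:\ \Big|\tfrac{c(F)}{m}-\tfrac{c(D)}{n}\Big|>\tfrac{\alpha}{n}\Big]
\;\le\; |C|\cdot 2e^{-2m(\alpha/n)^2}\;<\;1
\quad\text{once}\quad m = O\!\Big(\tfrac{n^2}{\alpha^2}\log|C|\Big),
```

so some F of this size achieves error at most α on every query simultaneously.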

