Download presentation

Presentation is loading. Please wait.

Published byAlonzo Walby Modified over 2 years ago

1
Restriction Access, Population Recovery & Partial Identification Avi Wigderson IAS, Princeton Joint with Zeev Dvir Anup Rao Amir Yehudayoff

2
Restriction Access, A new model of “Grey-box” access

3
Systems, Models, Observations From Input-Output (I 1,O 1 ), (I 2,O 2 ), (I 3,O 3 ), ….? Typically more!

4
Black-box access Successes & Limits Learning: PAC, membership, statistical…queries Decision trees, DNFs? Cryptography: semantic, CPA, CCA, … security Cold boot, microwave,… attacks? Optimization: Membership, separation,… oracles Strongly polynomial algorithms? Pseudorandomness: Hardness vs. Randomness Derandomizing specific algorithms? Complexity: 2 = NP NP What problems can we solve if P=NP?

5
The gray scale of access f: n m D: “device” computing f (from a family of devices) D x 1, f(x 1 ) x 2, f(x 2 ) x 3, f(x 3 ) …. D D How to model? Many specific ideas. Ours: general, clean Black Box Gray Box – natural starting point - natural intermediate pt Clear Box

6
Restriction Access (RA) f: n m D: “device” computing f Restriction: = (x,L), L [n], x n, L L live vars Observations: ( , D| ) D| (simplified after fixing) computes f| on L Black L = Gray Clear L = [n] (x,f(x)) ( , D| ) ( x,D) D f(x) x 1, x 2, *,* …. * ||

7
Example: Decision Tree x1x1 x4x4 x2x2 x3x3 x2x2 0 1 01 01 10 D = (x,L) L = {3,4} x = (1010) D| = x4x4 x3x3 0 10

8
Modeling choices (RA-PAC) Restriction: = (x,L), L [n], x n, unknown D Input x : friendly, adversarial, random Unknown distribution (as in PAC) Live vars L : friendly, adversarial, random - independent dist (as in random restrictions)

9
RA-PAC Results Probably, Approximately Correct (PAC) learning of D, from restrictions with each variable remains alive with prob Thm 1 [DRWY] : A poly(s, ) alg for RA-PAC learning size-s decision trees, for every >0 (reconstruction from pairs of live variables) Thm 2 [DRWY] : A poly(s, ) alg for RA-PAC learning size-s DNFs, for every >.365… (reduction to “Population Recovery Problem”) Positive - In contrast to PAC !!!

10
Population Recovery (learning a mixture of binomials)

11
Population Recovery Problem k species, n attributes, from , Vectors v 1, v 2, … v k n Distribution p 1, p 2, … p k , >0 Task: Recover all v i, p i (upto ) from samples p 1 1/2 0000 v 1 p 2 1/3 0110 v 2 p 3 1/6 1100 v 3 Red: Known Blue: Unknown n k

12
Population Recovery Problem k species, n attributes, from , , >0 v 1, v 2, … v k n p 1, p 2, … p k fraction in population Task: Recover all v i, p i (upto ) from samples Samplers: (1) u v i with prob. p i -Lossy Sampler: (2) u(j) ? with prob. 1- j [n] -Noisy Sampler: (2) u(j) flipped w.p. 1/2- j [n] 0110 ?1?0?1?0 11001100 p 1 1/2 0000 v 1 p 2 1/3 0110 v 2 p 3 1/6 1100 v 3

13
Skull Teeth Vertebrae Arms Ribs Legs Tail Loss – Paleontology Height (ft) Length (ft) Weight (lbs) True Data 26% 11% 13% 30% 20%

14
Skull Teeth Vertebrae Arms Ribs Legs Tail Loss – Paleontology Height (ft) Length (ft) Weight (lbs) From samples Dig #1 Dig #2 Dig #3 Dig #4 …… each finding common to many species! How do they do it?

15
Socialism Abortion Gay marriage Marijuana Male Rich North US Noise – Privacy True Data 2% 0 1 1 0 1 0 0 1% 1 1 0 0 0 1 1 …… From samples Joe 0 0 0 0 0 1 1 Jane 0 0 0 0 1 1 1 …. Who flipped every correct answer with probability 49% Deniability? Recovery?

16
PRP - applications Recovering from loss & noise - Clustering / Learning / Data mining - Computational biology / Archeology / …… - Error correction - Database privacy - …… Numerous related papers & books

17
PRP - Results Facts: =0 obliterates all information. - No polytime algorithm for = o(1) Thm 3 [DRWY] A poly(k, n, ) algorithm, from lossy samples, for every >.365… Thm 4 [WY]: A poly(k log k, n, ) algorithm, from lossy and/or noisy samples, for every > 0 Kearns, Mansour, Ron, Rubinfeld, Schapire, Sellie exp(k) algorithm for this discrete version Moitra, Valiant exp(k) algorithm for Gaussian version (even when noise is unknown)

18
Proof of Thm 4 Reconstruct v i, p i From samples,,,…. Lemma 1: Can assume we know the v i ’s ! Proof: Exposing one column at a time. Lemma 2: Easy in exp(n) time ! Proof: Lossy - enough samples without “?” Noisy – linear algebra on sample probabilities. Idea: Make n=O(log k) [Dimension Reduction] p 1 1/2 0000 v 1 p 2 1/3 0110 v 2 p 3 1/6 1100 v 3 ?1?00??01100 n k

19
Partial IDs a new dimension-reduction technique

20
Dimension Reduction and small IDs Lemma: Can approximate p i in exp(| S i |) time ! Does one always have small IDs? 1 2 3 4 5 6 7 8 p 1 0 0 0 0 0 1 0 1 v 1 p 2 0 1 1 0 1 0 1 0 v 2 p 3 0 1 0 0 1 0 1 1 v 3 p 4 1 1 1 0 1 0 1 1 v 4 p 5 1 1 0 0 0 1 1 1 v 5 p 6 1 1 0 0 1 0 0 1 v 6 p 7 0 1 0 0 0 1 1 1 v 7 p 8 1 1 0 1 1 0 1 1 v 8 p 9 1 1 0 0 0 1 1 1 v 9 IDs S 1 = {1,2} S 2 = {8} S 3 = {1,5,6} n = 8 k = 9 u – random sample q i = Pr[u[S i ]=v i [S i ]]

21
Small IDs ? NO! However,… 1 2 3 4 5 6 7 8 p 1 1 0 0 0 0 0 0 0 v 1 p 2 0 1 0 0 0 0 0 0 v 2 p 3 0 0 1 0 0 0 0 0 v 3 p 4 0 0 0 1 0 0 0 0 v 4 p 5 0 0 0 0 1 0 0 0 v 5 p 6 0 0 0 0 0 1 0 0 v 6 p 7 0 0 0 0 0 0 1 0 v 7 p 8 0 0 0 0 0 0 0 1 v 8 p 9 0 0 0 0 0 0 0 0 v 9 IDs S 1 = {1} S 2 = {2} S 3 = {3} … S 8 = {8} S 9 = {1,2,…,8} n = 8 k = 9

22
Linear algebra & Partial IDs However, we can compute p 9 = 1- p 1 - p 2 -…- p 8 1 2 3 4 5 6 7 8 p 1 1 0 0 0 0 0 0 0 v 1 p 2 0 1 0 0 0 0 0 0 v 2 p 3 0 0 1 0 0 0 0 0 v 3 p 4 0 0 0 1 0 0 0 0 v 4 p 5 0 0 0 0 1 0 0 0 v 5 p 6 0 0 0 0 0 1 0 0 v 6 p 7 0 0 0 0 0 0 1 0 v 7 p 8 0 0 0 0 0 0 0 1 v 8 p 9 0 0 0 0 0 0 0 0 v 9 IDs S 1 = {1} S 2 = {2} S 3 = {3} … S 8 = {8} S 9 = n = 8 k = 9 P

23
Back substitution and Imposters Can use back substitution if no cycles ! Are there always acyclic small partial IDs? 1 2 3 4 5 6 7 8 p 1 0 0 1 0 0 1 0 1 v 1 p 2 0 1 1 0 1 0 1 0 v 2 p 3 0 1 0 0 1 0 1 1 v 3 p 4 1 1 1 0 1 0 1 1 v 4 p 5 1 1 0 0 0 1 1 1 v 5 p 6 1 1 0 0 1 0 0 1 v 6 p 7 0 1 0 0 0 1 1 1 v 7 p 8 1 1 0 1 1 0 1 1 v 8 p 9 1 1 0 0 0 1 1 1 v 9 PIDs S 1 = {1,2} S 2 = {8} S 3 = {1,5,6} u – random sample q i = Pr[u[S i ]=v i [S i ]] q 1 = q 2 = q 3 = q 4 - p 1 - p 2 = S 4 = {3} any subset

24
Acyclic small partial IDs exist Lemma: There is always an ID of length log k 1 2 3 4 5 6 7 8 p 1 0 0 0 0 0 0 0 1 v 1 p 2 0 1 1 0 1 0 1 0 v 2 p 3 1 1 0 0 1 0 1 1 v 3 p 4 1 1 1 0 1 0 1 1 v 4 p 5 1 1 0 0 0 1 1 1 v 5 p 6 1 1 0 0 1 0 0 1 v 6 p 7 1 1 1 1 1 0 1 1 v 7 p 8 0 1 0 0 0 1 1 1 v 8 p 9 0 1 0 0 1 1 1 1 v 9 PIDs S 8 = {1,5,6} n = 8 k = 9 Idea: Remove and iterate to find more PIDs Lemma: Acyclic (log k)-PIDs always exists!

25
Chains of small Partial IDs Compute: q i = Pr[u i = 1] = Σ j≤i p i from sample u Back substitution : p i = q i - Σ j*
{
"@context": "http://schema.org",
"@type": "ImageObject",
"contentUrl": "http://images.slideplayer.com/11/3188837/slides/slide_25.jpg",
"name": "Chains of small Partial IDs Compute: q i = Pr[u i = 1] = Σ j≤i p i from sample u Back substitution : p i = q i - Σ j
*

26
The PID (imposter) graph Given: V=(v 1, v 2, … v k ) n S=(S 1, S 2,…,S k ) [ n ] n Construct G (V;S) by connecting v j v i iff v i is an imposter of v j : v i [ S j ] = v j [ S j ] 1 2 3 4 5 6 7 8 1 1 1 1 1 1 1 1 v 1 0 1 1 1 1 1 1 1 v 2 0 0 1 1 1 1 1 1 v 3 0 0 0 1 1 1 1 1 v 4 0 0 0 0 1 1 1 1 v 5 PIDs S 1 = {1} S 2 = {2} S 3 = {3} … S 5 = {5} width = max i |S i | depth = depth ( G ) Want: PIDs w/small width and depth for all V v i v j iff i > j

27
Constructing cheap PID graphs Theorem: For every V=(v 1, v 2, … v k ), v i n we can efficiently find PIDs S=(S 1,S 2,…,S k ), S i [ n ] of width and depth at most log k Algorithm: Initialize S i = for all i Invariant: |imposters( v i ;S i )| ≤ k /2 | Si | Repeat: (1) Make S i maximal if not, add minority coordinates to S i (2) Make chains monotone: v j v i then | S j |<| S i | (so G acyclic) if not, set S i to S j ( and apply (1) to S i ) 1 2 3 4 0 0 1 0 v 1 0 0 0 0 v 2 0 0 0 1 v 3 1 0 0 1 v 4 1 1 1 0 v 5 1 0 1 0 v 6

28
Analysis of the algorithm Theorem: For every V=(v 1, v 2, … v k ) n we can efficiently find PIDs S=(S 1,S 2,…,S k ) [ n ] n of width and depth at most log k Algorithm: Initialize S i = for all i Invariant: |imposters( v i ;S i )| ≤ k /2 | Si | Repeat: (1) Make S i maximal (2) Make chains monotone (v j v i then | S j |<| S i |) Analysis: - | S i | log k throughout for all i - i | S i | increases each step - Termination in k log k steps. - width log k and so depth log k

29
Conclusions - Restriction access: a new, general model of “gray box” access (largely unexplored!) - A general problem of population recovery - Efficient reconstruction from loss & noise - Partial IDs, a new dimension reduction technique for databases. Open: polynomial time algorithm in k ? (currently k log k, PIDs can’t beat k loglog k ) Open: Handle unknown errors ?

Similar presentations

OK

Property Testing of Data Dimensionality Robert Krauthgamer ICSI and UC Berkeley Joint work with Ori Sasson (Hebrew U.)

Property Testing of Data Dimensionality Robert Krauthgamer ICSI and UC Berkeley Joint work with Ori Sasson (Hebrew U.)

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on collection framework in java Download ppt on biomass energy Ppt on leadership Ppt on political parties and electoral process Ppt on responsive web design Ppt on obesity diet camp Ppt on credit default swaps charts Ppt on leadership and change management Ppt on carl friedrich gauss formula A ppt on loch ness monster facts