
1 The Complexity of Pebbling Graphs and Spam Fighting
Moni Naor, WEIZMANN INSTITUTE OF SCIENCE

2 Based on:
 Cynthia Dwork, Andrew Goldberg, N.: On Memory-Bound Functions for Fighting Spam
 Cynthia Dwork, N., Hoeteck Wee: Pebbling and Proofs of Work

3 Principal techniques for spam-fighting
1. FILTERING
 text-based, trainable filters, …
2. MAKING THE SENDER PAY
 computation [Dwork Naor 92, Back 97, Abadi Burrows Manasse Wobber 03, DGN 03, DNW 05]
 human attention [Naor 96, CAPTCHA]
 micropayments
NOTE the techniques are complementary: they reinforce each other!

5 Talk Plan
 The proofs-of-work approach
 DGN's memory-bound functions
 Generating a large random-looking table [DNW]
 Open problems: moderately hard functions

6 Pricing via processing [Dwork-Naor Crypto 92]
 automated for the user
 non-interactive, single-pass
 no need for a third party or payment infrastructure
IDEA If I don't know you: prove that you spent significant computational resources (say, 10 secs of CPU time) just for me, and just for this message.
The sender S sends the recipient R the message m and time d together with proof = f(m,S,R,d), where f is moderately hard to compute but easy to verify.

7 Choosing the function f
Message m, sender S, receiver R, date and time d.
 Hard to compute: f(m,S,R,d) cannot be amortized; lots of work for the sender. We should have a good understanding of the best methods for computing f.
 Easy to check: verifying "z = f(m,S,R,d)" is little work for the receiver.
 Parameterized to scale with Moore's Law: easy to exponentially increase the computational cost while barely increasing the checking cost.
Example: computing a square root mod a prime vs. verifying it: x^2 = y (mod P).
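A minimal sketch of this square-root example in Python. The particular prime (2^127 - 1, a Mersenne prime congruent to 3 mod 4) is an illustrative choice of mine, not from the slides:

```python
# Asymmetry between computing and verifying a square root mod a prime p.
# For p ≡ 3 (mod 4), a square root of a quadratic residue y is
# y^((p+1)/4) mod p: ~log p modular multiplications to compute,
# but only one multiplication to verify.
p = 2**127 - 1                      # Mersenne prime; p % 4 == 3

def sqrt_mod(y: int) -> int:
    """Moderately hard direction: a full modular exponentiation."""
    assert p % 4 == 3
    return pow(y, (p + 1) // 4, p)

def verify(x: int, y: int) -> bool:
    """Easy direction: check x^2 = y (mod p)."""
    return (x * x) % p == y % p

y = pow(123456789, 2, p)            # a known quadratic residue
assert verify(sqrt_mod(y), y)
```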

8 Which computational resource(s)?
WANT a resource whose use corresponds to the same computation time across machines.
 computing cycles: high variance of CPU speeds among desktops; factors of 10-30
 memory-bound approach [Abadi Burrows Manasse Wobber 03]: low variance in memory latencies; factors of 1-4
GOAL design a memory-bound proof-of-effort function, i.e. one that requires a large number of cache misses.

9 memory-bound model
USER: MAIN MEMORY large but slow; CACHE small but fast.
SPAMMER: MAIN MEMORY may be very, very large, and may exploit locality; CACHE at most ½ the user's main memory.

10 memory-bound model
USER: MAIN MEMORY large but slow; CACHE small but fast.
SPAMMER: MAIN MEMORY may be very, very large; CACHE at most ½ the user's main memory.
1. charge accesses to main memory: must avoid exploitation of locality
2. computation is free, except for hash function calls: watch out for low-space crypto attacks

11 Talk Plan
 The proofs-of-work approach
 DGN's memory-bound functions
 Generating a large random-looking table [DNW]
 Open problems: moderately hard functions

12 Path-following approach [DGN Crypto 03]
PUBLIC large random table T (2× the spammer's cache size)
PARAMETERS integer L, effort parameter e
IDEA a path is a sequence of L sequential accesses to T
 the sender searches a collection of paths to find a good path
 the collection depends on (m, S, R, d)
 the density of good paths is 1/2^e
 the locations in T depend on hash functions H0,…,H3

13 Path-following approach [DGN Crypto 03]
PUBLIC large random table T (2× the spammer's cache size)
PARAMETERS integer L, effort parameter e
IDEA a path is a sequence of L sequential accesses to T; the sender searches a collection of paths to find a good path
OUTPUT (m, S, R, d) + description of a good path
COMPLEXITY sending: O(2^e L) memory accesses; verifying: O(L) accesses

14 [Figure: the collection P of paths of length L, which depends on (m,S,R,d), with one successful path highlighted.]

15 Abstracted Algorithm
Sender and receiver share a large random table T.
To send message m (sender S, receiver R, date/time d), repeat the following trial for k = 1, 2, … until success. The current state is specified by an auxiliary table A; the thread is defined by (m,S,R,d,k).
 Initialization: A = H0(m,S,R,d,k)
 Main loop, walk for L steps (L = path length):
   c = H1(A)
   A = H2(A, T[c])
 Success: the last e bits of H3(A) are 00…0
Attach to (m,S,R,d) the successful trial number k and H3(A).
Verification: straightforward given (m, S, R, d, k, H3(A)).
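A minimal executable sketch of this algorithm. The idealized hash functions H0,…,H3 are modeled by domain-separated SHA-256, and "the last e bits of H3(A) are 0" is tested as divisibility by 2^e; both are assumptions of this sketch, since the slides leave the H's abstract:

```python
import hashlib

def H(tag, *args):
    """Stand-in for H0..H3: SHA-256 with a domain-separation tag."""
    h = hashlib.sha256(bytes([tag]))
    for a in args:
        h.update(repr(a).encode())
    return h.digest()

def send(m, S, R, d, T, L, e):
    """Repeat trials k = 1, 2, ... until the e-bit success test passes."""
    k = 0
    while True:
        k += 1
        A = H(0, m, S, R, d, k)                         # initialization
        for _ in range(L):                              # walk for L steps
            c = int.from_bytes(H(1, A), "big") % len(T)
            A = H(2, A, T[c])                           # one access into T
        if int.from_bytes(H(3, A), "big") % (1 << e) == 0:
            return k, H(3, A)                           # proof: (k, H3(A))

def verify(m, S, R, d, k, proof, T, L, e):
    """The receiver redoes only the successful trial k: O(L) accesses."""
    A = H(0, m, S, R, d, k)
    for _ in range(L):
        c = int.from_bytes(H(1, A), "big") % len(T)
        A = H(2, A, T[c])
    return int.from_bytes(H(3, A), "big") % (1 << e) == 0 and H(3, A) == proof

# toy run (a real T would be twice the spammer's cache size):
T = [H(4, i) for i in range(1 << 12)]
k, proof = send("msg", "S", "R", "2005-06-01", T, L=64, e=8)
assert verify("msg", "S", "R", "2005-06-01", k, proof, T, L=64, e=8)
```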

16 Animated Algorithm: a single step in the loop. [Figure: c = H1(A) selects a location in the table T; then A = H2(A, T[c]).]

17 Full Specification
E = the (expected) factor by which the computation cost exceeds the verification cost = the expected number of trials = 2^e, if H3 behaves as a random function.
L = length of the walk.
We want, say, ELt = 10 seconds, where t = memory latency = 0.2 µsec.
Reasonable choices: E = 24,000, L = 2048.
Also needed: how large should A be? A should not be very small…
abstract algorithm (recap):
1. Initialize: A = H0(m,S,R,d,k)
2. Main loop, walk for L steps: c ← H1(A); A ← H2(A,T[c])
3. Success if H3(A) = 0^(log E)
4. The trial is repeated for k = 1, 2, …
5. Proof = (m,S,R,d,k,H3(A))
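As a quick sanity check on these numbers: E · L · t = 24,000 × 2048 × 0.2 µs ≈ 9.8 s of expected sending cost, against a verification cost of only L · t = 2048 × 0.2 µs ≈ 0.4 ms.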

18 Choosing the H's
A "theoretical" approach: idealized random functions.
 provides a formal analysis showing that the amortized number of memory accesses is high
A concrete approach, inspired by the RC4 stream cipher:
 very efficient: a few cycles per step; there is no time inside the inner loop to compute a complex function
 A is not small and changes gradually
 experimental results across different machines

19 Path-following approach [Dwork-Goldberg-Naor Crypto 03]
[Theorem] Fix any spammer whose cache size is smaller than |T|/2. Assuming T is truly random and H0,…,H3 are idealized hash functions, the amortized number of memory accesses per successful message is Ω(2^e L).
[Remarks]
1. the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients
2. idealized hash functions are modeled using random oracles
3. the proof relies on the information-theoretic unpredictability of T

20 Why Random Oracles? Random Oracles 101
With random oracles we can measure progress: we know which oracle calls must be made, and we can see when they occur. The first occurrence of each such call is a progress call: 1 2 3 1 3 2 3 4…
(The calls are those of the abstract algorithm: A = H0(m,S,R,d,k); then for L steps c ← H1(A), A ← H2(A,T[c]); success if H3(A) = 0^(log E).)

21 Proof highlights
Use of idealized hash functions implies:
 at any point in time, A is incompressible
 the average number of oracle calls per success is Ω(EL)
 we can follow the progress of the algorithm
Cast the problem as one of asymmetric communication complexity between the memory and the cache, where only the cache has access to the functions H1 and H2.

22 Talk Plan
 The proofs-of-work approach
 DGN's memory-bound functions
 Generating a large random-looking table [DNW]
 Open problems

23 Using a succinct table [DNW 05]
GOAL use a table T with a succinct description
 easy distribution of software (to new users)
 fast updates (over slow connections)
PROBLEM we lose information-theoretic unpredictability: the spammer can exploit the succinct description to avoid memory accesses
IDEA the user builds the table T once and for all, using a memory-bound process
 use time-space trade-offs for pebbling, studied extensively in the 1970s

24 Pebbling a graph
GIVEN a directed acyclic graph with designated inputs and outputs.
RULES (a replay sketch of these rules follows):
 a pebble can be placed on an input node at any time
 a pebble can be placed on any non-input node if all of its immediate parent nodes have pebbles
 pebbles may be removed at any time
GOAL find a strategy that pebbles all the outputs while using few pebbles and few moves.
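A minimal replay sketch of these rules in Python; the encoding (a parents dict and ('place'/'remove', v) moves) is hypothetical, chosen only to make the rules executable:

```python
def replay_pebbling(parents, inputs, outputs, moves):
    """Replay a candidate pebbling strategy, enforce the rules above,
    and return the peak number of pebbles ever on the graph."""
    pebbled, ever, peak = set(), set(), 0
    for op, v in moves:
        if op == 'place':
            # inputs may be pebbled at any time; a non-input node needs
            # pebbles on all of its immediate parents
            assert v in inputs or all(p in pebbled for p in parents[v])
            pebbled.add(v)
            ever.add(v)
            peak = max(peak, len(pebbled))
        else:                          # pebbles may be removed at any time
            pebbled.discard(v)
    assert set(outputs) <= ever        # goal: every output pebbled at some point
    return peak

# line graph 0 -> 1 -> 2, pebbled with only 2 pebbles:
moves = [('place', 0), ('place', 1), ('remove', 0), ('place', 2)]
print(replay_pebbling({0: [], 1: [0], 2: [1]}, {0}, {2}, moves))   # 2
```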

25 What do we know about pebbling?
 Any graph can be pebbled using O(N/log N) pebbles [Valiant]; there are graphs requiring Ω(N/log N) pebbles [PTC]
 Any constant-degree graph of depth d can be pebbled using O(d) pebbles
 Tight trade-offs: some shallow graphs require many (super-polynomially many) steps to pebble with few pebbles [LT]
 Some results about pebbling the outputs hold even when the available pebbles may be placed in any initial configuration

26 Succinctly generating T
GIVEN a directed acyclic graph of constant in-degree:
1. an input node i is labeled H4(i)
2. a non-input node i is labeled H4(i, labels of its parent nodes), e.g. L_i = H4(i, L_j, L_k) for a node with parents j and k
3. the entries of T are the labels of the output nodes
OBSERVATION a good pebbling strategy ⇒ a good spammer strategy
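A minimal sketch of this labeling in Python, with H4 modeled by SHA-256 over the node index and parent labels (an assumption; the slides keep H4 abstract):

```python
import hashlib

def label(i, parents, memo):
    """Label of node i: H4(i) for an input node (parents[i] is empty),
    else H4(i, labels of its parents)."""
    if i not in memo:
        h = hashlib.sha256(str(i).encode())
        for p in parents[i]:            # zero iterations for an input node
            h.update(label(p, parents, memo))
        memo[i] = h.digest()
    return memo[i]

def build_table(parents, output_nodes):
    """Entries of T = labels of the output nodes."""
    memo = {}
    return [label(o, parents, memo) for o in output_nodes]

# toy dag: inputs 0 and 1; node 2 has parents (0, 1) and is the output
T = build_table({0: [], 1: [], 2: [0, 1]}, [2])
```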

27 Converting a spammer strategy to a pebbling
EX POST FACTO PEBBLING computed by offline inspection of the spammer's strategy:
1. PLACING A PEBBLE: place a pebble on node i if H4 was used to compute L_i = H4(i, L_j, L_k) and L_j, L_k are the correct labels
2. INITIAL PEBBLES: place an initial pebble on node j if H4 was applied with L_j as an argument but L_j was not itself computed via H4
3. REMOVING A PEBBLE: remove a pebble as soon as it is no longer needed
IDEA limit the number of pebbles used by the spammer as a function of its cache size and the number of bits it brings from memory. Since computing a label uses a hash function call, a lower bound on the number of moves gives a lower bound on the number of hash function calls; since a pebble must be backed by the cache plus memory fetches, a lower bound on the number of pebbles gives a lower bound on the number of memory accesses.

28 Constructing the dag
SUPERCONCENTRATOR: a dag with N inputs and N outputs in which any k inputs and any k outputs are connected by vertex-disjoint paths.
CONSTRUCTION the dag D is composed of D1 and D2 (the inputs of D feed D1, whose outputs feed D2, whose outputs are the outputs of D):
 D1 has the property that pebbling many outputs requires many pebbles: more than the cache plus the pages brought from memory can supply; a stack of superconcentrators [Lengauer Tarjan 82]
 D2 is a fault-tolerant layered graph: even if a constant fraction of each layer is deleted, one can still embed a superconcentrator; a stack of expanders [Alon Chung 88, Upfal 92]

29 Using the dag
[idea] fix any execution:
1. let S = the set of mid-level nodes pebbled
2. if S is large, use the time-space trade-offs for D1
3. if S is small, use the fault-tolerant property of D2: delete the nodes whose labels are largely determined by S
(CONSTRUCTION as on the previous slide: D composed of the superconcentrator stack D1 and the fault-tolerant expander stack D2.)

30 The lower bound result
[Theorem] For the dag D, fix any spammer whose cache size is smaller than |T|/2 and that makes a polynomial number of hash function calls. Assuming H0,…,H4 are idealized hash functions, the amortized number of memory accesses per successful message is Ω(2^e L).
[Remarks]
1. the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients
2. idealized hash functions are modeled using random oracles

31 What can we conclude from the lower bound?
 It shows that the design principles are sound.
 It gives us a plausibility argument.
 It tells us that if something goes wrong, we will know where to look.
But:
 It is based on idealized random functions. How do we implement them? Implementations might be computationally expensive.
 The functions are applied to all of A; it might be computationally expensive simply to "touch" all of A.

32 Talk Plan
 The proofs-of-work approach
 DGN's memory-bound functions
 Generating a large random-looking table [DNW]
 Open problems: moderately hard functions

33 Alternative construction based on sorting
 motivated by time-space trade-offs for sorting [Borodin Cook 82]
 easier to implement
1. input node i is labeled T[i] = H4(i, 1)
2. at each round, sort the array
3. then apply H4 to the current values of the array: T[i] = H4(i, T[i], round)
OPEN PROBLEM prove a lower bound (a sketch of the construction follows)
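A minimal sketch of this construction in Python. The number of rounds is left as a free parameter here, since the slide does not fix one, and H4 is again modeled by SHA-256 (an assumption):

```python
import hashlib

def H4(*args):
    """Stand-in for the idealized hash function H4 (assumption: SHA-256)."""
    h = hashlib.sha256()
    for a in args:
        h.update(repr(a).encode())
    return h.digest()

def build_table_by_sorting(n, rounds):
    T = [H4(i, 1) for i in range(n)]            # 1. T[i] = H4(i, 1)
    for r in range(2, rounds + 2):
        T.sort()                                # 2. sort the array
        T = [H4(i, T[i], r) for i in range(n)]  # 3. T[i] = H4(i, T[i], r)
    return T
```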

34 More open problems
WEAKER ASSUMPTIONS? avoid recourse to random oracles:
 use lower bounds for the cell-probe model and for branching programs?
 unlike most of cryptography, here there is a chance of coming up with an unconditional result
 use the physical limitations of computation to obtain a reasonable lower bound on the spammer's effort

35 A theory of moderately hard functions?
 A key idea in cryptography: use the computational infeasibility of problems to obtain security.
 For many applications, moderate hardness is needed.
 Current applications: abuse prevention, fairness, few-round zero-knowledge.
FURTHER WORK develop a theory of moderately hard functions

36 Open problems: moderately hard functions
 Unifying assumption: in the intractable world, one-way functions are necessary and sufficient for many tasks. Is there a similar primitive when moderate hardness is needed?
 Precise model: the details of the computational model may matter; can they be unified?
 Hardness amplification: start with a somewhat hard problem and turn it into one that is harder.
 Hardness vs. randomness: can we turn moderate hardness into moderate pseudorandomness? The standard transformation is not necessarily applicable here.
 Evidence for non-amortization: is it possible to demonstrate that if a certain problem is not resilient to amortization, then a single instance can be solved much more quickly?

37 Open problems: moderately hard functions
 Immunity to parallel attacks: important for timed commitments, where the power function was used. Is there a good argument showing immunity against parallel attacks?
 Is it possible to reduce worst case to average case, i.e. find a random self-reduction? In the intractable world it is known that there are limitations on random self-reductions from NP-complete problems. Is it possible to randomly reduce a P-complete problem to itself? Is it possible to use linear programming or lattice basis reduction for such purposes?
 New candidates for moderately hard functions

38 Thank you. Merci beaucoup. תודה רבה. (French and Hebrew: "thank you very much".)

40 path-following approach [Dwork-Goldberg-Naor Crypto 03]
PUBLIC large random table T (2× the spammer's cache size)
INPUT message m, sender S, receiver R, date/time d
PARAMETERS integer L, effort parameter e
IDEA the sender searches paths of length L for a "good" path
 a path is determined by the table T and hash functions H0,…,H3
 any path is good with probability 1/2^e
OUTPUT (m, S, R, d) + description of a good path
COMPLEXITY sender: O(2^e L) memory fetches; verification: O(L) fetches
MAIN RESULT Ω(2^e L) memory fetches are necessary

42 path-following approach [Dwork-Goldberg-Naor Crypto 03]
PUBLIC large random table T (2× the spammer's cache size)
INPUT message m, sender S, receiver R, date/time d
The sender makes a sequence of random memory accesses into T, inherently sequential (hence path-following), and sends a proof of having done so to the receiver.
 verification requires only a small number of accesses
 the memory-access pattern leads to many cache misses

43 path-following approach [Dwork-Goldberg-Naor Crypto 03]
PUBLIC large random table T (2× the spammer's cache size)
INPUT message m, sender S, receiver R, date/time d
1. Repeat for k = 1, 2, …
2. Initialize: A = H0(m,S,R,d,k)
3. Main loop, walk for L steps: c ← H1(A); A ← H2(A,T[c])
4. Success: the last e bits of H3(A) are 0's
OUTPUT attach to (m, S, R, d) the successful trial number k and H3(A)
COMPLEXITY sender: Ω(2^e L) memory fetches; verification: O(L) fetches
SPAMMER needs 2^e walks; each walk requires L/2 fetches

