The Complexity of Pebbling Graphs and Spam Fighting Moni Naor WEIZMANN INSTITUTE OF SCIENCE.


Based on:  Cynthia Dwork, Andrew Goldberg, N: On Memory-Bound Functions for Fighting Spam.  Cynthia Dwork, N, Hoeteck Wee: Pebbling and Proofs of Work

Principal techniques for spam-fighting 1. FILTERING  text-based, trainable filters … 2. MAKING SENDER PAY  computation [Dwork Naor 92, Back 97, Abadi Burrows Manasse Wobber 03, DGN 03, DNW05]  human attention [Naor 96, Captcha]  micropayments NOTE techniques are complementary: reinforce each other!


Talk Plan  The proofs of work approach  DGN’s Memory bound functions  Generating a large random looking table [DNW]  Open problems: moderately hard functions

Pricing via processing [Dwork-Naor Crypto 92]  automated for the user  non-interactive, single-pass  no need for a third party or payment infrastructure. IDEA if I don't know you: prove you spent significant computational resources (say 10 secs of CPU time), just for me, and just for this message. The sender S sends (message m, time d) plus a proof z = f(m,S,R,d) to the recipient R; the proof is moderately hard to compute but easy to verify.

Choosing the function f. Inputs: message m, sender S, receiver R, and date/time d.  Hard to compute: f(m,S,R,d) cannot be amortized; lots of work for the sender; we should have a good understanding of the best methods for computing f.  Easy to check: verifying "z = f(m,S,R,d)" is little work for the receiver.  Parameterized to scale with Moore's Law: easy to exponentially increase the computational cost while barely increasing the checking cost. Example: computing a square root mod a prime vs. verifying it: check x^2 = y (mod P).
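The square-root example can be sketched concretely. This is a toy illustration, assuming a prime p with p ≡ 3 (mod 4), where a square root of a quadratic residue y is y^((p+1)/4) mod p; the specific prime and residue below are hypothetical choices:

```python
# Sketch of the compute/verify asymmetry in the slide's example.
# Assumes p is prime with p % 4 == 3, so a square root of a quadratic
# residue y is y^((p+1)/4) mod p.
p = 1000003  # prime with p % 4 == 3 (toy parameter)

def compute_sqrt(y, p):
    """Moderately hard direction: a full modular exponentiation (~log2 p multiplications)."""
    return pow(y, (p + 1) // 4, p)

def verify_sqrt(x, y, p):
    """Easy direction: a single modular squaring."""
    return (x * x) % p == y % p

y = (123456 * 123456) % p   # a known quadratic residue
x = compute_sqrt(y, p)      # x is 123456 or p - 123456
assert verify_sqrt(x, y, p)
```

The gap here is only a log factor; the slide's point is the direction of the asymmetry, and the scheme's parameters make the gap exponential in e.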

Which computational resource(s)? WANT a cost that corresponds to the same computation time across machines.  Computing cycles: high variance of CPU speeds across desktops (large factors).  Memory-bound approach [Abadi Burrows Manasse Wobber 03]: low variance in memory latencies (factors of 1-4). GOAL design a memory-bound proof-of-effort function that requires a large number of cache misses.

Memory-bound model. USER: MAIN MEMORY large but slow; CACHE small but fast. SPAMMER: MAIN MEMORY may be very, very large, and may exploit locality; CACHE at most ½ the user's main memory. 1. Charge accesses to main memory  must avoid exploitation of locality. 2. Computation is free  except for hash function calls  watch out for low-space crypto attacks.

Talk Plan  The proofs of work approach  DGN’s Memory bound functions  Generating a large random looking table [DNW]  Open problems: moderately hard functions

Path-following approach [DGN Crypto 03]. PUBLIC large random table T (2 x spammer's cache size). PARAMETERS integer L, effort parameter e. IDEA a path is a sequence of L sequential accesses to T  the sender searches a collection of paths to find a good path  the collection depends on (m, S, R, d)  the locations in T depend on hash functions H_0,…,H_3  density of good paths = 1/2^e. OUTPUT (m, S, R, d) + description of a good path. COMPLEXITY sending: O(2^e L) memory accesses; verifying: O(L) accesses.

[Figure: a collection P of paths of length L, determined by (m,S,R,d), with one successful path highlighted.]

Abstracted Algorithm. Sender and Receiver share a large random table T. To send message m from sender S to receiver R at date/time d, repeat trial for k = 1, 2, … until success. The current state is specified by an auxiliary table A; the thread is defined by (m,S,R,d,k).  Initialization: A = H_0(m,S,R,d,k)  Main loop: walk for L steps (L = path length): c = H_1(A); A = H_2(A, T[c])  Success: if the last e bits of H_3(A) = 00…0. Attach to (m,S,R,d) the successful trial number k and H_3(A). Verification: straightforward given (m, S, R, d, k, H_3(A)).
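The abstracted algorithm can be sketched in code. This is a toy illustration only: SHA-256 stands in for the idealized hash functions H_0,…,H_3, and the table size and parameters are small hypothetical values, far below the full specification's:

```python
# Toy sketch of the DGN path-following proof of work.
# SHA-256 models H0..H3; parameters are illustrative, not the paper's.
import hashlib

L = 64   # walk length (toy)
e = 8    # effort parameter: each trial succeeds with probability 2^-e
T = [hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(1024)]

def H(tag, *parts):
    h = hashlib.sha256(tag.encode())
    for p in parts:
        h.update(p)
    return h.digest()

def prove(msg):
    k = 0
    while True:
        k += 1
        A = H("H0", msg, k.to_bytes(8, "big"))            # initialize thread k
        for _ in range(L):                                # walk: L accesses into T
            c = int.from_bytes(H("H1", A), "big") % len(T)
            A = H("H2", A, T[c])
        if int.from_bytes(H("H3", A), "big") % (1 << e) == 0:
            return k, H("H3", A)                          # success: last e bits zero

def verify(msg, k, tag):
    # Verification replays only the successful trial: O(L) work.
    A = H("H0", msg, k.to_bytes(8, "big"))
    for _ in range(L):
        c = int.from_bytes(H("H1", A), "big") % len(T)
        A = H("H2", A, T[c])
    return H("H3", A) == tag and int.from_bytes(tag, "big") % (1 << e) == 0

k, tag = prove(b"m,S,R,d")
assert verify(b"m,S,R,d", k, tag)
```

The sender performs about 2^e walks of L table accesses each, while the receiver replays a single walk, matching the O(2^e L) vs. O(L) costs on the previous slide.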

Animated algorithm (a single step in the loop): c = H_1(A), then A = H_2(A, T[c]).

Full Specification. E = the (expected) factor by which the computation cost exceeds the verification cost = the expected number of trials = 2^e, if H_3 behaves as a random function. L = length of the walk. Want, say, ELt = 10 seconds, where t = memory latency = 0.2 μsec. Reasonable choices: E = 24,000, L = 2048. Also need: how large is A? A should not be very small… Abstract algorithm: 1. Initialize: A = H_0(m,S,R,d,k) 2. Main loop: walk for L steps: c ← H_1(A); A ← H_2(A,T[c]) 3. Success if H_3(A) = 0^log E 4. Trial repeated for k = 1, 2, … 5. Proof = (m,S,R,d,k,H_3(A))
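As a sanity check on the slide's arithmetic, E·L·t with these parameter choices lands close to the 10-second target:

```python
# Check the slide's parameter arithmetic: expected sender cost = E * L * t.
E = 24_000    # expected number of trials (≈ 2^e)
L = 2_048     # steps per walk
t = 0.2e-6    # memory latency in seconds (0.2 microseconds)

cost = E * L * t   # expected sender time, in seconds
print(cost)        # ≈ 9.83 seconds, i.e. roughly the 10-second target
```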

Choosing the H's. A "theoretical" approach: idealized random functions  provides a formal analysis showing that the amortized number of memory accesses is high. A concrete approach inspired by the RC4 stream cipher  very efficient: a few cycles per step  there is no time inside the inner loop to compute a complex function  A is not small and changes gradually  experimental results across different machines.

Path-following approach [Dwork-Goldberg-Naor Crypto 03]. [Theorem] Fix any spammer whose cache size is smaller than |T|/2. Assuming T is truly random and H_0,…,H_3 are idealized hash functions, the amortized number of memory accesses per successful message is Ω(2^e L). [Remarks] 1. The lower bound holds for a spammer maximizing throughput across any collection of messages and recipients. 2. Idealized hash functions are modeled using random oracles. 3. The proof relies on the information-theoretic unpredictability of T.

Why random oracles? Random Oracles 101: we can measure progress  we know which oracle calls must be made  we can see when they occur  the first occurrence of each such call is a progress call. (The abstract algorithm as before: initialize A = H_0(m,S,R,d,k); walk for L steps with c ← H_1(A), A ← H_2(A,T[c]); success if H_3(A) = 0^log E; trials k = 1, 2, …; proof = (m,S,R,d,k,H_3(A)).)

Proof highlights. The use of idealized hash functions implies:  at any point in time, A is incompressible  the average number of oracle calls per success is Ω(EL)  we can follow the progress of the algorithm. Cast the problem as one of asymmetric communication complexity between the memory and the cache  only the cache has access to the functions H_1 and H_2.

Talk Plan  The proofs of work approach  DGN’s Memory bound functions  Generating a large random looking table [DNW]  Open problems

Using a succinct table [DNW 05]. GOAL use a table T with a succinct description  easy distribution of software (new users)  fast updates (over slow connections). PROBLEM we lose information-theoretic unpredictability  a spammer can exploit the succinct description to avoid memory accesses. IDEA generate T using a memory-bound process  use time-space trade-offs for pebbling, studied extensively in the 1970s. The user builds the table T once and for all.

Pebbling a graph. GIVEN a directed acyclic graph. RULES:  inputs: a pebble can be placed on an input node at any time  a pebble can be placed on any non-input node if all of its immediate parent nodes hold pebbles  pebbles may be removed at any time. GOAL find a strategy that pebbles all the outputs while using few pebbles and few moves.
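The rules can be captured by a small move checker; the graph encoding and the move format here are hypothetical, for illustration only:

```python
# Sketch of the pebble game: parents[v] lists v's immediate parents
# (empty for input nodes). A strategy is a list of ("place", v) /
# ("remove", v) moves; we check legality and track the pebble maximum.
def play(parents, strategy):
    pebbled, max_pebbles = set(), 0
    for op, v in strategy:
        if op == "place":
            # legal iff v is an input or all parents currently hold pebbles
            assert all(p in pebbled for p in parents[v]), f"illegal place on {v}"
            pebbled.add(v)
            max_pebbles = max(max_pebbles, len(pebbled))
        else:
            pebbled.discard(v)   # pebbles may be removed at any time
    return pebbled, max_pebbles

# Tiny dag: inputs a, b; output c with parents a, b.
parents = {"a": [], "b": [], "c": ["a", "b"]}
final, peak = play(parents, [("place", "a"), ("place", "b"),
                             ("place", "c"), ("remove", "a"), ("remove", "b")])
print(final, peak)   # {'c'} 3
```

The two quantities returned, the final configuration and the peak pebble count, are exactly the space measure the time-space trade-offs below bound.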

What do we know about pebbling?  Any graph can be pebbled using O(N/log N) pebbles [Valiant].  There are graphs requiring Ω(N/log N) pebbles [PTC].  Any constant-degree graph of depth d can be pebbled using O(d) pebbles.  Tight trade-offs: some shallow graphs require many (super-polynomially many) steps to pebble with few pebbles [LT].  Some results about pebbling the outputs hold even when the available pebbles may be placed in any initial configuration.

Succinctly generating T. GIVEN a directed acyclic graph of constant in-degree: 1. an input node i is labeled H_4(i) 2. a non-input node i is labeled H_4(i, labels of its parent nodes), e.g. L_i = H_4(i, L_j, L_k) for parents j, k 3. the entries of T are the labels of the output nodes. OBSERVATION a good pebbling strategy yields a good spammer strategy.
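The labeling can be sketched as follows, with SHA-256 standing in for the idealized hash H_4 and a hypothetical four-node toy dag:

```python
# Labels per the slide: input nodes get H4(i); non-input nodes get
# H4(i, labels of parents); T is the list of output-node labels.
import hashlib

def H4(*parts):
    h = hashlib.sha256()
    for p in parts:
        h.update(p if isinstance(p, bytes) else str(p).encode())
    return h.digest()

def labels(parents, topo_order):
    L = {}
    for i in topo_order:                 # parents appear before children
        if not parents[i]:
            L[i] = H4(i)                 # input node
        else:
            L[i] = H4(i, *(L[j] for j in parents[i]))
    return L

parents = {0: [], 1: [], 2: [0, 1], 3: [1, 2]}   # toy dag, output node: 3
L = labels(parents, [0, 1, 2, 3])
T = [L[3]]                                        # succinct: T is regenerated
print(len(T), len(T[0]))                          # 1 32
```

The whole table is determined by the graph and H_4, which is the succinct description; the question is whether regenerating it on the fly can bypass memory, which is what the pebbling lower bounds rule out.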

Converting a spammer strategy to a pebbling. EX POST FACTO PEBBLING computed by offline inspection of the spammer's strategy: 1. PLACING A PEBBLE place a pebble on node i if  H_4 was used to compute L_i = H_4(i, L_j, L_k), and  L_j, L_k are the correct labels 2. INITIAL PEBBLES place an initial pebble on node j if  H_4 was applied with L_j as an argument, and  L_j was not computed via H_4 3. REMOVING A PEBBLE remove a pebble as soon as it is no longer needed. IDEA limit the number of pebbles the spammer can use as a function of its cache size and the number of bits it brings from memory. A lower bound on the number of moves gives a lower bound on the number of hash function calls; a lower bound on the number of pebbles gives a lower bound on the number of memory accesses.

Constructing the dag. CONSTRUCTION dag D is composed of D_1 and D_2:  D_1 has the property that pebbling many outputs requires many pebbles  more than the cache plus the pages brought from memory can supply  a stack of superconcentrators [Lengauer Tarjan 82]  D_2 is a fault-tolerant layered graph  even if a constant fraction of each layer is deleted, it still embeds a superconcentrator  a stack of expanders [Alon Chung 88, Upfal 92]. A SUPERCONCENTRATOR is a dag with N inputs and N outputs in which any k inputs and any k outputs are connected by vertex-disjoint paths.

Using the dag. [Idea] Fix any execution: 1. let S = the set of mid-level nodes pebbled 2. if S is large, use the time-space trade-offs for D_1 3. if S is small, use the fault-tolerant property of D_2:  delete the nodes whose labels are largely determined by S.

The lower bound result. [Theorem] For the dag D, fix any spammer whose cache size is smaller than |T|/2 and which makes a polynomial number of hash function calls. Assuming H_0,…,H_4 are idealized hash functions, the amortized number of memory accesses per successful message is Ω(2^e L). [Remarks] 1. The lower bound holds for a spammer maximizing throughput across any collection of messages and recipients. 2. Idealized hash functions are modeled using random oracles.

What can we conclude from the lower bound?  It shows that the design principles are sound  it gives us a plausibility argument  it tells us that if something goes wrong, we will know where to look. But:  it is based on idealized random functions  how do we implement them?  They might be computationally expensive  they are applied to all of A  it might be computationally expensive simply to "touch" all of A.

Talk Plan  The proofs of work approach  DGN’s Memory bound functions  Generating a large random looking table [DNW]  Open problems: moderately hard functions

Alternative construction based on sorting  motivated by time-space trade-offs for sorting [Borodin Cook 82]  easier to implement: 1. input node i is labeled T[i] = H_4(i, 1) 2. at each round, sort the array 3. then apply H_4 to the current values of the array: T[i] = H_4(i, T[i], round). OPEN PROBLEM prove a lower bound.
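The sort-based generation can be sketched as follows; SHA-256 again stands in for H_4, and the table size and round count are toy values, not prescribed by the talk:

```python
# Sort-based table generation: initialize with H4(i, 1), then repeatedly
# sort the array and rehash each slot with its position and round number.
import hashlib

def H4(*parts):
    h = hashlib.sha256()
    for p in parts:
        h.update(str(p).encode() if isinstance(p, int) else p)
    return h.digest()

def build_table(n, rounds):
    T = [H4(i, 1) for i in range(n)]            # step 1: initial labels
    for r in range(2, rounds + 2):
        T.sort()                                # step 2: sort the array
        T = [H4(i, T[i], r) for i in range(n)]  # step 3: rehash current values
    return T

T = build_table(1024, 4)
print(len(T), len(T[0]))   # 1024 32
```

The sorting step is what forces global data movement each round; the open problem on the slide is to turn that intuition into a proven memory-access lower bound.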

More open problems. WEAKER ASSUMPTIONS? Avoid recourse to random oracles  use lower bounds for the cell-probe model and branching programs?  Unlike most of cryptography, in this case there is a chance of coming up with an unconditional result  use the physical limitations of computation to form a reasonable lower bound on the spammer's effort.

A theory of moderately hard functions?  A key idea in cryptography: use the computational infeasibility of problems to obtain security.  For many applications, moderate hardness is needed  current applications: abuse prevention, fairness, few-round zero-knowledge. FURTHER WORK develop a theory of moderately hard functions.

Open problems: moderately hard functions  Unifying assumption  in the intractable world, one-way functions are necessary and sufficient for many tasks  is there a similar primitive when moderate hardness is needed?  Precise model  the details of the computational model may matter; can we unify them?  Hardness amplification  start with a somewhat hard problem and turn it into one that is harder.  Hardness vs. randomness  can we turn moderate hardness into moderate pseudorandomness?  the standard transformations are not necessarily applicable here.  Evidence for non-amortization  is it possible to demonstrate that if a certain problem is not resilient to amortization, then a single instance can be solved much more quickly?

Open problems: moderately hard functions  Immunity to parallel attacks  important for timed commitments  the power function was used there; is there a good argument showing immunity against parallel attacks?  Worst-case to average-case  is it possible to find a random self-reduction?  in the intractable world it is known that there are limitations on random self-reductions from NP-complete problems  is it possible to randomly reduce a P-complete problem to itself?  is it possible to use linear programming or lattice basis reduction for such purposes?  New candidates for moderately hard functions.

Thank you Merci beaucoup תודה רבה

Path-following approach [Dwork-Goldberg-Naor Crypto 03]. PUBLIC large random table T (2 x spammer's cache size). INPUT message m, sender S, receiver R, date/time d. The sender makes a sequence of random memory accesses into T  the sequence is inherently sequential (hence "path-following")  the sender sends a proof of having done so to the receiver  verification requires only a small number of accesses  the memory access pattern leads to many cache misses.

Path-following approach [Dwork-Goldberg-Naor Crypto 03] (summary). 1. Repeat for k = 1, 2, … 2. Initialize: A = H_0(m,S,R,d,k) 3. Main loop: walk for L steps: c ← H_1(A); A ← H_2(A,T[c]) 4. Success: the last e bits of H_3(A) are 0's. OUTPUT attach to (m, S, R, d) the successful trial number k and H_3(A). COMPLEXITY sender: Ω(2^e L) memory fetches; verification: O(L) fetches  the spammer needs 2^e walks, and each walk requires L/2 fetches.
