1 Foundations of Privacy Lecture 9: History-Independent Hashing Schemes (and applications) Lecturer: Gil Segev

2 Election Day. Elections for class president: each student whispers a vote in Mr. Drew's ear, and Mr. Drew writes the votes down in order (Carol, Alice, Bob, ...). Problem: Mr. Drew's notebook leaks sensitive information: the first student voted for Carol, the second student voted for Alice, and so on. This may compromise the privacy of the elections.

3 Election Day. A simple solution: keep a lexicographically sorted list of candidates (Alice, Bob, Carol) with a unary counter of votes next to each. What about more involved applications? Write-in candidates, votes that are subsets or rankings, ...

4 Learning From History. A data structure has two levels: its "legitimate" interface and its memory representation. History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface. A simple example: a sorted list (Alice, Bob, Carol) gives a canonical memory representation, but it is not really efficient...

5 This Talk. Part 1: An efficient history-independent hashing scheme. Part 2: Application of history independence to electronic voting.

6 Part 1: An Efficient History-Independent Hashing Scheme

7 HI Cuckoo Hashing. A history-independent dictionary that simultaneously achieves the following: efficiency (lookup time O(1) worst case, update time O(1) expected amortized, memory utilization 50%, or 25% with deletions), the strongest notion of history independence, and a simple and fast implementation.

8 Notions of History Independence (Naor and Teague 2001, following Micciancio 1997). Weak history independence (WHI): memory is revealed at the end of an activity period; any two sequences of operations S₁ and S₂ that lead to the same content must induce the same distribution on the memory representation. Strong history independence (SHI): memory is revealed several times during an activity period; any two sets of breakpoints along S₁ and S₂ with the same content at each breakpoint must induce the same distribution on the memory representation at all these points. Note that completely randomizing the memory after each operation is not good enough.

9 Notions of History Independence. Weak and strong history independence are not equivalent: WHI for reversible data structures is possible without a canonical representation, and there are provable efficiency gaps [BP06] (in restricted models). We consider strong history independence. A canonical representation (up to initial randomness) implies SHI; the other direction was shown to hold for reversible data structures [HHMPR05].

10 SHI Dictionaries. The slide compares Naor & Teague '01, Blelloch & Golovin '07, and this work along lookup time, update time, memory utilization, support for deletions, and practicality. The reported memory utilizations across the three schemes are 99%, < 9%, and < 25% (< 50%), and the reported lookup and update times are combinations of O(1) expected and O(1) worst case. This work: lookup O(1) worst case, update O(1) expected, memory utilization < 50% (< 25% with deletions), practical.

11 Our Approach. Cuckoo hashing [PR01] is a simple and practical scheme with worst-case constant lookup time. We force a canonical representation on cuckoo hashing with no significant loss in efficiency, and we avoid rehashing. What happens when the hash functions fail? Rehashing is problematic in SHI data structures: all hash functions need to be sampled in advance (a theoretical problem), and when an item is deleted we may need to roll back to previous functions. Instead, we use a secondary storage that reduces the failure probability exponentially [KMW08].

12 Cuckoo Hashing. Two tables T₁ and T₂ with hash functions h₁ and h₂; store x in one of T₁[h₁(x)] and T₂[h₂(x)]. Insert(x): greedily insert into T₁ or T₂; if both cells are occupied, store x in T₁ and repeat in the other table with the previous occupant. (Figure: a successful insertion.)

13 Cuckoo Hashing. Two tables T₁ and T₂ with hash functions h₁ and h₂; store x in one of T₁[h₁(x)] and T₂[h₂(x)]. Insert(x): greedily insert into T₁ or T₂; if both cells are occupied, store x in T₁ and repeat in the other table with the previous occupant. (Figure: an insertion that runs into an endless chain of displacements, so it fails and a rehash is required.)
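A minimal Python sketch of the insertion procedure described on these two slides. The table size, the hash functions, and the displacement bound are illustrative assumptions, not part of the scheme's analysis:

```python
import random

class CuckooHashTable:
    def __init__(self, r, max_displacements=32):
        # Two tables of r cells each, with independent hash functions h1, h2.
        self.r = r
        self.t1 = [None] * r
        self.t2 = [None] * r
        self.seed1 = random.randrange(1 << 30)
        self.seed2 = random.randrange(1 << 30)
        self.max_displacements = max_displacements  # stand-in for the usual O(log n) bound

    def _h1(self, x):
        return hash((self.seed1, x)) % self.r

    def _h2(self, x):
        return hash((self.seed2, x)) % self.r

    def lookup(self, x):
        # Worst-case constant time: probe exactly two cells.
        return self.t1[self._h1(x)] == x or self.t2[self._h2(x)] == x

    def insert(self, x):
        # Greedy insertion: place x in T1, kicking out any previous occupant,
        # which is then reinserted into the other table, and so on.
        if self.lookup(x):
            return True
        table, pos = self.t1, self._h1(x)
        for _ in range(self.max_displacements):
            x, table[pos] = table[pos], x
            if x is None:
                return True
            if table is self.t1:
                table, pos = self.t2, self._h2(x)
            else:
                table, pos = self.t1, self._h1(x)
        return False  # failure: a rehash (or, later, a stash insertion) is needed
```

For example, `t = CuckooHashTable(8); t.insert("alice"); t.lookup("alice")` returns True unless the insertion failed.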

14 The Cuckoo Graph. A set S ⊆ U of n keys and hash functions h₁, h₂ : U → {1,...,r} define a bipartite graph with r vertices on each side and an edge (h₁(x), h₂(x)) for every x ∈ S. S is successfully stored if and only if every connected component has at most one cycle. Main theorem: if r ≥ (1 + ε)n and h₁, h₂ are log(n)-wise independent, then the failure probability is Θ(1/n).
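To illustrate the graph view, here is a hedged sketch that builds the cuckoo graph for a key set and checks the "at most one cycle per component" condition with a union-find structure; the hash functions h1 and h2 are placeholders supplied by the caller:

```python
def storable(keys, h1, h2, r):
    # Vertices 0..r-1 represent cells of T1, and r..2r-1 cells of T2.
    parent = list(range(2 * r))
    has_cycle = [False] * (2 * r)  # does this component already contain a cycle?

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a == b:
            # The edge closes a cycle; a second cycle in one component means failure.
            if has_cycle[a]:
                return False
            has_cycle[a] = True
        else:
            # Merging two components that each already have a cycle also fails.
            if has_cycle[a] and has_cycle[b]:
                return False
            parent[a] = b
            has_cycle[b] = has_cycle[b] or has_cycle[a]
    return True
```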

15 The Canonical Representation. Assume that S can be stored using h₁ and h₂; we force a canonical representation on the cuckoo graph. It suffices to consider a single connected component. Assume first that S forms a tree in the cuckoo graph (the typical case): one location must be empty, and the choice of the empty location uniquely determines the locations of all elements. Rule: the cell h₁(minimal element) is left empty.

16 The Canonical Representation. Assume that S can be stored using h₁ and h₂; we force a canonical representation on the cuckoo graph. It suffices to consider a single connected component. Now assume that the component has one cycle: there are exactly two ways to assign the elements on the cycle, and each choice uniquely determines the locations of all remaining elements. Rule: the minimal element on the cycle lies in T₁.
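A hedged sketch of how the tree-case rule pins down a unique placement for one component. The component is given as its list of keys plus the two hash functions; the helper and its representation of cells are mine, not the paper's actual procedure (the cycle case, governed by the second rule, is analogous but omitted here):

```python
def place_tree_component(keys, h1, h2):
    """Canonically place a tree-shaped component: the cell T1[h1(min key)]
    stays empty, and every other cell of the component receives exactly one key."""
    # Build the component's graph: each key is an edge between its two cells.
    adj = {}
    for x in keys:
        u, v = ('T1', h1(x)), ('T2', h2(x))
        adj.setdefault(u, []).append((x, v))
        adj.setdefault(v, []).append((x, u))

    empty = ('T1', h1(min(keys)))          # rule: h1(minimal element) is left empty
    placement, stack, seen = {}, [empty], {empty}
    while stack:                           # walk away from the empty cell;
        u = stack.pop()                    # each key is stored at its endpoint
        for x, v in adj[u]:                # farther from the empty cell
            if v not in seen:
                seen.add(v)
                placement[v] = x
                stack.append(v)
    return placement                       # maps each non-empty cell to its key
```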

17 The Canonical Representation. Updates efficiently maintain the canonical representation. Insertions: a new leaf (check whether the new element is smaller than the current minimum), a new cycle in the same component, or merging two components; all cases are straightforward, and the update time is bounded by the size of the component, which is a (small) constant in expectation. Deletions: find the new minimum, split the component, and so on; this requires connecting all elements in a component with a sorted cyclic list, so the memory utilization drops to 25%. Again all cases are straightforward.

18 Rehashing. What if S cannot be stored using h₁ and h₂? This happens with probability Θ(1/n). Can we simply pick new functions? Failures are rare, but this gives very bad worst-case performance: the canonical memory representation implies that all hash functions must be sampled in advance (a theoretical problem), and whenever an item is deleted we must check whether to roll back to previous hash functions. A bad item that is repeatedly inserted and deleted would cause a rehash on every operation!

19 Using a Stash. Whenever an insertion fails, put a 'bad' item in a secondary data structure (the bad item is the smallest item that belongs to a cycle). The secondary data structure must be SHI in itself. Theorem [KMW08]: Pr[|stash| > s] < n^(-s). In practice, keeping the stash as a sorted list is probably the best solution, and effectively the query time is constant with (very) high probability. In theory the stash could be any SHI data structure with constant lookup time, e.g., a deterministic hashing scheme in which the elements are rehashed whenever the content changes [AN96, HMP01].
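A brief sketch of how lookups change once a stash is added, assuming the stash is kept as a small sorted list as suggested above; the function and parameter names are illustrative:

```python
import bisect

def lookup_with_stash(t1, t2, stash, h1, h2, x):
    # Probe the two cuckoo cells first (worst-case constant time), then fall
    # back to the stash, which holds the rare 'bad' items; with a stash of
    # size s this fallback is relevant only with probability < n^(-s).
    if t1[h1(x)] == x or t2[h2(x)] == x:
        return True
    i = bisect.bisect_left(stash, x)
    return i < len(stash) and stash[i] == x
```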

20 Conclusions and Open Problems. Cuckoo hashing is a robust and flexible hashing scheme that is easily 'molded' into a history-independent data structure. We do not know how to do this for cuckoo hashing with more than 2 hash functions and/or more than 1 element per bucket, which would give better memory utilization and better performance, but there the expected size of a connected component is not constant. Can operations be made constant time in the worst case?

21 Part 2: Application of History Independence to Electronic Voting

22 Secure Vote Storage. We want mechanisms that operate in extremely hostile environments. Without a "secure" mechanism, an adversary may be able to undetectably tamper with the records or compromise privacy. Possible scenarios: poll workers may tamper with the device while it is in transit, or malicious software may embed secret information in the public output.

23 Main Security Goals. Integrity: tamper-evidence, i.e., prevent an adversary from undetectably tampering with the records. Privacy: history-independence (the memory representation does not reveal the insertion order) and subliminal-freeness (information cannot be secretly embedded into the data).

24 Secure Vote Storage. Goal: a secure and efficient mechanism for storing an increasingly growing set of K elements taken from a large universe of size N. It supports Insert(x) (cast a ballot), Seal() ("finalize" the elections) and RetrieveAll() (count the votes).

25 Secure Vote Storage. Goal: a secure and efficient mechanism for storing an increasingly growing set of K elements taken from a large universe of size N. Our approach: tamper-evidence by exploiting write-once memories (initialized to all 0's, they can only flip 0's to 1's); information-theoretic security, where everything is public and no private storage is needed; and a deterministic strategy in which each subset of elements determines a unique memory representation. This is the strongest form of history independence, and a unique representation means that information cannot be secretly embedded.
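A minimal model of the write-once memory assumed above: a bit array initialized to zeros in which writes may only flip 0's to 1's. The class and method names are mine, not from the paper:

```python
class WriteOnceMemory:
    def __init__(self, num_bits):
        self.bits = [0] * num_bits  # initialized to all 0's

    def write(self, position, value):
        # Flipping a 1 back to 0 is impossible on this medium, so such a
        # write is rejected (in real hardware it would simply have no effect).
        if value == 0 and self.bits[position] == 1:
            raise ValueError("write-once memory: cannot flip 1 back to 0")
        self.bits[position] |= value

    def read(self, position):
        return self.bits[position]
```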

26 Our Results. Previous approaches were either inefficient (requiring O(K²) space), randomized (enabling subliminal channels), or required private storage. Main result: a deterministic, history-independent and write-once strategy for storing an increasingly growing set of K elements taken from a large universe of size N. Explicit construction: space K · polylog(N), insertion time polylog(N). Non-explicit construction: space K · log(N/K), insertion time log(N/K).

27 Our Results. Main result: a deterministic, history-independent and write-once strategy for storing an increasingly growing set of K elements taken from a large universe of size N. Application to distributed computing: the first explicit, deterministic and non-adaptive conflict resolution algorithm that is optimal up to poly-logarithmic factors. Resolving conflicts in multiple-access channels is one of the classical distributed computing problems, and an explicit, deterministic and non-adaptive algorithm had been open since '85 [Komlos & Greenberg].

28 Previous Work. Molnar, Kohno, Sastry & Wagner '06 initiated the formal study of secure vote storage, with tamper-evidence obtained by exploiting write-once (PROM) memories: initialized to all 0's, they can only flip 0's to 1's. Encoding(x) = (x, wt₂(x)), with logarithmic overhead: flipping any bit of x from 0 to 1 requires flipping a bit of wt₂(x) from 1 to 0.
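The slide does not spell out wt₂; a common reading, and the assumption made here, is that wt₂(x) is the binary representation of the number of zero bits in x. Then turning any 0 of x into a 1 lowers that count, and lowering a binary counter always turns at least one of its 1's into a 0, which the write-once medium forbids. A small illustrative check under that assumption:

```python
def zeros(x, width):
    return sum(1 for i in range(width) if not (x >> i) & 1)

def encode(x, width):
    # Assumed reading of the slide: store x together with the binary
    # representation of its number of zero bits (the slide's wt2(x)).
    return x, zeros(x, width)

def is_consistent(stored_x, stored_counter, width):
    # An adversary restricted to 0 -> 1 flips can only add 1-bits to stored_x
    # (lowering its true zero count) or add 1-bits to stored_counter (raising
    # it), so any tampering breaks this equality and is detected.
    return stored_counter == zeros(stored_x, width)
```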

29 Previous Work. Molnar, Kohno, Sastry & Wagner '06 initiated the formal study of secure vote storage, with tamper-evidence by exploiting write-once memories. A useful observation: store the elements in a sorted list. Problem: one cannot sort in place on write-once memories. The "copy-over list" is a deterministic and history-independent solution: on every insertion, compute the sorted list including the new element, copy it to the next available memory position, and erase the previous list. This costs O(K²) space!
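A hedged sketch of the copy-over list on top of a write-once region, using an illustrative layout (fixed-size slots and an ERASED marker standing in for a region whose bits were all flipped to 1); the real encoding in the paper may differ:

```python
ERASED = object()  # stands in for a slot whose bits were all flipped to 1

class CopyOverList:
    """Sorted list on write-once memory: every insertion writes a fresh sorted
    copy after the old one and marks the old copy as erased. Stores at most k
    elements and pre-allocates 1 + 2 + ... + k = O(k^2) slots."""
    def __init__(self, k):
        self.slots = [None] * (k * (k + 1) // 2)  # fresh (all-zero) memory
        self.next_free = 0
        self.current = []          # cached view of the live copy

    def insert(self, x):
        new_version = sorted(self.current + [x])
        # Erase the previous copy (0 -> 1 flips only), then append the new one.
        start_of_old = self.next_free - len(self.current)
        for i in range(start_of_old, self.next_free):
            self.slots[i] = ERASED
        for element in new_version:
            self.slots[self.next_free] = element   # writing into fresh slots only
            self.next_free += 1
        self.current = new_version
```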

30 Previous Work. Molnar, Kohno, Sastry & Wagner '06 initiated the formal study of secure vote storage: tamper-evidence by exploiting write-once memories, the "copy-over list" as a deterministic and history-independent solution, and several other solutions that are either randomized or require private storage. Bethencourt, Boneh & Waters '07 gave a linear-space cryptographic solution, a "history-independent append-only" signature scheme, which is randomized and requires private storage.

31 Our Mechanism. A global strategy maps elements to entries of a table, and a local strategy resolves collisions separately in each entry. Both strategies are deterministic, history-independent and write-once.

32 The Local Strategy. Store the elements mapped to each entry in a separate copy-over list; ℓ elements require ℓ² pre-allocated memory, so only very small values of ℓ can be afforded in the worst case. Can a deterministic global strategy guarantee that? The worst-case behavior of any fixed hash function is very poor: there is always a relatively large set of elements that are all mapped to the same entry...

33 The Global Strategy. Use a sequence of tables, each storing a fraction of the elements. Each element is inserted into several entries of the first table. When an entry overflows, the elements that are not stored elsewhere are inserted into the next table, and the entry is permanently deleted.

36 The Global Strategy. This yields a unique representation: the set of elements determines which entries of the first table overflow; the elements mapped to non-overflowing entries are stored there; and we continue with the next table and the remaining elements.

37 The Global Strategy. A subset of size K from a universe of size N is stored as follows: the first table has size ~K and stores αK of the elements; the second table has size ~(1-α)K and stores α(1-α)K elements; the third table has size ~(1-α)²K; and so on. Where do the hash functions come from?
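A rough sketch of the cascading global strategy described in the preceding slides. The hash functions standing in for the bounded-neighbor expanders, the per-entry capacity, the degree, and all names are illustrative assumptions, not the paper's actual construction:

```python
def insert_global(tables, dead, hash_fns, degree, capacity, x, level=0):
    # tables[i]: dict mapping entry index -> list of elements (a stand-in for
    # the per-entry copy-over lists); dead[i]: entries of table i that
    # overflowed and were permanently deleted; hash_fns[i](x, j): the j-th
    # entry assigned to x in table i.
    if level == len(tables):
        raise OverflowError("ran out of tables")
    live = [e for e in {hash_fns[level](x, j) for j in range(degree)}
            if e not in dead[level]]
    if not live:
        # Every entry of x in this table is already dead: push x one table down.
        insert_global(tables, dead, hash_fns, degree, capacity, x, level + 1)
        return
    for e in live:
        bucket = tables[level].setdefault(e, [])
        if x not in bucket:                 # keep insertion idempotent
            bucket.append(x)
    for e in [e for e in live if len(tables[level].get(e, [])) > capacity]:
        evicted = tables[level].pop(e)      # the entry is permanently deleted
        dead[level].add(e)
        for y in evicted:
            still_stored = any(y in tables[level].get(hash_fns[level](y, j), [])
                               for j in range(degree))
            if not still_stored:            # y survives nowhere in this table
                insert_global(tables, dead, hash_fns, degree, capacity, y, level + 1)
```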

38 The Global Strategy. Identify the hash function of each table with a bipartite graph from the universe of size N to the table entries. (K, α, ℓ)-bounded-neighbor expander: any set S of size K contains αK elements that have a neighbor of degree ≤ ℓ with respect to S.
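For concreteness, one way to write the definition above formally, using $\Gamma(x)$ for the neighbor set of a left vertex $x$ and $\deg_S(y)$ for the number of edges between $S$ and the right vertex $y$ (this notation is mine):

$$G = ([N], [M], E) \text{ is a } (K, \alpha, \ell)\text{-bounded-neighbor expander if } \forall S \subseteq [N],\ |S| = K:\ \big|\{\, x \in S : \exists\, y \in \Gamma(x) \text{ with } \deg_S(y) \le \ell \,\}\big| \ge \alpha K.$$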

39 Bounded-Neighbor Expanders. A (K, α, ℓ)-bounded-neighbor expander is a bipartite graph from a universe of size N to a table of size M: any set S of size K contains αK elements with a neighbor of degree ≤ ℓ with respect to S. Given N and K, we want to optimize M, α, ℓ and the left-degree D. The slide compares the optimal (non-explicit) parameters with explicit extractor-based and disperser-based constructions: the optimal parameters are M = K · log(N/K) and D = log(N/K), while the explicit constructions pay factors of 2^((log log N)²) or polylog(N) in M and D, with α ranging from 1 down to 1/polylog(N) or 1/2, and ℓ from O(1) up to polylog(N).

40 Open Problems. Non-amortized insertion time: in our scheme insertions may have a cascading effect; construct a scheme with a bounded worst-case insertion time. Improved bounded-neighbor expanders. The monotone encoding problem: find the minimal M such that subsets of size at most K taken from [N] can be mapped into subsets of [M] while preserving inclusions. Our non-constructive solution uses K · log(N) · log(N/K) bits, against an obvious lower bound of K · log(N/K) bits; Alon & Hod '07: M = O(K · log(N/K)).

41 Conflict Resolution. Problem: resolve the conflicts that arise when several parties transmit simultaneously over a single channel. Goal: schedule retransmissions so that each of the conflicting parties eventually transmits individually; a party that successfully transmits halts. Efficiency measure: the number of steps it takes to resolve any K conflicts among N parties. An algorithm is non-adaptive if the choices of the parties in each step do not depend on previous steps.

42 Conflict Resolution. Why require a deterministic algorithm? Radio Frequency Identification (RFID): many tags are simultaneously read by a single reader, e.g., in inventory systems and product tracking. Tags are highly constrained devices; can they even generate randomness?

43 The Algorithm. A global strategy maps parties to time intervals, and a local strategy resolves collisions separately in each interval.

44 The Local Strategy. Associate each party x ∈ [N] with a codeword C(x) taken from a superimposed code: no codeword is contained in the bit-wise OR of any ℓ-1 other codewords. This resolves conflicts among any ℓ parties taken from [N]: party x transmits at step i if and only if C(x)ᵢ = 1. Known explicit constructions give O(ℓ² · log N) steps.
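A small Python sketch of the non-adaptive local strategy, simulating the transmission schedule induced by an arbitrary code; the code construction itself is not specified here, and all names are illustrative:

```python
def resolve_conflicts(codewords, conflicting):
    """In step i, every still-active party x with codewords[x][i] == 1
    transmits; a step with exactly one transmitter succeeds and that party
    halts. `codewords` maps each party to a bit tuple (any superimposed code)."""
    active = set(conflicting)
    schedule = []
    num_steps = len(next(iter(codewords.values())))
    for i in range(num_steps):
        transmitters = [x for x in active if codewords[x][i] == 1]
        if len(transmitters) == 1:          # no collision: transmission succeeds
            schedule.append((i, transmitters[0]))
            active.discard(transmitters[0])
    return schedule, active                  # `active` is empty if all resolved
```

With an ℓ-superimposed code, any set of at most ℓ conflicting parties ends with `active` empty: each party's codeword has a position where it is 1 and all the other conflicting codewords are 0, so that party gets a collision-free step.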

45 The Global Strategy. A sequence of phases identified with bounded-neighbor expanders; each phase contains several time slots, the graphs define the active parties in each slot, and collisions in each slot are resolved using the local strategy.

46 The Global Strategy. A sequence of phases identified with bounded-neighbor expanders; each phase contains several time slots, the graphs define the active parties in each slot, and collisions in each slot are resolved using the local strategy. In total this takes O(K · polylog(N)) steps.

47 Further Reading. Moni Naor and Vanessa Teague, Anti-persistence: History-Independent Data Structures, ACM Symposium on Theory of Computing (STOC), 2001. Moni Naor, Gil Segev and Udi Wieder, History-Independent Cuckoo Hashing, International Colloquium on Automata, Languages and Programming (ICALP), 2008. Tal Moran, Moni Naor and Gil Segev, Deterministic History-Independent Strategies for Storing Information on Write-Once Memories, Theory of Computing, 2009.

