
Slide 1: Anti-Persistence, or History Independent Data Structures
Moni Naor (Weizmann Institute) and Vanessa Teague (Stanford)

Slide 2: Why hide your history?
- Core dumps
- Losing your laptop: the entire memory representation of data structures is exposed
- Emailing files: the editing history may be exposed (e.g., Word)
- Maintaining lists of people: sports teams, party invitees

Slide 3: Making sure that nobody learns from history
A data structure has:
- A legitimate interface: the set of operations allowed to be performed on it
- A memory representation
The memory representation should reveal no information that cannot be obtained from the legitimate interface.

Slide 4: History of history independence
The issue has been dealt with in both the cryptography and data structures communities.
- Micciancio (1997): history independent trees
  - Motivation: incremental cryptography
  - Based on the shape of the data structure, not including the memory representation
  - Stronger performance model!
- Uniquely represented data structures
  - Treaps (Seidel & Aragon), uniquely represented dictionaries
  - Ordered hash tables (Amble & Knuth, 1974)

Slide 5: More history
- Persistent data structures: it is possible to reconstruct all previous states of the data structure (Sarnak & Tarjan)
  - We want the opposite: anti-persistence
- Oblivious RAM (Goldreich & Ostrovsky)

Slide 6: Overview
- Definitions
- History independent open addressing hashing
- History independent dynamic perfect hashing
  - Memory management (union-find)
- Open problems

Slide 7: Precise definitions
A data structure is:
- history independent if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on the memory representation
- strongly history independent if, given any two sets of breakpoints along S1 and S2 such that corresponding points have identical contents, S1 and S2 induce the same probability distributions on the memory representation at those points
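The first definition can be written formally; a sketch in standard notation, where content(S) and repr(S) are shorthand for the dictionary contents and the (random) memory representation after running sequence S:

```latex
% History independence: operation sequences with equal content
% induce identical distributions on the memory representation.
\forall S_1, S_2:\quad
\mathrm{content}(S_1) = \mathrm{content}(S_2)
\;\Longrightarrow\;
\forall r:\ \Pr[\mathrm{repr}(S_1) = r] = \Pr[\mathrm{repr}(S_2) = r]
```

The strong variant asks for the same equality jointly at every pair of corresponding breakpoints, not just at the end.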

Slide 8: Relaxations
- Statistical closeness
- Computational indistinguishability
  - Example where this helps: erasing
- Allow some information to be leaked
  - The total number of operations
  - n-history independent: identical distributions if the last n operations were identical as well
- Under-defined data structures: the same query can yield several legitimate answers
  - e.g., an approximate priority queue
  - Define identical content: there is no suffix T such that the set of permitted results returned by S1T differs from the one returned by S2T

Slide 9: History independence is easy (sort of)
If it is possible to compute the (lexicographically) first sequence of operations that produces a given content, just store the result of applying that sequence. This gives a history independent version of a huge class of data structures. Efficiency is the problem...

Slide 10: Dictionaries
- Operations are insert(x), lookup(x), and possibly delete(x)
- The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)
- Elements x ∈ U, some universe
- Size of table/memory: N

Slide 11: Goal
Find a history independent implementation of dictionaries with good provable performance. Develop general techniques for history independence.

Slide 12: Approaches
- Unique representation (e.g., an array in sorted order): yields strong history independence
- Secret randomness (e.g., an array in random order): yields only (weak) history independence

Slide 13: Open addressing: traditional version
- Each element x has a probe sequence h1(x), h2(x), h3(x), ...
  - Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, ...
  - Double hashing
  - Uniform hashing
- An element is inserted into the first free cell in its probe sequence
  - A search ends unsuccessfully at a free cell
- Efficient space utilization: almost all of the table can be full
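The traditional insertion and search rules above can be sketched as follows; a minimal illustration with linear probing and a small table of size N (the names `probe`, `insert`, and `lookup` are ours, not from the talk):

```python
N = 11  # table size (illustrative)

def probe(x, i):
    # Linear probing: the i-th cell of x's probe sequence.
    return (hash(x) + i) % N

def insert(table, x):
    # Insert x into the first free cell of its probe sequence.
    for i in range(N):
        j = probe(x, i)
        if table[j] is None or table[j] == x:
            table[j] = x
            return j
    raise RuntimeError("table full")

def lookup(table, x):
    for i in range(N):
        j = probe(x, i)
        if table[j] is None:
            return False   # unsuccessful search ends at a free cell
        if table[j] == x:
            return True
    return False

table = [None] * N
for x in [3, 14, 25]:      # all three keys hash to cell 3 mod 11
    insert(table, x)
assert lookup(table, 14) and not lookup(table, 99)
```

Note that a later-inserted colliding key (here 25) ends up further along its probe sequence, which is exactly the history leak the next slide illustrates.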

Slide 14: Open addressing: traditional version (example)
[Diagram: if y's probe finds a free cell, there is no clash and y is inserted; if y clashes with x, then because x arrived before y, y must move on.]
Not history independent, because later-inserted elements move further along their probe sequences.

Slide 15: History independent version
- At each cell i, decide the elements' priorities independently of insertion order; call the priority function p_i(x, y)
- If there is a clash, move the element of lower priority
- At each cell, the priorities must form a total order
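One way to realize this rule is the following sketch, using linear probing and a global priority in which the smaller key wins (just one valid choice of p_i; the function name is ours). Displacing the lower-priority element on a clash makes the final table independent of insertion order:

```python
N = 11  # table size (illustrative)

def insert(table, x):
    """Insert x, always keeping the higher-priority (smaller) key at a clash."""
    j = hash(x) % N              # first cell of x's (linear) probe sequence
    for _ in range(N):
        y = table[j]
        if y is None:
            table[j] = x
            return
        if y == x:
            return               # already present
        if x < y:                # clash: x wins, y is displaced
            table[j], x = x, y   # y continues along its own probe sequence,
                                 # which under linear probing is the next cell
        j = (j + 1) % N
    raise RuntimeError("table full")

# The final layout does not depend on the order of insertion:
t1, t2 = [None] * N, [None] * N
for x in [3, 14, 25]:            # all three keys hash to cell 3 mod 11
    insert(t1, x)
for x in [25, 3, 14]:
    insert(t2, x)
assert t1 == t2
```

The order-independence demonstrated by the final assertion is exactly the claim proved on the "Strong history independence" slide.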

Slide 16: Insertion (example)
[Diagram: x clashes with y at some cell; p2(x, y) does not hold, so x has lower priority and moves on to the next cell in its probe sequence.]

Slide 17: Search
- Same as in the traditional algorithm
- In an unsuccessful search, we can quit as soon as we find a lower-priority element
- No deletions: deletion is problematic in open addressing anyway

Slide 18: Strong history independence
Claim: for all hash functions and priority functions, the final configuration of the table is independent of the order of insertion.
Conclusion: the scheme is strongly history independent.

Slide 19: Proof of history independence
A static insertion algorithm (clearly history independent): all elements are presented at once. In round i, every element that has not yet settled probes the i-th cell of its probe sequence; at each cell the highest-priority contender stays, possibly displacing a settled element of lower priority; the rejects are gathered up and the next round begins.
[Diagram: a worked example inserting x1, ..., x6; e.g. p1(x2, x1) holds, so x2 is inserted; p3(x4, x5) and p3(x4, x6) hold, so x4 is inserted and x5 is removed; the rejects are gathered up and the process restarts.]
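The round-based static algorithm can be sketched as follows, again with linear probing and a global smaller-key-wins priority (our illustrative choices, not the talk's exact example):

```python
N = 11  # table size (illustrative)

def cell(x, i):
    return (hash(x) + i) % N     # i-th cell of x's linear probe sequence

def static_insert(elems):
    """Round-based static insertion: in each round every unsettled element
    probes its next cell; the highest-priority contender at each cell stays."""
    table = [None] * N
    pos = {x: 0 for x in elems}  # current probe index of each element
    pending = set(elems)
    while pending:
        proposals = {}
        for x in pending:
            proposals.setdefault(cell(x, pos[x]), []).append(x)
        pending = set()
        for j, xs in proposals.items():
            contenders = xs if table[j] is None else xs + [table[j]]
            winner = min(contenders)   # global priority: smaller key wins
            table[j] = winner
            for x in contenders:
                if x != winner:        # losers advance one probe step, retry
                    pos[x] += 1
                    pending.add(x)
    return table

# The input order is irrelevant: the same set yields the same table.
assert static_insert([3, 14, 25]) == static_insert([25, 14, 3])
```

Since the algorithm only sees the set of elements, it is history independent by construction; the slide's two inductions then show it produces the same table as the dynamic algorithm.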

Slide 20: Proof of history independence (continued)
- Nothing moves further in the static algorithm than in the dynamic one: by induction on the rounds of the static algorithm
- Vice versa: by induction on the steps of the dynamic algorithm
- Hence the scheme is strongly history independent

Slide 21: Some priority functions
- Global: a single priority order, independent of the cell
- Random: choose a random order at each cell
- Youth-rules: call an element younger if it has moved less far along its probe sequence; younger elements get higher priority

Slide 22: Youth-rules (example)
[Diagram: p2(x, y) holds because x has taken fewer steps along its probe sequence than y.]
- Use a tie-breaker if the step counts are equal
- This is a valid priority function
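The Youth-rules comparison itself can be sketched directly (the name `youth_wins` and the key-value tie-breaker are our illustrative choices):

```python
def youth_wins(x, steps_x, y, steps_y):
    """True if x has priority over y at a clash: the element that has taken
    fewer probe steps (the 'younger' one) wins; ties are broken by an
    arbitrary fixed total order on the keys."""
    if steps_x != steps_y:
        return steps_x < steps_y
    return x < y

# A fresh element (0 steps) displaces one that has already moved twice:
assert youth_wins("a", 0, "b", 2)
assert not youth_wins("b", 2, "a", 0)
```

The fixed tie-breaker is what makes this a total order at every cell, as the priority-function definition requires.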

Slide 23: Specifying a scheme
- Priority rule: the choice of priority functions
  - For Youth-rules, it is determined by the probe sequences
- Probe functions: how they are chosen, maintained, and computed

Slide 24: Implementing Youth-rules
- Let each h_i be chosen from a pairwise independent collection: for any two x and y, the random variables h_i(x) and h_i(y) are uniform and independent
- Let h1, h2, h3, ... be chosen independently
- Example: h_i(x) = ((a_i · x) mod U + b_i) mod N
- Space: 2 elements per function; only log N functions are needed
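The pairwise independent family on the slide can be sketched directly; the constants U, N, K and the seed below are illustrative:

```python
import random

U = 2**31 - 1   # a prime at least as large as the key universe
N = 101         # hash table size
K = 7           # roughly log2(N) probe functions suffice

random.seed(0)
# Two stored words (a_i, b_i) fully describe each function h_i.
coeffs = [(random.randrange(1, U), random.randrange(U)) for _ in range(K)]

def h(i, x):
    """h_i(x) = ((a_i * x) mod U + b_i) mod N, pairwise independent in x."""
    a, b = coeffs[i]
    return ((a * x) % U + b) % N

# The probe sequence of x is h(0, x), h(1, x), ..., h(K-1, x).
assert all(0 <= h(i, 12345) < N for i in range(K))
```

Storing only the 2K coefficients is what keeps the extra space negligible compared with log n-wise independent families.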

Slide 25: Performance analysis
- Based on a worst-case insertion sequence
- The important parameter is α, the fraction of the table that is used: α·N elements
- We analyze the expected insertion time and search time (number of probes to the table)
  - We have to distinguish successful and unsuccessful search

Slide 26: Analysis via the static algorithm
- For insertions, the total number of probes in the static and dynamic algorithms is identical, and the static algorithm is easier to analyze
- Key point for Youth-rules: in phase i, all unsettled elements are at the i-th probe of their sequence
  - This assures fresh randomness of h_i(x)

Slide 27: Performance
For Youth-rules, implemented as specified (α is the fraction of the table that is used):
- For any sequence of insertions, the expected probe time per insertion is at most 1/(1-α)
- For any sequence of insertions, the expected probe time for a successful or unsuccessful search is at most 1/(1-α)
The analysis is based on the static algorithm.

Slide 28: Comparison to double hashing
- Double hashing has been analyzed with truly random functions [Guibas & Szemeredi; Lueker & Molodowitch]
- These can be replaced by log n-wise independent functions (Schmidt & Siegel)
- But log n-wise independence is relatively expensive: either a lot of space or log n evaluation time
- Youth-rules is a simple and provably efficient scheme with very little extra storage: an extra benefit of considering history independence

Slide 29: Other priority functions
- [Amble & Knuth]: log(1/(1-α)) for the global priority rule, with truly random hash functions
- Experiments show about log(1/(1-α)) for most priority functions tried

Slide 30: Other types of data structures
- Memory management (dealing with pointers)
  - Memory allocation
- Other state-related issues

Slide 31: Dynamic perfect hashing: the FKS scheme, dynamized
[Diagram: a top-level table hashed by h points to low-level tables of sizes s1, ..., sk, each with its own hash function h_i.]
- Top-level table: O(n) space
- Low-level tables: O(n) space in total; a bucket of s_i elements gets a table of size about s_i², with Σ s_i² = O(n) for the n elements to be inserted
- Each h_i is perfect (collision-free) on its respective set
- Rechoose h or some h_i as needed to maintain perfection and linear space
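The static two-level structure being dynamized can be sketched as follows; the size constants, the retry bound 4n, and all function names are our illustrative choices:

```python
import random

U = 2**31 - 1   # a prime at least as large as the key universe

def make_hash(m):
    # One function from the family ((a*x) mod U + b) mod m.
    a, b = random.randrange(1, U), random.randrange(U)
    return lambda x: ((a * x) % U + b) % m

def build(keys):
    """Build an FKS-style table: a top-level hash splits the keys into
    buckets; bucket i of size s_i gets a table of size s_i^2 with a hash
    that is perfect (collision-free) on that bucket."""
    n = len(keys)
    while True:                      # rechoose h until total space is linear
        top = make_hash(n)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[top(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    subtables = []
    for b in buckets:
        m = max(1, len(b) ** 2)
        while True:                  # rechoose h_i until perfect on its bucket
            g = make_hash(m)
            cells = [None] * m
            if all(cells[g(x)] is None and not cells.__setitem__(g(x), x)
                   for x in b):
                break
        subtables.append((g, cells))
    return top, subtables

def lookup(top, subtables, x):
    g, cells = subtables[top(x)]
    return cells[g(x)] == x          # two probes: top level, then sub-table

random.seed(1)
keys = [5, 17, 88, 134, 999]
top, subs = build(keys)
assert all(lookup(top, subs, x) for x in keys)
assert not lookup(top, subs, 7)
```

The two while-loops are exactly the "rechoose h or some h_i" steps on the slide; the next slides show why this naive rechoosing policy leaks history.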

Slide 32: A subtle problem: the intersection bias problem
Suppose we have:
- a set of states {σ1, σ2, ...}
- a set of objects {h1, h2, ...}
- a way to decide whether h_i is good for state σ_j
We keep a current h as the state changes, and change h only if it is no longer good, choosing uniformly from the objects that are good for the current state. Then this is not history independent: h is biased towards the intersection of the objects good for the current state and those good for previous states.

Slide 33: Dynamized FKS is not history independent
- It does not erase upon deletion
- It uses history-dependent memory allocation
- The hash functions (h, h1, h2, ...) are changed only when they cease to be good
  - Hence they suffer from the intersection bias problem: they are biased towards functions that were good for previous sets of elements
  - Hence they leak information about past sets of elements

Slide 34: Making it history independent
- Use history independent memory allocation
- Upon deletion, erase the element and rechoose the appropriate h_i; this solves the low-level intersection bias problem
- Some other minor changes
- Solve the top-level intersection bias problem...

Slide 35: Solving the top-level intersection bias problem
- We can't afford a top-level rehash on every deletion
- Generate two potential top-level functions h(1) and h(2) at the beginning
  - Always use the first good one
  - If neither is good, rehash at every deletion
  - If h(1) is not in use, keep a top-level table for it so that its goodness is easy to check (likewise for h(2))

Slide 36: Proof of history independence
The table's state is determined by:
- The current set of elements
- The top-level hash functions: always the first good one, or rechosen at each step
- The low-level hash functions: uniformly chosen from the perfect functions
- The arrangement of sub-tables in memory: use history-independent memory allocation
- Some other history independent components

Slide 37: Performance
- Lookup takes two steps
- Insertion and deletion take expected amortized O(1) time
  - There is a 1/poly chance that they will take longer

Slide 38: Open problems
- Better analysis for Youth-rules, as well as other priority functions, without random oracles
- Efficient memory allocation: ours is O(s log s)
- Separations
  - Between strong and weak history independence
  - Between history independent and traditional versions, e.g. for union-find
- Can persistence and (computational) history independence co-exist efficiently?

Slide 39: Conclusion
History independence can be subtle. We have two history independent hash tables:
- One based on open addressing: very space efficient, but no deletion
- One based on dynamic perfect hashing: allows deletion, with constant-time lookup

Slide 40: Open addressing: implementing hash functions
- For all i, generate random independent a_i, b_i and set h_i(x) = ((a_i · x) mod U + b_i) mod N
  - U: size of the universe; prime
  - N: size of the hash table
- x's probe sequence is h1(x), h2(x), h3(x), ...
- We need log n hash functions, where n is the number of elements in the table
