Reconciling Differences: towards a theory of cloud complexity. George Varghese, UCSD, visiting at Yahoo! Labs.

Part 1: Reconciling Sets across a link
Joint with D. Eppstein, M. Goodrich, F. Uyeda. Appeared in SIGCOMM 2011.

Motivation 1: OSPF Routing (1990)
After a network partition forms and heals, router R1 needs the updates that arrived at R2 during the partition. It must solve the Set-Difference Problem!

Motivation 2: Amazon S3 storage (2007)
Synchronizing replicas: a periodic anti-entropy protocol between replicas S1 and S2 must again compute a set difference across the cloud.

What is the Set-Difference problem?
Which objects are unique to host 1? Which objects are unique to host 2?
[Diagram: Host 1 and Host 2 each hold an overlapping set of objects A through F.]
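As a baseline, with full knowledge of both sets the problem is a one-liner; the challenge the talk addresses is getting the same answer while communicating much less than the sets themselves. A toy sketch (the element names are illustrative, not taken from the diagram):

```python
# Naive baseline: exchange entire sets, then diff locally.
# Communication is O(n); difference digests target O(d), d = |difference|.
host1 = {"A", "C", "E", "F"}   # illustrative contents
host2 = {"A", "B", "D", "F"}

unique_to_host1 = host1 - host2
unique_to_host2 = host2 - host1
```

With sets this small the naive approach is fine; the structures that follow matter when n is millions of keys and d is tiny.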

Use case 1: Data Synchronization
Identify the missing data blocks, then transfer the blocks to synchronize the sets.
[Diagram: each host receives the blocks it was missing.]

Use case 2: Data De-duplication
Identify all unique blocks; replace duplicate data with pointers.

Prior work versus ours
Trade a sorted list of keys:
– Let n be the size of the sets, U the size of the key space.
– O(n log U) communication, O(n log n) computation.
– Bloom filters can improve communication to O(n).
Polynomial encodings (Minsky, Trachtenberg):
– Let d be the size of the difference.
– O(d log U) communication, O(dn + d^3) computation.
Invertible Bloom Filter (our result):
– O(d log U) communication, O(n + d) computation.

Difference Digests
Efficiently solve the set-difference problem. They consist of two data structures:
– Invertible Bloom Filter (IBF): efficiently computes the set difference; needs the size of the difference.
– Strata Estimator: approximates the size of the set difference; uses IBFs as a building block.

IBFs: main idea
– Sum over random subsets: summarize a set by “checksums” over O(d) random subsets.
– Subtract: exchange and subtract the checksums.
– Eliminate: hashing for subset choice means common elements disappear after subtraction.
– Invert fast: O(d) equations in d unknowns; randomness allows expected O(d) inversion.

“Checksum” details
An array of IBF cells forms the “checksum” words; for a set difference of size d, use αd cells (α > 1). Each element ID is assigned to several IBF cells. Each cell contains:
– idSum: XOR of all IDs assigned to the cell
– hashSum: XOR of hash(ID) for all IDs assigned to the cell
– count: number of IDs assigned to the cell

IBF Encode
Assign each ID to several cells using Hash1, Hash2, Hash3. To “add” an ID A to a cell: idSum ⊕= A, hashSum ⊕= H(A), count++.
The IBF has αd cells, not O(n) as in Bloom filters! All hosts use the same hash functions.
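A minimal Python sketch of the encoding step (the cell count, hash derivation, and field names are assumptions for illustration, not the paper's exact parameters):

```python
import hashlib

NUM_CELLS, NUM_HASHES = 12, 3   # the talk uses alpha*d cells and 3-4 hashes

def cell_indices(key):
    # Derive NUM_HASHES cell indices from one digest; every host must use
    # the same hash functions or subtraction will not cancel common keys.
    d = hashlib.sha256(str(key).encode()).digest()
    return {int.from_bytes(d[4 * i:4 * i + 4], "big") % NUM_CELLS
            for i in range(NUM_HASHES)}

def check_hash(key):
    # H(ID), stored in hashSum and later used by the decoder's purity test.
    d = hashlib.sha256((str(key) + "#check").encode()).digest()
    return int.from_bytes(d[:8], "big")

def ibf_encode(keys):
    cells = [{"idSum": 0, "hashSum": 0, "count": 0} for _ in range(NUM_CELLS)]
    for key in keys:                      # keys are integers here
        for i in cell_indices(key):       # assign the ID to several cells
            cells[i]["idSum"] ^= key      # "add" = XOR the ID into the cell
            cells[i]["hashSum"] ^= check_hash(key)
            cells[i]["count"] += 1
    return cells
```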

Invertible Bloom Filters (IBF)
Trade IBFs with the remote host: Host 1 sends IBF_1 and Host 2 sends IBF_2.

Invertible Bloom Filters (IBF)
“Subtract” the IBF structures: this produces a new IBF (IBF_2 − IBF_1) containing only the unique objects.

IBF Subtract

Disappearing act
After subtraction, elements common to both sets disappear because:
– Any common element (e.g., W) is assigned to the same cells on both hosts (same hash functions on both sides).
– On subtraction, W XOR W = 0; thus W vanishes.
The elements in the set difference remain, but they may be randomly mixed, so we need a decode procedure.
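The cancellation is plain XOR arithmetic, as a one-cell sketch with made-up values shows:

```python
# One IBF cell is (idSum, hashSum, count). IBF subtraction works cell by
# cell: XOR the two sums, subtract the counts. A common element W sits in
# the same cells on both hosts (same hash functions), so it cancels:
W, H_W = 0xCAFE, 0x1234          # toy ID and toy hash value

cell_host1 = (W, H_W, 1)         # W inserted on host 1...
cell_host2 = (W, H_W, 1)         # ...and in the same cell on host 2

diff_cell = (cell_host1[0] ^ cell_host2[0],
             cell_host1[1] ^ cell_host2[1],
             cell_host1[2] - cell_host2[2])
# W XOR W = 0: the common element has vanished from the difference IBF.
```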

IBF Decode
Test for purity: a cell is pure if H(idSum) = hashSum; e.g., a cell holding only V passes because H(V) = H(V). A cell holding several mixed IDs fails with high probability because H(V ⊕ X ⊕ Z) ≠ H(V) ⊕ H(X) ⊕ H(Z).
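Putting encode, subtract, and the purity test together, the following is a compact end-to-end sketch of difference decoding by peeling (sizes and hash choices are illustrative; a real deployment would size the IBF at αd cells using the estimator):

```python
import hashlib

NUM_CELLS, NUM_HASHES = 40, 3    # illustrative: ~alpha*d cells, 3 hashes

def cell_indices(key):
    d = hashlib.sha256(str(key).encode()).digest()
    return {int.from_bytes(d[4 * i:4 * i + 4], "big") % NUM_CELLS
            for i in range(NUM_HASHES)}

def check_hash(key):
    d = hashlib.sha256((str(key) + "#check").encode()).digest()
    return int.from_bytes(d[:8], "big")

def encode(keys):
    cells = [[0, 0, 0] for _ in range(NUM_CELLS)]   # [idSum, hashSum, count]
    for k in keys:
        for i in cell_indices(k):
            cells[i][0] ^= k
            cells[i][1] ^= check_hash(k)
            cells[i][2] += 1
    return cells

def subtract(a, b):
    return [[x[0] ^ y[0], x[1] ^ y[1], x[2] - y[2]] for x, y in zip(a, b)]

def is_pure(c):
    # Purity test from the slide: count is +/-1 and H(idSum) == hashSum.
    return c[2] in (1, -1) and check_hash(c[0]) == c[1]

def decode(cells):
    only_a, only_b = set(), set()
    queue = [i for i, c in enumerate(cells) if is_pure(c)]
    while queue:
        i = queue.pop()
        if not is_pure(cells[i]):
            continue                      # purity may have been destroyed
        key, sign = cells[i][0], cells[i][2]
        (only_a if sign == 1 else only_b).add(key)
        for j in cell_indices(key):       # peel the key out of its cells
            cells[j][0] ^= key
            cells[j][1] ^= check_hash(key)
            cells[j][2] -= sign
            if is_pure(cells[j]):
                queue.append(j)
    if any(c != [0, 0, 0] for c in cells):
        raise ValueError("decode failed: IBF too small for this difference")
    return only_a, only_b
```

Here decode(subtract(encode(A), encode(B))) returns the two sides of the difference, and raises an error in the (rare, for large enough αd) event that the IBF could not be peeled completely.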


How many IBF cells?
[Plot: space overhead (α) versus set difference, for 3 and 4 hash functions; overhead needed to decode at >99%.]
Small differences: 1.4x to 2.3x overhead. Large differences: 1.25x to 1.4x.

How many hash functions?
– 1 hash function produces many pure cells initially, but there is nothing to undo when an element is removed.
– Many (say 10) hash functions: too many collisions.
– We find by experiment that 3 or 4 hash functions work well. Is there some theoretical reason?

Theory
Let d = difference size, k = number of hash functions.
Theorem 1: With (k + 1)d cells, the failure probability falls exponentially with k.
– For k = 3, this implies a 4x tax on storage; a bit weak.
[Goodrich, Mitzenmacher]: Failure is equivalent to finding a 2-core (loop) in a random hypergraph.
Theorem 2: With c_k · d cells, the failure probability falls exponentially with k.
– c_4 = 1.3x tax, which agrees with the experiments.

Recall the experiments
[Plot: space overhead versus set difference, for 3 and 4 hash functions; overhead needed to decode at >99%.]
Large differences: 1.25x to 1.4x.

Connection to Coding
Mystery: IBF decode is similar to the peeling procedure used to decode Tornado codes. Why?
Explanation: set difference is equivalent to coding over insertion/deletion channels.
Intuition: given a code for set A, send only the checkwords to B; think of B as a corrupted form of A.
Reduction: if the code can correct D insertions/deletions, then B can recover A, and hence the set difference.
[Table: Reed-Solomon ↔ polynomial methods; LDPC (Tornado) ↔ Difference Digests.]

Random Subsets → Fast Elimination
[Diagram: the αd equations are roughly upper triangular and sparse; a pure cell gives an equation like X = ..., a sparse cell one like X + Y + Z = ...]

Difference Digests (recap)
Two data structures:
– Invertible Bloom Filter (IBF): efficiently computes the set difference; needs the size of the difference.
– Strata Estimator: approximates the size of the set difference; uses IBFs as a building block.

Strata Estimator
Consistently partition the keys into sampled subsets, with stratum k holding ~1/2^k of the keys (~1/2, ~1/4, ~1/8, 1/16, ...).
Encode each subset into an IBF of small fixed size:
– log(n) IBFs of ~20 cells each.

Strata Estimator (decoding)
Host 1 and Host 2 exchange estimators (one small IBF per stratum).
Attempt to subtract and decode the IBFs at each level. If level k decodes, then return: 2^k × (the number of IDs recovered).
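The sampling idea can be conveyed without the full machinery; the sketch below stands in an exact set for each stratum's small IBF (an assumed simplification) and scales a deep stratum's count back up:

```python
import hashlib

def trailing_zeros(key):
    # Consistent hash -> number of trailing zero bits; a key hashes to
    # >= k trailing zeros with probability 2**-k, so deeper strata see
    # exponentially thinner samples of the keys.
    h = int.from_bytes(hashlib.sha256(str(key).encode()).digest()[:8], "big")
    return (h & -h).bit_length() - 1 if h else 64

def estimate_diff(set_a, set_b, level=5):
    # Stand-in for the strata estimator: sample the symmetric difference
    # at stratum `level` (an exact set here; a ~20-cell IBF in the talk)
    # and scale the observed count back up by 2**level.
    sampled = [k for k in set_a ^ set_b if trailing_zeros(k) >= level]
    return (2 ** level) * len(sampled)
```

Because stratum `level` sees each difference element with probability 2^-level, the scaled count is an unbiased estimate of the difference size.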

KeyDiff Service
A key service exposing Add(key), Remove(key), and Diff(host1, host2) to applications.
Promising applications:
– File synchronization
– P2P file sharing
– Failure recovery

Difference Digest Summary
Strata Estimator
– Estimates the set difference.
– For 100K-element sets, a 15KB estimator has <15% error.
– O(log n) communication, O(n) computation.
Invertible Bloom Filter
– Identifies all IDs in the set difference.
– 16 to 28 bytes per ID in the set difference.
– O(d) communication, O(n + d) computation.
– Worth it if the set difference is less than ~20% of the set sizes.

Connection to Sparse Recovery?
If we forget about subtraction, in the end we are recovering a d-sparse vector. Note that the hash check is key to figuring out which cells are pure after differencing.
Is there a connection to compressed sensing? Could sensors do the random summing? The hash summing?
And the connection the other way: could we use compressed sensing for differences?

Comparison with Information Theory and Coding
– Worst-case complexity versus average-case.
– Information theory emphasizes communication complexity, not computation complexity; we focus on both.
– Existential versus constructive: some similar settings (Slepian-Wolf) are existential.
– Estimators: we want bounds based on the difference, and so we start by efficiently estimating the difference.

Aside: IBFs in Digital Hardware
A stream of set elements (a, b, x, y) is processed by logic that reads, hashes, and writes cells. Each hash function (Hash 1, Hash 2, Hash 3) addresses a separate memory bank (Bank 1, Bank 2, Bank 3): hashing to separate banks buys parallelism at a slight cost in the space needed. The strata hash and the final decode are done in software.

Part 2: Towards a theory of Cloud Complexity
What is the complexity of reconciling “similar” objects O1, O2, O3 across a cloud?

Example: Synching Files
Reconcile versions X.ppt.v1, X.ppt.v2, X.ppt.v3 across hosts.
Measures: communication bits, computation.

So far: two sets, one link, set difference (e.g., {a,b,c} versus {d,a,c}).

Mild sensitivity analysis: one set much larger than the other, with a small difference d.
Ω(|A|) bits are needed, not O(d): Patrascu 2008; a simpler proof in DKS 2011.

Asymmetric set difference in the LBFS File System (Mazieres)
File A has chunks C1, C2, C3, ..., C97, C98, C99; File B at the server has C1, C5, C3, ..., C97, C98, C99: a 1-chunk difference.
LBFS sends all chunk hashes in File A: O(|A|).

More sensitivity analysis: small intersection (database joins).
Ω(|A|) bits are needed, not O(d): follows from results on the hardness of set disjointness.

Sequences under Edit Distance (files, for example)
File A = A B C D E F; File B = A C D E F G: edit distance 2.
A single insert or delete can renumber all of a file's blocks.

Sequence reconciliation (with J. Ullman)
File A = A B C D E F; File B = A C D E F: edit distance 1.
Send 2d + 1 piece hashes. Clump the unmatched pieces and recurse: O(d log N).
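The recursive flavor can be illustrated with a simpler halving scheme (not the slide's exact 2d+1-piece protocol): hash the halves, and recurse only into halves whose hashes differ. Note that this toy version assumes equal-length files with in-place edits; the renumbering caused by inserts and deletes, from the previous slide, is precisely what it does not handle.

```python
import hashlib

def piece_hash(seq):
    # Hash of one piece of the sequence; hosts would exchange these.
    return hashlib.sha256(repr(seq).encode()).hexdigest()

def diff_ranges(a, b, lo=0, hi=None):
    # Locate mismatched regions of two equal-length sequences by comparing
    # hashes of halves: O(d log n) hash comparisons for d localized edits.
    if hi is None:
        hi = len(a)
    if piece_hash(a[lo:hi]) == piece_hash(b[lo:hi]):
        return []                      # identical: prune this subtree
    if hi - lo <= 1:
        return [(lo, hi)]              # narrowed to a single position
    mid = (lo + hi) // 2
    return diff_ranges(a, b, lo, mid) + diff_ranges(a, b, mid, hi)
```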

21 years of Sequence Reconciliation!
Schwartz, Bowdidge, Burkhard (1990): recurse on the unmatched pieces, not the aggregate.
Rsync: a widely used tool that breaks the file into roughly √N piece hashes, where N is the file length.

Sets on graphs?
[Diagram: nodes holding the sets {a,b,c}, {d,c,e}, {b,c,d}, {a,f,g}, connected by edges.]

This generalizes rumor spreading, which has disjoint singleton sets ({a}, {d}, {b}, {g}).
CLP10, G11: O(E n log n / conductance).

Generalized Push-Pull (with N. Goyal and R. Kannan)
Pick a random edge; do two-party set reconciliation across it.
Complexity: C + D, with C as before and D = Σ_i (U − S_i), summed over the nodes i.

Sets on Steiner graphs?
Terminals hold {a} ∪ S and {b} ∪ S, with relay R1 between them. Only the terminals need the sets, so push-pull is wasteful!

Butterfly example for Sets
As in the classic butterfly network-coding example with sources S1 and S2, compute the set difference D = Diff(S1, S2) within the network instead of the XOR of bits; the bottleneck edge carries D to both receivers.

How does reconciliation on Steiner graphs relate to network coding?
– We reconcile objects in general, not just bits.
– Routers do not need the objects but can transform/code them.
– What transformations within the network allow efficient communication close to the lower bound?

Sequences with d mutations: VM code pages (with Ramjee et al.)
VM A = A B C D E; VM B = A X C D Y: 2 “errors”.
Reconcile Set A = {(A,1),(B,2),(C,3),(D,4),(E,5)} and Set B = {(A,1),(X,2),(C,3),(D,4),(Y,5)}.

Twist: IBFs for error correction? (with M. Mitzenmacher)
Write a message M[1..n] of n words as the set S = {(M[1],1), (M[2],2), ..., (M[n],n)}.
Calculate IBF(S) and transmit M together with IBF(S).
The receiver uses the received message M′ to build IBF(S′), then subtracts it from IBF(S) to locate the errors.
Protect the IBF itself using Reed-Solomon coding or redundancy.
Why: potentially O(e) decoding for e errors; Raptor codes achieve this for erasure channels.
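The message-as-set trick is easy to see with exact sets standing in for the IBFs (a simplification; the IBF's contribution is doing this with state proportional to the number of errors):

```python
sent     = ["the", "quick", "brown", "fox", "jumps"]
received = ["the", "quack", "brown", "fox", "jumbs"]   # 2 word errors

# Write each message as a set of (word, position) pairs...
S      = {(w, i) for i, w in enumerate(sent)}
S_recv = {(w, i) for i, w in enumerate(received)}

# ...then the set difference pinpoints exactly the corrupted positions:
error_positions = sorted({i for (_w, i) in S ^ S_recv})
# error_positions == [1, 4]
```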

The Cloud Complexity Milieu

                                   2 Node    Graph    Steiner Nodes
Sets (key, values)                 EGUV11    GKV11    ?
Sequence, edit distance (files)    SBB90     ?        ?
Sequence, errors only (VMs)        MV11      ?        ?
Sets of sets (database tables)     ?         ?        ?
Streams (movies)                   ?         ?        ?

Other dimensions: approximate, secure, ...

Conclusions: Got Diffs?
– Resiliency and fast recoding of random sums enable set reconciliation; and perhaps error correction?
– Sets on graphs: with all nodes as terminals this generalizes rumor spreading; with routers and terminals it resembles network coding.
– Cloud complexity: some points covered, many remain.
– Practical: may be useful for synching devices across the cloud.

Comparison to Logs/Incremental Updates
IBFs work with no prior context. Logs work with prior context, BUT:
– Logs hold redundant information when syncing with multiple parties.
– Logging must be built into the system for each write.
– Logging adds overhead at runtime.
– Logging requires non-volatile storage, often not present in network devices.
IBFs may out-perform logs when synchronizing multiple parties, or when synchronizations happen infrequently.