Rsync: Efficiently Synchronizing Files Using Hashing By David Shao For CS 265, Spring 2004.


2 Rsync: Efficiently Synchronizing Files Using Hashing By David Shao For CS 265, Spring 2004

3 Problem. Want to synchronize with a newer version of a file on a remote server. Want to minimize data sent over a slow network link. Want to minimize (round-trip) communication latencies.

4 Solution: Rsync. Open source software project (http://samba.anu.edu.au/rsync/). Command-line driven server and client for Unix-like systems. Synchronizes directories as well as files. Described in Andrew Tridgell’s Ph.D. thesis.

5 Overview of How Hashing Is Used. Can reduce the amount of data sent if willing to live with a very small probability of inaccuracy. Several layers of hashing are used together: a fast but less accurate hash, and a slower but almost always accurate hash.

6 Ideal Case. Divide files into equal-sized blocks. Files are almost identical except for relatively few blocks. The receiver already has almost all of the data blocks it needs, but how can it know which ones? (Diagram: Receiver and Sender.)

7 Ideal Protocol. Receiver to sender: hashes of blocks. Sender to receiver: commands on how to build the file.

8 Sender Analyzes Own Blocks. (Diagram: the hash of each sender block is compared against the hashes of receiver blocks 1, 2, 3 and 4.)

9 Commands: Copy or Add. COPY: if the receiver already has the data block, just tell it to copy that block. ADD: if the receiver does not have the data block, send the block itself. COPY is cheap, ADD is expensive.
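The ideal, block-aligned protocol of slides 7 to 9 can be sketched roughly as follows. This is a minimal illustration, not rsync's actual code: the function names and the tuple encoding of commands are hypothetical, and MD5 from `hashlib` stands in for the block hash only because it is readily available.

```python
import hashlib

def block_hashes(old_data: bytes, block_size: int) -> dict:
    """Receiver side: map the hash of each aligned block to its block index."""
    table = {}
    for index, start in enumerate(range(0, len(old_data), block_size)):
        digest = hashlib.md5(old_data[start:start + block_size]).digest()
        table.setdefault(digest, index)  # keep the first block with this hash
    return table

def make_commands(new_data: bytes, receiver_table: dict, block_size: int) -> list:
    """Sender side: emit COPY for blocks the receiver has, ADD otherwise."""
    commands = []
    for start in range(0, len(new_data), block_size):
        block = new_data[start:start + block_size]
        digest = hashlib.md5(block).digest()
        if digest in receiver_table:
            commands.append(("COPY", receiver_table[digest]))  # cheap: an index
        else:
            commands.append(("ADD", block))  # expensive: the raw bytes
    return commands

def apply_commands(old_data: bytes, commands: list, block_size: int) -> bytes:
    """Receiver side: rebuild the new file from its old blocks plus added data."""
    out = bytearray()
    for op, arg in commands:
        if op == "COPY":
            out += old_data[arg * block_size:(arg + 1) * block_size]
        else:
            out += arg
    return bytes(out)
```

With `old = b"aaaabbbbccccdddd"`, `new = b"aaaaXXXXccccdddd"` and a block size of 4, only the `XXXX` block travels as an ADD; the other three blocks become COPY commands.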

10 Advantage of Ideal. For each COPY, network traffic is reduced by a factor of approximately L / h, where L is the block size and h is the size of the hash of one block.
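As a quick worked example of the L / h factor: the numbers below are illustrative assumptions, not figures from the slides (a 700-byte block, and 20 bytes of hash per block made up of a 4-byte weak checksum plus a 16-byte MD4 digest).

```python
def copy_savings_factor(block_size: int, hash_size: int) -> float:
    """Approximate traffic reduction for a block that can be COPYed: L / h."""
    return block_size / hash_size

# Assumed values for illustration only.
factor = copy_savings_factor(block_size=700, hash_size=20)
print(factor)
```

Under these assumptions a matched block costs roughly 1/35 of what sending it in full would cost.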

11 Disadvantage of Ideal. Example: edit source code and delete a comment at the beginning. Every later byte shifts, so the blocks are no longer neatly aligned.
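A small experiment makes the alignment problem concrete. The helper names are hypothetical and MD5 is used only as a convenient stand-in block hash; the point is that removing a few leading bytes shifts every block boundary, so aligned hashes stop matching even though almost all of the data is unchanged.

```python
import hashlib

def aligned_digests(data: bytes, block_size: int) -> list:
    """Hash each block that starts at a multiple of block_size."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def count_aligned_matches(old: bytes, new: bytes, block_size: int) -> int:
    """Count blocks of `new` whose hash also appears among aligned blocks of `old`."""
    old_set = set(aligned_digests(old, block_size))
    return sum(d in old_set for d in aligned_digests(new, block_size))

body = bytes(range(64))            # stands in for the unchanged source code
old = b"/* comment */" + body      # old version starts with a 13-byte comment
new = body                         # new version has the comment deleted
shifted = count_aligned_matches(old, new, 8)
unshifted = count_aligned_matches(new, new, 8)
```

Here `unshifted` counts all 8 blocks as matches, while `shifted` finds none, because 13 is not a multiple of the block size.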

12 Compute More Hashes. The sender needs to compute a hash at every byte position. More expensive: about L times more hashes computed by the sender. Use a weaker, faster hash to weed out positions that cannot match.

13 Ordinary Sum of Bytes. Rolling-type property: the sum of L bytes starting at position i+1 is almost the same as the sum starting at i. (Diagram: sum starting at i versus sum starting at i+1; subtract the red byte, add the green byte, the yellow bytes are the same.)
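The rolling property of the plain byte sum can be sketched as below. The function name is hypothetical; each new window sum is obtained from the previous one with one subtraction and one addition instead of re-adding all L bytes.

```python
def rolling_sums(data: bytes, window: int) -> list:
    """Return the sum of every `window`-byte span of `data`, computed incrementally."""
    if window <= 0 or len(data) < window:
        return []
    current = sum(data[:window])        # full computation only once
    sums = [current]
    for i in range(len(data) - window):
        # Drop the byte leaving the window, add the byte entering it.
        current += data[i + window] - data[i]
        sums.append(current)
    return sums
```

For example, `rolling_sums(b"abcdefgh", 3)` gives the same list as summing each 3-byte slice from scratch, at a constant cost per position.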

14 Disadvantage of a Simple Sum. A simple sum is too symmetric: it ignores byte order, so the sum of “All men are mortals” is the same as the sum of “All mortals are men”.
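The symmetry is easy to check directly. This is only a demonstration of the slide's example; `byte_sum` is a hypothetical helper name.

```python
def byte_sum(text: str) -> int:
    """Plain sum of the ASCII byte values; the order of the bytes is ignored."""
    return sum(text.encode("ascii"))

first = byte_sum("All men are mortals")
second = byte_sum("All mortals are men")
same = first == second  # both strings contain exactly the same characters
```

Because the two sentences are rearrangements of the same characters, `same` is true even though the texts differ.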

15 Weighted Sum. First bytes have more weight than the tail ones (an arbitrary decision). (Diagram: two windows with byte positions 0 to 6.)
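One way to weight the bytes, assumed here for illustration, is to multiply the byte at offset k of an L-byte window by (L - k), so the first byte gets weight L and the last gets weight 1. The slides do not spell out the exact weights, so treat this choice as an assumption.

```python
def weighted_sum(data: bytes) -> int:
    """Position-weighted sum: the byte at offset k gets weight len(data) - k."""
    length = len(data)
    return sum((length - k) * byte for k, byte in enumerate(data))

a = b"All men are mortals"
b = b"All mortals are men"
plain_equal = sum(a) == sum(b)                        # the simple sum cannot tell them apart
weighted_equal = weighted_sum(a) == weighted_sum(b)   # the weighted sum can
```

With these weights the two sentences get different values, since the same bytes now sit at differently weighted positions.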

16 Reordering the i + 1 Sum. The red part is to be subtracted and the green part is to be added; the yellow part is the same. (Diagram: two windows with byte positions 0 to 6.)
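The weighted sum can still be rolled forward in constant time if the plain sum is carried along with it. The sketch below assumes weights of (L - k) for offset k, which the slides do not state explicitly; the names `weighted_pair` and `roll` are hypothetical. Writing a for the plain sum and b for the weighted sum, moving the window by one byte gives a' = a - out + in and b' = b - L * out + a'.

```python
def weighted_pair(window: bytes) -> tuple:
    """Compute (plain sum, weighted sum) of a window from scratch."""
    length = len(window)
    a = sum(window)
    b = sum((length - k) * x for k, x in enumerate(window))
    return a, b

def roll(a: int, b: int, outgoing: int, incoming: int, length: int) -> tuple:
    """Slide the window one byte: `outgoing` leaves at the front, `incoming` joins at the end."""
    a = a - outgoing + incoming          # plain sum: subtract one byte, add one
    b = b - length * outgoing + a        # weighted sum: drop the heaviest term, add new plain sum
    return a, b

data = b"synchronize files"
L = 5
a, b = weighted_pair(data[:L])
rolled = []
for i in range(len(data) - L):
    a, b = roll(a, b, data[i], data[i + L], L)
    rolled.append((a, b))
```

Each entry of `rolled` agrees with `weighted_pair` applied to the corresponding slice, which is the reordering the slide illustrates.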

17 Further Enhancements. Compute a separate (MD4) signature for the entire file. Reconstruct the new file using temporary storage, so that the old version is never removed until the new one is known to be good.
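The two safeguards can be combined roughly as follows. This is a sketch under assumptions: `replace_if_verified` is a hypothetical name, and MD5 is used in place of MD4 because MD4 is not reliably available in Python's `hashlib`. The new contents are written to a temporary file, checked against the whole-file signature, and only then moved over the old file.

```python
import hashlib
import os
import tempfile

def replace_if_verified(target_path: str, new_data: bytes, expected_digest: str) -> bool:
    """Write new_data to a temp file and swap it in only if its digest matches."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, temp_path = tempfile.mkstemp(dir=directory)  # same directory, so the rename stays local
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(new_data)
        with open(temp_path, "rb") as check_file:
            actual = hashlib.md5(check_file.read()).hexdigest()
        if actual != expected_digest:
            return False                 # old file is left untouched
        os.replace(temp_path, target_path)
        temp_path = None                 # nothing left to clean up
        return True
    finally:
        if temp_path is not None and os.path.exists(temp_path):
            os.remove(temp_path)
```

If reconstruction went wrong, the function returns `False` and the old version is still in place; the temporary file is removed either way.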

18 Synchronizing Directories. The receiving side is divided into a separate receiver and generator. (Diagram: Receiver, Generator, Sender.)

19 Summary of Hashing Used. A weaker, easier-to-compute hash with the rolling property. A stronger hash (MD4) once most candidates have been weeded out. A signature over the entire file as a separate check.
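Putting the layers together, a simplified sender loop might look like the sketch below. It is an illustration under several assumptions, not rsync's implementation: all names are hypothetical, the weak hash is the (plain sum, weighted sum) pair with weights L - k, MD5 stands in for MD4, and only full receiver blocks are indexed. The weak hash is rolled one byte at a time, and the strong hash is computed only when the weak hash finds a candidate.

```python
import hashlib

def weak_pair(window: bytes) -> tuple:
    n = len(window)
    return sum(window), sum((n - k) * x for k, x in enumerate(window))

def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()   # stand-in for MD4

def receiver_signatures(old: bytes, block_size: int) -> dict:
    """Receiver: weak hash -> list of (strong hash, block index) for full blocks."""
    table = {}
    for index in range(len(old) // block_size):
        block = old[index * block_size:(index + 1) * block_size]
        table.setdefault(weak_pair(block), []).append((strong(block), index))
    return table

def sender_delta(new: bytes, table: dict, block_size: int) -> list:
    """Sender: scan every byte offset, emitting COPY for matches and ADD for the rest."""
    commands, literal = [], bytearray()
    pos, pair = 0, None
    while pos + block_size <= len(new):
        if pair is None:
            pair = weak_pair(new[pos:pos + block_size])
        match = None
        if pair in table:                            # cheap check first
            digest = strong(new[pos:pos + block_size])
            for candidate, index in table[pair]:     # expensive check on candidates only
                if candidate == digest:
                    match = index
                    break
        if match is not None:
            if literal:
                commands.append(("ADD", bytes(literal)))
                literal.clear()
            commands.append(("COPY", match))
            pos += block_size
            pair = None                              # recompute at the new position
        else:
            literal.append(new[pos])
            if pos + block_size < len(new):          # roll the weak hash forward one byte
                a, b = pair
                a = a - new[pos] + new[pos + block_size]
                b = b - block_size * new[pos] + a
                pair = (a, b)
            pos += 1
    literal += new[pos:]                             # tail shorter than one block
    if literal:
        commands.append(("ADD", bytes(literal)))
    return commands

def rebuild(old: bytes, commands: list, block_size: int) -> bytes:
    out = bytearray()
    for op, arg in commands:
        out += old[arg * block_size:(arg + 1) * block_size] if op == "COPY" else arg
    return bytes(out)
```

For instance, with `old = b"/* x */0123456789abcdefghijABCDEFGHIJ"`, the same text without the leading comment as `new`, and a block size of 10, the sender finds two of the receiver's blocks at shifted offsets and sends only the unmatched head and tail as ADD data. The whole-file signature from the summary would be checked separately after `rebuild`.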

