Rsync: Efficiently Synchronizing Files Using Hashing By David Shao For CS 265, Spring 2004.

Problem: Want to synchronize with a newer version of a file on a remote server. Want to minimize the data sent over a slow network link. Want to minimize (round-trip) communication latencies.

Solution: Rsync. Open source software project. Command-line-driven server and client for Unix-like systems. Synchronizes directories as well as files. Described in Andrew Tridgell’s Ph.D. thesis.

Overview of How Hashing Is Used: Can reduce the amount of data sent, if willing to live with a very small probability of inaccuracy. Several layers of hashing are used: one fast but less accurate, the other slower but almost always accurate.

Ideal Case: Divide the files into equal-sized blocks. The files are almost identical except for relatively few blocks. The receiver already has almost all of the data blocks it needs, but how can it know which ones? (Diagram: Receiver and Sender.)

Ideal Protocol: The receiver sends the hashes of its blocks to the sender. The sender sends back commands on how to build the file.

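Sender Analyzes Own Blocks: The sender hashes each of its own blocks and compares the result with the hashes of receiver blocks 1 through 4, to find out which receiver block, if any, it matches. (Diagram: hashes of receiver blocks 1 to 4 beside the hash of an unknown sender block.)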

Commands: Copy or Add. COPY: if the receiver already has the data block, just tell it to copy the block. ADD: if the receiver does not have the data block, send the block. COPY is cheap; ADD is expensive.
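The ideal protocol above can be sketched as follows. This is a minimal illustration under assumptions, not rsync's actual wire format: the function names, the tiny block size, and the tuple encoding of COPY and ADD are all hypothetical, and MD5 from hashlib stands in for the block hash because MD4 may not be available in every Python build.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny block size, chosen only for readability


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hash(block: bytes) -> bytes:
    # MD5 is a stand-in for the block hash used in the slides.
    return hashlib.md5(block).digest()


def receiver_hashes(old: bytes) -> dict[bytes, int]:
    """Receiver side: map each block hash to the index of that block."""
    return {block_hash(b): i for i, b in enumerate(split_blocks(old))}


def sender_commands(new: bytes, hashes: dict[bytes, int]) -> list[tuple]:
    """Sender side: emit COPY for blocks the receiver has, ADD otherwise."""
    commands = []
    for block in split_blocks(new):
        index = hashes.get(block_hash(block))
        if index is not None:
            commands.append(("COPY", index))   # cheap: only an index is sent
        else:
            commands.append(("ADD", block))    # expensive: the data is sent
    return commands


def rebuild(old: bytes, commands: list[tuple]) -> bytes:
    """Receiver side: build the new file from old blocks and added data."""
    old_blocks = split_blocks(old)
    out = bytearray()
    for op, arg in commands:
        out += old_blocks[arg] if op == "COPY" else arg
    return bytes(out)


old = b"aaaabbbbccccdddd"
new = b"aaaaXXXXccccdddd"
cmds = sender_commands(new, receiver_hashes(old))
```

Only the second block differs here, so a single ADD carries data and the other three blocks are rebuilt from what the receiver already holds.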

Advantage of Ideal: For a COPY, network traffic is reduced by a factor of approximately L / h, where L is the block size and h is the size of a hash of a block of size L.
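A rough worked example of the L / h estimate, under assumed numbers that are not from the slides: the file has n blocks, k of them must be sent with ADD, and the small cost of each COPY command is ignored.

```latex
\[
\text{reduction} \approx \frac{nL}{nh + kL},
\qquad
k = 0 \;\Rightarrow\; \frac{nL}{nh} = \frac{L}{h}.
\]
\[
L = 1024,\ h = 16:\quad \frac{L}{h} = 64;
\qquad
n = 1000,\ k = 10:\quad \frac{1\,024\,000}{16\,000 + 10\,240} \approx 39.
\]
```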

Disadvantage of Ideal: Example: edit source code and delete a comment at the beginning. The blocks are no longer neatly aligned.
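The alignment problem can be seen in a small experiment. This sketch assumes a block size of 4 and uses MD5 as a stand-in block hash; the data and names are made up for illustration.

```python
import hashlib


def fixed_block_hashes(data: bytes, size: int) -> list[str]:
    """Hash the blocks that start at offsets 0, size, 2*size, ..."""
    return [hashlib.md5(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


old = b"XYabcdefghijklmnop"
new = old[2:]  # delete two bytes at the very beginning

# Every block boundary in `new` is shifted by two bytes relative to `old`,
# so no fixed-offset block of `new` equals a fixed-offset block of `old`.
shared = set(fixed_block_hashes(old, 4)) & set(fixed_block_hashes(new, 4))
```

Although every byte of `new` is already present in `old`, `shared` is empty, so the ideal protocol would fall back to ADD for the whole file.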

Compute More Hashes: The sender needs to compute a hash at every byte position. More expensive: L times more hashes computed by the sender. Use a weaker, faster hash to weed out most candidates first.

Ordinary Sum of Bytes: Rolling-type property: the sum of L bytes starting at position i+1 is almost the same as the sum starting at i. (Diagram: the sum starting at i beside the sum starting at i+1; subtract the red byte, add the green byte, the yellow bytes are the same.)
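The rolling property can be sketched as below: each new window sum comes from the previous one with one subtraction and one addition. The function names are hypothetical.

```python
def window_sum(data: bytes, start: int, length: int) -> int:
    """Direct sum of `length` bytes starting at `start`."""
    return sum(data[start:start + length])


def rolling_sums(data: bytes, length: int):
    """Yield the sum of every window of `length` bytes, left to right."""
    if len(data) < length:
        return
    s = sum(data[:length])
    yield s
    for i in range(len(data) - length):
        # Drop the byte leaving the window, add the byte entering it.
        s = s - data[i] + data[i + length]
        yield s


data = b"rsync rolls along"
sums = list(rolling_sums(data, 5))
```

Each step costs a constant amount of work regardless of the window length, which is what makes hashing at every byte position affordable.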

Disadvantage of a Simple Sum: A simple sum is too symmetric. The sum of “All men are mortals” is the same as the sum of “All mortals are men”.
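The collision is easy to check: both sentences contain exactly the same bytes in a different order, and a plain sum ignores order.

```python
def simple_sum(data: bytes) -> int:
    """Order-insensitive checksum: just add up the byte values."""
    return sum(data)


first = b"All men are mortals"
second = b"All mortals are men"

same_sum = simple_sum(first) == simple_sum(second)
same_bytes = sorted(first) == sorted(second)
```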

Weighted Sum: The first bytes have more weight than the tail ones (an arbitrary decision).

Reordering the i + 1 Sum: (Diagram: the red part is to be subtracted and the green part is to be added; the yellow part is the same.)
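One way to realize a weighted sum that still rolls is sketched below. The specific weights (the first byte of a window of length L gets weight L, the last gets weight 1) and the 16-bit modulus are assumptions in the style of rsync's weak checksum; the slides only say that earlier bytes weigh more.

```python
MOD = 1 << 16  # assumption: keep each component in 16 bits


def weak_hash(block: bytes) -> tuple[int, int]:
    """Return (plain sum, weighted sum) of a block, both modulo MOD."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, length: int) -> tuple[int, int]:
    """Slide the window one byte to the right in constant time."""
    a_new = (a - out_byte + in_byte) % MOD
    # Removing out_byte drops its weight `length`; every remaining byte
    # gains one unit of weight and in_byte enters with weight 1, which
    # together amount to adding the new plain sum.
    b_new = (b - length * out_byte + a_new) % MOD
    return a_new, b_new


data = b"All men are mortals, all mortals are men"
L = 8
rolled = [weak_hash(data[:L])]
for i in range(len(data) - L):
    rolled.append(roll(*rolled[-1], data[i], data[i + L], L))
```

Unlike the plain sum, the weighted component depends on byte order, so `weak_hash(b"All men are mortals")` and `weak_hash(b"All mortals are men")` differ in their second component.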

Further Enhancements: Compute a separate (MD4) signature for the entire file. Reconstruct the new file using temporary storage, so that the old version is never removed until the new one is known to be good.
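The two enhancements can be combined in a short sketch: write the rebuilt data to a temporary file, check a whole-file digest, and only then replace the old file. The function name is hypothetical, MD5 stands in for MD4, and this is an illustration of the idea rather than rsync's own code.

```python
import hashlib
import os
import tempfile


def install_new_version(path: str, new_data: bytes, expected_digest: str) -> bool:
    """Replace `path` with new_data only if its whole-file digest matches."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
        with open(tmp_path, "rb") as check:
            actual = hashlib.md5(check.read()).hexdigest()  # MD4 stand-in
        if actual != expected_digest:
            return False                 # old version is left untouched
        os.replace(tmp_path, path)       # swap in the verified new version
        return True
    finally:
        if os.path.exists(tmp_path):     # clean up after a failed attempt
            os.remove(tmp_path)
```

If the digest does not match, the function returns False and the old file is still in place, so a failed or corrupted transfer never destroys the previous version.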

Synchronizing Directories: Divide the receiving side into a separate receiver and generator. (Diagram: Receiver, Generator, Sender.)

Summary of Hashing Used: A weaker, easier-to-compute hash with the rolling property. A stronger hash (MD4) once most candidates have been weeded out. A signature over the entire file as a separate check.
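The layered check can be sketched end to end as follows. This is a simplified illustration under assumptions: the names and command encoding are hypothetical, MD5 stands in for MD4, only full-size blocks of the old file are indexed, and for brevity the weak hash is recomputed at each offset where a real implementation would use the rolling update.

```python
import hashlib

MOD = 1 << 16


def weak(block: bytes) -> int:
    """Cheap weighted checksum, packed into one integer."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return (b << 16) | a


def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # stand-in for MD4


def signatures(old: bytes, size: int) -> dict[int, list[tuple[bytes, int]]]:
    """Receiver side: weak hash -> list of (strong hash, block index)."""
    table: dict[int, list[tuple[bytes, int]]] = {}
    for start in range(0, len(old) - size + 1, size):
        block = old[start:start + size]
        table.setdefault(weak(block), []).append((strong(block), start // size))
    return table


def delta(new: bytes, table: dict, size: int) -> list[tuple]:
    """Sender side: scan every byte offset, weak check first, then strong."""
    commands, literal, i = [], bytearray(), 0
    while i + size <= len(new):
        window = new[i:i + size]
        match = None
        for digest, index in table.get(weak(window), []):  # weak filter
            if digest == strong(window):                   # strong confirm
                match = index
                break
        if match is None:
            literal.append(new[i])       # no block here: byte goes into ADD
            i += 1
        else:
            if literal:
                commands.append(("ADD", bytes(literal)))
                literal.clear()
            commands.append(("COPY", match))
            i += size
    literal += new[i:]
    if literal:
        commands.append(("ADD", bytes(literal)))
    return commands


def patch(old: bytes, commands: list[tuple], size: int) -> bytes:
    out = bytearray()
    for op, arg in commands:
        out += old[arg * size:(arg + 1) * size] if op == "COPY" else arg
    return bytes(out)


old = b"XYabcdefghijklmnop"
new = old[2:]
cmds = delta(new, signatures(old, 4), 4)
```

Because the sender tests every byte offset, the shifted blocks are found again and only the unmatched leading and trailing bytes are sent as ADD data. The whole-file signature from the last summary point would then be checked on the patched result.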