Lecture 11: Data Synchronization Techniques for Mobile Devices
© Dimitre Trendafilov 2003, modified by T. Suel 2004
CS623, 4/20/2004

Problem Definition
Given two versions of a data set on different machines, say an outdated one and a current one, how can we update the outdated one with minimum communication cost?
Related problem: what if the data has been changed on several machines? (How to reconcile the data is difficult and application-dependent.)

Obvious Solutions
- Send all of the current data.
- Compress the current data and then send it.
- Send only the compressed difference between the two data sets.
  - If the sender has both versions, use a suitable delta compression tool.
  - What if the sender has no access to the outdated version?
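
When the sender holds both versions, even a general-purpose compressor can serve as a crude delta compressor. The Python sketch below is an illustration, not one of the tools the lecture refers to: it primes zlib with the outdated version as a preset dictionary (`zdict`), so substrings of the current version that already appear in the outdated one are encoded as back-references. The function names are hypothetical, and DEFLATE can only reference the last 32 KB of the dictionary, so dedicated delta compressors handle large files much better.

```python
import random
import zlib

def compress_full(current: bytes) -> bytes:
    """Option 2: compress the current data on its own."""
    return zlib.compress(current, 9)

def delta_compress(outdated: bytes, current: bytes) -> bytes:
    """Option 3 (sketch): encode `current` relative to `outdated`.

    The outdated version is used as a zlib preset dictionary, so repeated
    substrings become back-references into data the receiver already has.
    DEFLATE's window is 32 KB, so only the last 32 KB of `outdated` help.
    """
    comp = zlib.compressobj(level=9, zdict=outdated)
    return comp.compress(current) + comp.flush()

def delta_decompress(outdated: bytes, delta: bytes) -> bytes:
    """Receiver side: rebuild `current` from its outdated copy plus the delta."""
    decomp = zlib.decompressobj(zdict=outdated)
    return decomp.decompress(delta) + decomp.flush()

# Usage: incompressible (random) data with a small insertion in the middle.
rng = random.Random(0)
outdated = bytes(rng.getrandbits(8) for _ in range(20_000))
current = outdated[:10_000] + b"one new record" + outdated[10_000:]

full = compress_full(current)
delta = delta_compress(outdated, current)
assert delta_decompress(outdated, delta) == current
print(len(current), len(full), len(delta))  # the delta is far smaller than `full`
```

Because the data is random, plain compression gains nothing here, while the delta only has to describe the insertion and a list of copy instructions.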

Two Aspects of the Problem
- File synchronization (rsync)
  - Update an outdated file so that it becomes identical to a current one.
- Set reconciliation (today)
  - Assume you have many small data records, but you only want to send the modified records.
  - E.g., a database with a set of 100-byte records.
  - Unordered: the order of records is not important.
  - Find which records need to be transmitted, then send the entire record.
  - Each record is identified by a number (hash, record ID).

Applications for Data Synchronization
- Synchronizing data between a PDA and a PC (Microsoft Briefcase, etc.)
- Synchronizing databases over a network
- Synchronizing a file system in two stages:
  - find which files have changed (MD5 of files)
  - use rsync on those that have changed
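
The first stage of the two-stage file-system sync can be sketched as follows. This is a minimal illustration with hypothetical function names, modeling each file system as an in-memory `dict` of path to contents: only per-file MD5 digests need to cross the network, and the returned paths are the ones a second, rsync-style stage would transfer. Files deleted on the server are not handled in this sketch.

```python
import hashlib

def digest_map(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the MD5 digest of its contents."""
    return {path: hashlib.md5(data).hexdigest() for path, data in files.items()}

def changed_files(client_digests: dict[str, str], server_files: dict[str, bytes]) -> list[str]:
    """Stage 1: paths whose server copy differs from, or is missing on, the client.

    Only the digests are exchanged; stage 2 would run an rsync-style
    transfer on each returned path.
    """
    server_digests = digest_map(server_files)
    return sorted(p for p, d in server_digests.items() if client_digests.get(p) != d)

client = {"a.txt": b"hello", "b.txt": b"old contents"}
server = {"a.txt": b"hello", "b.txt": b"new contents", "c.txt": b"brand new"}
print(changed_files(digest_map(client), server))  # ['b.txt', 'c.txt']
```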

Palm HotSync
- Relies on metadata maintained on both machines.
- The metadata is stored in a Palm DB.
- There is one Palm DB for each application (Date Book, To Do, Address Book, etc.).
- A record in a Palm DB consists of a unique ID, a pointer to the object, and a status flag.

Palm HotSync
- Preferred mode of operation: Fast Sync
  - Exchange only the modified records.
  - Works only if the synchronization is done between the same two machines.
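
A Fast Sync over records laid out as described (unique ID, object, status flag) can be sketched in Python as below. The class and function names are hypothetical, the `data` string stands in for the pointer to the object, deleted or archived states are not modeled, and conflicts are resolved naively (the handheld's copy wins); this is not Palm's actual conduit logic.

```python
from dataclasses import dataclass

@dataclass
class PalmRecord:
    uid: int             # unique record ID
    data: str            # stands in for the pointer to the object
    dirty: bool = False  # status flag: modified since the last sync

def fast_sync(handheld: dict[int, PalmRecord], desktop: dict[int, PalmRecord]) -> int:
    """Exchange only dirty records in both directions; return how many were sent."""
    sent = 0
    # Handheld -> desktop first, then desktop -> handheld. Records copied in the
    # first pass arrive clean, so they are not echoed back in the second pass.
    for src, dst in ((handheld, desktop), (desktop, handheld)):
        for uid, rec in src.items():
            if rec.dirty:
                dst[uid] = PalmRecord(uid, rec.data)
                sent += 1
    # After a successful sync both sides agree, so every flag is cleared.
    for db in (handheld, desktop):
        for rec in db.values():
            rec.dirty = False
    return sent

hh = {1: PalmRecord(1, "Lunch"), 2: PalmRecord(2, "Call Bob", dirty=True)}
pc = {1: PalmRecord(1, "Lunch"), 3: PalmRecord(3, "Dentist", dirty=True)}
print(fast_sync(hh, pc))  # 2: only the two modified records were transferred
```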

Palm HotSync
- “Backup” mode of operation: Slow Sync
  - Copy all of the data.
  - Used when the last synchronization was done with a different machine.
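
The choice between the two modes can be sketched as a single decision, assuming (hypothetically) that the handheld remembers the ID of the PC it last synchronized with. The status flags only describe changes relative to that PC, so a different PC forces a Slow Sync that copies everything. Names and the `(data, dirty_flag)` layout are illustrative assumptions.

```python
def records_to_send(handheld: dict[int, tuple[str, bool]],
                    last_pc_id: str, this_pc_id: str) -> tuple[str, list[int]]:
    """Pick the sync mode and the record IDs the handheld must transfer.

    `handheld` maps uid -> (data, dirty_flag); `last_pc_id` is the
    hypothetical "last synchronized with" field kept on the handheld.
    """
    if last_pc_id == this_pc_id:
        # Fast Sync: the flags describe changes relative to this PC.
        return "fast", sorted(uid for uid, (_, dirty) in handheld.items() if dirty)
    # Slow Sync: the flags were last cleared while syncing with a different
    # machine, so they say nothing about what this PC has. Copy everything.
    return "slow", sorted(handheld)

db = {1: ("Lunch", False), 2: ("Call Bob", True), 3: ("Dentist", False)}
print(records_to_send(db, "home-pc", "home-pc"))    # ('fast', [2])
print(records_to_send(db, "office-pc", "home-pc"))  # ('slow', [1, 2, 3])
```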

Timestamps
- Maintain a timestamp for each record.
- Send only the records with a timestamp greater than the timestamp of the last synchronization.
- Good for synchronization between two machines, but inefficient for more.
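
The timestamp rule fits in one filter. In this minimal sketch, the `Record` type is hypothetical and timestamps are assumed to be monotonic logical clock values. With more than two machines, a device would need a separate `last_sync` value for each partner, which is where the bookkeeping starts to grow.

```python
from dataclasses import dataclass

@dataclass
class Record:
    rid: int
    payload: str
    modified: int  # logical timestamp of the last change (assumed monotonic)

def changes_since(records: list[Record], last_sync: int) -> list[Record]:
    """Records modified after the last synchronization with this partner."""
    return [r for r in records if r.modified > last_sync]

records = [Record(1, "a", 5), Record(2, "b", 12), Record(3, "c", 20)]
last_sync = 10  # timestamp of the previous sync with the other machine
print([r.rid for r in changes_since(records, last_sync)])  # [2, 3]
```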

SyncML (now part of the Open Mobile Alliance)
- Fairly large initiative funded by Ericsson, IBM, Lotus, Matsushita, Motorola, and Nokia.
- Seeks to provide an open standard for synchronization between different platforms and devices.
- Uses XML.
- Based on timestamps: a device stores a timestamp for each record and each device it communicates with.
  - N records and M devices result in N*M timestamps.
  - Not scalable!
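
The N*M growth is easy to see in a toy model of the bookkeeping described above. This sketch is not the SyncML protocol (which exchanges XML messages), and the function names are hypothetical: it keeps one "last sent" timestamp per (record, device) pair.

```python
from itertools import product

def make_timestamp_table(record_ids, device_ids):
    """One 'last sent' timestamp per (record, device) pair; 0 means never sent."""
    return {(r, d): 0 for r, d in product(record_ids, device_ids)}

def updates_for(device, modified_at, table):
    """Records changed since they were last sent to `device`; marks them as sent."""
    out = [r for r, t in modified_at.items() if t > table[(r, device)]]
    for r in out:
        table[(r, device)] = modified_at[r]
    return sorted(out)

records, devices = range(1000), ["phone", "pda", "laptop", "desktop", "server"]
table = make_timestamp_table(records, devices)
print(len(table))  # 5000 = N * M entries

modified_at = {r: 1 for r in records}                 # every record last changed at time 1
print(len(updates_for("phone", modified_at, table)))  # 1000 on the first sync
modified_at[7] = 2                                    # record 7 changes
print(updates_for("phone", modified_at, table))       # [7]
```

Every added device adds another N entries, and every added record adds another M.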

Intellisync Anywhere
- Developed by Puma Technologies.
- Relies on a central server.
- Similar to Fast Sync, but each device synchronizes only with the central server.
- It has a single point of failure.
- The central server can get congested.
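
A hub-and-spoke arrangement of this kind can be sketched as follows. This is a hypothetical illustration, not Intellisync's actual design: devices never talk to each other, only to a central server that keeps an append-only change log, and each device remembers the last log position it has seen. The single `HubServer` object is also the single point of failure, and all traffic passes through it.

```python
class HubServer:
    """Central server: every device syncs only with this object (hypothetical)."""

    def __init__(self):
        self.log = []  # append-only change log; entry i has sequence number i + 1

    def push(self, changes: dict[str, str]) -> None:
        for rid, value in changes.items():
            self.log.append((rid, value))

    def pull(self, since_seq: int) -> tuple[dict[str, str], int]:
        """Return the latest value of each record changed after `since_seq`."""
        latest = {}
        for rid, value in self.log[since_seq:]:
            latest[rid] = value  # later log entries overwrite earlier ones
        return latest, len(self.log)

class Device:
    def __init__(self, hub: HubServer):
        self.hub, self.data, self.seen = hub, {}, 0

    def sync(self, local_changes: dict[str, str]) -> None:
        self.data.update(local_changes)
        self.hub.push(local_changes)
        updates, self.seen = self.hub.pull(self.seen)
        self.data.update(updates)

hub = HubServer()
phone, laptop = Device(hub), Device(hub)
phone.sync({"contact:1": "Alice 555-0100"})
laptop.sync({"contact:2": "Bob 555-0199"})
phone.sync({})
print(phone.data == laptop.data)  # True
```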

Intellisync Anywhere (Puma Technologies)

Characteristic Polynomial Interpolation Synchronization (CPISync)
- Time/bandwidth complexity depends on the number of differences.
- Computationally expensive: cubic in the number of differences, but this can be improved.
- Computations can be done on only one of the two devices (the faster one).
- Works in a general peer-to-peer environment.

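CPISync Preliminaries
- Each data set can be represented as a set of numbers (using hash functions).
- The characteristic polynomial of a set $S = \{x_1, x_2, \ldots, x_n\}$ is
  $$\chi_S(Z) = (Z - x_1)(Z - x_2)\cdots(Z - x_n)$$
- Note that for two sets $S_A$ and $S_B$, the factors contributed by common elements cancel:
  $$\frac{\chi_{S_A}(Z)}{\chi_{S_B}(Z)} = \frac{\chi_{S_A \setminus S_B}(Z)}{\chi_{S_B \setminus S_A}(Z)}$$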
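
The cancellation identity can be checked numerically. This Python sketch hashes records to set elements, evaluates the characteristic polynomials at one point modulo a small prime, and compares both sides. The modulus `P = 10007`, the element range, and the SHA-256-based hashing are illustrative assumptions; a real deployment would use a much larger field so that hash collisions are negligible.

```python
import hashlib

P = 10007             # toy prime modulus (assumption)
ELEMENT_RANGE = 5000  # hashed elements lie in [0, ELEMENT_RANGE)

def element(record: bytes) -> int:
    """Hash a record to a set element (collisions are possible in this tiny range)."""
    return int.from_bytes(hashlib.sha256(record).digest()[:8], "big") % ELEMENT_RANGE

def char_eval(s, z):
    """Evaluate chi_S(z) = prod_{x in S} (z - x) mod P."""
    out = 1
    for x in s:
        out = out * (z - x) % P
    return out

def ratio(num, den):
    """num / den mod P, using the Fermat inverse den^(P-2)."""
    return num * pow(den, P - 2, P) % P

s_a = {element(r) for r in [b"alice", b"bob", b"carol", b"dave"]}
s_b = {element(r) for r in [b"alice", b"bob", b"erin"]}
z = 9000  # sample point outside the element range, so no factor is zero
lhs = ratio(char_eval(s_a, z), char_eval(s_b, z))
rhs = ratio(char_eval(s_a - s_b, z), char_eval(s_b - s_a, z))
print(lhs == rhs)  # True: the factors of the common elements cancel
```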

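CPISync
- Hosts A and B evaluate their characteristic polynomials $\chi_{S_A}(Z)$ and $\chi_{S_B}(Z)$ at the same sample points $z_1, \ldots, z_m$.
- Host B sends its evaluations $\chi_{S_B}(z_i)$ to host A.
- Host A combines the evaluations into the ratios $\chi_{S_A}(z_i)/\chi_{S_B}(z_i)$ and interpolates the rational function $\chi_{S_A \setminus S_B}(Z)/\chi_{S_B \setminus S_A}(Z)$.
- The zeroes of the numerator $\chi_{S_A \setminus S_B}(Z)$ and of the denominator $\chi_{S_B \setminus S_A}(Z)$ are determined. Those are the differences!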
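
A toy end-to-end version of these steps follows. Assumptions beyond the slides: arithmetic is modulo the small prime `P = 10007`; set elements are integers below 5000 and sample points are taken above that range so no evaluation is zero; host A is simply told the exact number of differences `m` (the IPSync step addresses not knowing it); and roots are found by scanning the element range instead of using a proper polynomial root-finding algorithm. The Gaussian elimination in `solve_mod` is the cubic-cost step. All names are hypothetical.

```python
P = 10007             # toy prime field (assumption)
ELEMENT_RANGE = 5000  # set elements are ints in [0, ELEMENT_RANGE)

def char_eval(s, z):
    """chi_S(z) = prod_{x in S} (z - x) mod P."""
    out = 1
    for x in s:
        out = out * (z - x) % P
    return out

def monic_eval(coeffs, z):
    """Evaluate z^d + coeffs[d-1] z^(d-1) + ... + coeffs[0] mod P (Horner)."""
    out = 1
    for c in reversed(coeffs):
        out = (out * z + c) % P
    return out

def solve_mod(rows):
    """Gauss-Jordan elimination mod P on an n x (n+1) augmented matrix: O(n^3)."""
    n = len(rows)
    a = [r[:] for r in rows]
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col]), None)
        if piv is None:
            raise ValueError("singular system")
        a[col], a[piv] = a[piv], a[col]
        inv = pow(a[col][col], P - 2, P)
        a[col] = [v * inv % P for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(v - f * w) % P for v, w in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]

def cpisync(set_a, set_b, m):
    """Host A learns (S_A minus S_B, S_B minus S_A), assuming exactly m elements differ."""
    delta = len(set_a) - len(set_b)           # host B also sends |S_B|
    d_a, d_b = (m + delta) // 2, (m - delta) // 2
    points = [P - 1 - i for i in range(m)]    # sample points outside the element range
    evals_b = [char_eval(set_b, z) for z in points]  # the evaluations B sends
    rows = []
    for z, e_b in zip(points, evals_b):
        f = char_eval(set_a, z) * pow(e_b, P - 2, P) % P  # chi_A(z) / chi_B(z)
        # Unknowns: non-leading coefficients of the monic numerator N (degree d_a)
        # and monic denominator D (degree d_b); equation: N(z) - f * D(z) = 0.
        rows.append([pow(z, j, P) for j in range(d_a)]
                    + [(-f * pow(z, j, P)) % P for j in range(d_b)]
                    + [(f * pow(z, d_b, P) - pow(z, d_a, P)) % P])
    sol = solve_mod(rows)
    num, den = sol[:d_a], sol[d_a:]
    # Toy root finding: scan the whole element range.
    only_a = {x for x in range(ELEMENT_RANGE) if monic_eval(num, x) == 0}
    only_b = {x for x in range(ELEMENT_RANGE) if monic_eval(den, x) == 0}
    return only_a, only_b

s_a = {1, 5, 9, 100, 4000}
s_b = {1, 5, 9, 200, 300}
only_a, only_b = cpisync(s_a, s_b, m=4)
print(sorted(only_a), sorted(only_b))  # [100, 4000] [200, 300]
```

Host B only ever sends its set size and `m` field elements, regardless of how large the sets are.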

CPISync

IPSync: Finding the Number of Differences
- Guess a bound on the number of differences.
- Send evaluations at k random points.
- Verify the interpolated result at those k points.
- Repeat with another bound if needed.
- The probability of error is:
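
One way to bound this, sketched as a standard polynomial-identity argument rather than the lecture's own formula: assume host A's candidate rational function is $N(Z)/D(Z)$, with $N$ and $D$ monic of degrees $d_A$ and $d_B$, where $d_A - d_B = |S_A| - |S_B|$, and assume each of the $k$ verification points $z$ is drawn independently and uniformly from a field with $q$ elements. Host A accepts at $z$ when $\chi_{S_A}(z)\,D(z) = \chi_{S_B}(z)\,N(z)$, so consider

$$E(Z) = \chi_{S_A}(Z)\,D(Z) - \chi_{S_B}(Z)\,N(Z), \qquad \deg E \le |S_A| + d_B,$$

since both products have degree $|S_A| + d_B$. If the candidate is wrong, $E$ is not the zero polynomial, so it has at most $|S_A| + d_B$ roots, and

$$\Pr[\text{a wrong candidate passes all } k \text{ checks}] \le \left(\frac{|S_A| + d_B}{q}\right)^{k}.$$

Under these assumptions the error probability falls exponentially in $k$ once the field is much larger than the sets.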

IPSync vs. Slow Sync

Taxonomy of Synchronization Techniques

More Techniques: Bloom Filters
- Get a Bloom filter of the receiver's data set.
- Send only the elements that are not found in the Bloom filter.
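
A minimal Bloom-filter version of this exchange is sketched below. The filter size (1024 bits), the number of hash functions (4), and deriving the hash functions from SHA-256 with an index prefix are illustrative assumptions, and the class name is hypothetical. Bloom filters have no false negatives but do have false positives, so the sender may occasionally skip a record the receiver is actually missing.

```python
import hashlib

class BloomFilter:
    """Bit array of m bits with k hash functions derived from SHA-256 (a simple choice)."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))

# Receiver: summarize its data set and send the filter (1024 bits = 128 bytes).
receiver_records = [b"rec-1", b"rec-2", b"rec-3"]
bf = BloomFilter(m_bits=1024, k_hashes=4)
for rec in receiver_records:
    bf.add(rec)

# Sender: transmit only the records the filter does not appear to contain.
sender_records = [b"rec-1", b"rec-2", b"rec-3", b"rec-4", b"rec-5"]
to_send = [rec for rec in sender_records if rec not in bf]
print(to_send)  # the new records, minus any false positives
```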

More Techniques: Using Error-Correcting Codes
- Send an error-correcting code for the data set.
- The receiver “corrects the errors” in its outdated data set.
- Reed-Solomon codes:
  - Decoding time depends only on the number of differences between the sets (almost linear, not cubic).
  - But transmission grows by an extra factor of 2.
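
The factor of 2 can be seen in a small Reed-Solomon sketch. It uses the Berlekamp-Welch decoder, a plainly swapped-in choice: Berlekamp-Welch solves a linear system and is therefore cubic, unlike the near-linear decoders referred to above, but it is short. Repairing up to t differing symbols takes 2t parity symbols. Assumptions: the data set is treated as a fixed-position sequence of symbols in the toy prime field `P = 10007`, differences are in-place changes only (no insertions or deletions), and all function names are hypothetical.

```python
P = 10007  # toy prime field (assumption); every data symbol is an int in [0, P)

def poly_eval(coeffs, x):
    """Evaluate a polynomial given low-to-high coefficients, mod P."""
    out = 0
    for c in reversed(coeffs):
        out = (out * x + c) % P
    return out

def interpolate(xs, ys):
    """Lagrange interpolation: coefficients of the unique poly of degree < len(xs)."""
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        basis, denom = [1], 1
        for j in range(n):
            if j != i:  # multiply basis by (x - xs[j])
                basis = [(prev - xs[j] * cur) % P for cur, prev in zip(basis + [0], [0] + basis)]
                denom = denom * (xs[i] - xs[j]) % P
        scale = ys[i] * pow(denom, P - 2, P) % P
        coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
    return coeffs

def solve_any(rows, n_vars):
    """Gauss-Jordan mod P; one solution of a consistent system (free vars = 0)."""
    a, pivots = [r[:] for r in rows], []
    for col in range(n_vars):
        r = len(pivots)
        piv = next((i for i in range(r, len(a)) if a[i][col]), None)
        if piv is None:
            continue
        a[r], a[piv] = a[piv], a[r]
        inv = pow(a[r][col], P - 2, P)
        a[r] = [v * inv % P for v in a[r]]
        for i in range(len(a)):
            if i != r and a[i][col]:
                f = a[i][col]
                a[i] = [(v - f * w) % P for v, w in zip(a[i], a[r])]
        pivots.append(col)
    sol = [0] * n_vars
    for i, col in enumerate(pivots):
        sol[col] = a[i][n_vars]
    return sol

def divide_monic(num, den):
    """Divide by a monic polynomial (low-to-high); returns (quotient, remainder)."""
    num, dd = num[:], len(den) - 1
    quot = [0] * (len(num) - dd)
    for s in range(len(quot) - 1, -1, -1):
        coef = quot[s] = num[s + dd]
        for j, d in enumerate(den):
            num[s + j] = (num[s + j] - coef * d) % P
    return quot, num[:dd]

def rs_parity(current, t):
    """Sender: 2t parity symbols, enough to fix up to t differing positions."""
    k = len(current)
    f = interpolate(list(range(k)), current)  # systematic: f(i) = current[i]
    return [poly_eval(f, x) for x in range(k, k + 2 * t)]

def rs_repair(outdated, parity, t):
    """Receiver: Berlekamp-Welch decoding of its outdated data plus the parity."""
    k, ys = len(outdated), outdated + parity
    rows = []
    for x, y in enumerate(ys):
        # Q(x) = y * E(x), with deg Q < k + t and E monic of degree t
        row = [pow(x, j, P) for j in range(k + t)]
        row += [(-y * pow(x, j, P)) % P for j in range(t)]
        rows.append(row + [y * pow(x, t, P) % P])
    sol = solve_any(rows, k + 2 * t)
    f, rem = divide_monic(sol[:k + t], sol[k + t:] + [1])
    if any(rem):
        raise ValueError("more than t positions differ")
    return [poly_eval(f, x) for x in range(k)]

current = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # sender's symbols (e.g., record hashes)
outdated = current[:]
outdated[2], outdated[7] = 40, 60               # two positions differ
parity = rs_parity(current, t=2)                # 4 symbols = 2 x (differences)
print(rs_repair(outdated, parity, t=2) == current)  # True
```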