Efficient Solutions to the Replicated Log and Dictionary Problems


Efficient Solutions to the Replicated Log and Dictionary Problems By Gene T J Wuu and Arthur J Bernstein Sunita Gupta, COEN 317, Spring 05

Outline Introduction The model of the environment The Log Problem and its Application The Log Problem The Dictionary Problem Efficient Solutions A New Solution to the Log Problem An Efficient Solution to the Dictionary Problem Comparison with other work Some Applications

Introduction In this paper, the authors propose efficient algorithms for maintaining a replicated dictionary in an unreliable distributed system using a log. The replicated log is used to achieve mutual consistency of replicated data in an unreliable network.

Introduction (cont’) Data replication: In many applications, data objects are highly shared, and reliability and fast access are very important. Logs: Useful in achieving distributed synchronization and in gathering information about the state of the network.

The model of the environment An n-node connected network with nodes N1, N2, ..., Nn. A node ID is an integer in [n], where [n] denotes the set {1, 2, ..., n}. Two kinds of operations: the communication operations send and receive, and non-communication operations. Events are locally atomic (i.e. if a node crashes during the execution of an operation, there will be no effect on local data). Events are distinguished by the time and place at which they occur.

The model of the environment (Cont’) There is a local clock at each node. time(e): the time of event e. Node(e): the node at which event e occurred. Op(e): the invoked operation and its parameters. <E, →>: a partial ordering relation → on the set of events E. Under this partial order, events occurring at the same node are totally ordered, and e1 → e2 when e1 is a send event and e2 is the corresponding receive event.

The model of the environment (Cont’) The unreliable behaviors covered by this model: lost messages, broken communication links, network partitions, and failed nodes. Messages in transit are not allowed to change in arbitrary ways.

The Log Problem and its application Each node maintains its own view of the log, and a distributed algorithm is employed to keep that view up to date.

The Log Problem The problem of finding an algorithm to maintain the log such that, given an execution <E, →>, for every event e: f → e iff fR ∈ L(e).

The Log Problem (Cont’) f → e iff fR ∈ L(e) --- P1. Event f happened before event e iff the event record fR describing f is in L(e), the local copy of the log at Node(e) immediately after event e completes. An event record contains the operation, the time, and the node ID.

The Log Problem (Cont’) To achieve P1, nodes exchange messages containing appropriate portions of the log. E.g. consider nodes N1 and N2: N1 sends only the records of those events which have occurred at N1 since it last sent a message to N2 (i.e. only the new records). This cannot achieve P1, because message delivery is not guaranteed.

The Log Problem (Cont’) Trivial solution: At each non-communication event, e, occurring at Ni, Ni inserts eR into Li, and at each send event Ni includes Li in the message.
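The trivial solution can be sketched in a few lines. This is only an illustrative sketch: the names `EventRecord`, `TrivialNode`, `local_event`, `send`, and `receive` are assumptions made for the example, and the receive step (merging the incoming log into the local one) is assumed, since the slide only describes the insert and send steps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventRecord:
    op: str    # operation and its parameters, e.g. "insert(x)"
    time: int  # local clock value at the node where the event occurred
    node: int  # id of the node where the event occurred


@dataclass
class TrivialNode:
    node_id: int
    clock: int = 0
    log: set = field(default_factory=set)

    def local_event(self, op):
        """Non-communication event: record it in the local log Li."""
        self.clock += 1
        record = EventRecord(op, self.clock, self.node_id)
        self.log.add(record)
        return record

    def send(self):
        """Send event: the message carries the entire local log."""
        return frozenset(self.log)

    def receive(self, message):
        """Receive event (assumed step): merge the sender's log into ours."""
        self.log |= message


n1, n2 = TrivialNode(1), TrivialNode(2)
rec = n1.local_event("insert(x)")
n2.receive(n1.send())
```

Because every message carries the whole log, a lost message does no harm: the next message that does arrive contains everything the lost one did. The price is that messages and logs grow without bound.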

The Dictionary Problem Dictionary: an abstraction of data objects such as a file directory, a database dictionary, or a resource management table. Two non-communication operations on a dictionary entry x: insert(x) and delete(x). For uniqueness of each entry, tag each item to be inserted with a timestamp.

The Dictionary Problem (Cont’) The problem of finding an algorithm for maintaining the dictionary such that, given an execution <E, →>, for every event e: x ∈ V(e) iff Cx → e and there does not exist an x-delete event g such that g → e.

The Dictionary Problem (Cont’) x ∈ V(e) iff Cx → e and there does not exist an x-delete event g such that g → e --- P2. Dictionary entry x is part of the local copy of the dictionary at Node(e) iff the unique event Cx which inserts x happened before event e and no x-delete event g happened before event e.

The Dictionary Problem (Cont’) Obvious solution: At each event e such that node(e) = i, Ni computes V(e) in the following way:
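The computation itself did not survive in this transcript. One way to read it, consistent with property P2 and assuming the node holds a full log as in the trivial log solution, is sketched below. The names `Rec` and `view_from_log` are hypothetical, and each item string is assumed to be unique (standing in for the timestamp-tagged entry).

```python
from typing import NamedTuple


class Rec(NamedTuple):
    kind: str  # "insert" or "delete"
    item: str  # the (uniquely tagged) dictionary entry
    time: int  # local time of the event
    node: int  # node where the event occurred


def view_from_log(log):
    """Entries whose insert record is in the log and whose delete record is not."""
    inserted = {r.item for r in log if r.kind == "insert"}
    deleted = {r.item for r in log if r.kind == "delete"}
    return inserted - deleted


log = {
    Rec("insert", "a", 1, 1),
    Rec("insert", "b", 2, 1),
    Rec("delete", "a", 1, 2),
}
view = view_from_log(log)
```

Recomputing the view this way scans the whole log at every event, which is where the computational cost comes from.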

Efficient solutions Problems with the previous solutions: excessive communication cost, excessive computational cost, and excessive storage cost.

A New Solution to the Log Problem A two-dimensional time table (2DTT) is used by each node. Each node Ni keeps a two-dimensional time table Ti, which corresponds to Ni's most recent knowledge of the vector clocks at all nodes. From it, Ni can tell how up to date other nodes are about events occurring in the network.

A New Solution to the Log Problem (Cont’) Each node Ni maintains the following information: A service called clocki: each reference to clocki returns an integer greater than that returned by the last reference. A two-dimensional time table Ti: each time table satisfies the following time-table property: if Ti[k, u] = t, then Ni knows that Nk has received the records of all events which occurred at Nu up to time t.

A New Solution to the Log Problem (Cont’) To reduce communication, it is desirable for a node to know which nodes have received the record of a particular event. To this end, a predicate HasRec is defined as follows: HasRec(Ti, eR, k) = (Ti[k, eR.node] >= eR.time). If HasRec(Ti, eR, k) is true at Ni, then Ni knows that Nk has learned about event e. The protocol ensures that whenever a node is aware of an event (an insert or delete in the dictionary), it is also aware of all causally preceding events.
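A minimal sketch of how the time table and HasRec can drive the exchange of log records follows. The slides do not spell out the send and receive steps, so the update rules here (send only records the target is not known to have; on receipt, take the element-wise maximum of the two tables and bring the local row up to the sender's row) are a reconstruction and should be checked against the paper. Node ids are 0-based in this sketch, and `Node`, `has_rec`, `local_event`, `send`, and `receive` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    op: str
    time: int
    node: int


def has_rec(table, record, k):
    """HasRec(Ti, eR, k): is node k known to have the record eR?"""
    return table[k][record.node] >= record.time


class Node:
    def __init__(self, node_id, n):
        self.id, self.n = node_id, n
        self.clock = 0
        self.table = [[0] * n for _ in range(n)]  # the 2DTT Ti
        self.log = set()

    def local_event(self, op):
        self.clock += 1
        self.table[self.id][self.id] = self.clock
        record = EventRecord(op, self.clock, self.id)
        self.log.add(record)
        return record

    def send(self, k):
        """Send to node k only the records it is not known to have."""
        records = {r for r in self.log if not has_rec(self.table, r, k)}
        return records, [row[:] for row in self.table], self.id

    def receive(self, message):
        records, their_table, sender = message
        # Keep only records this node has not already received.
        self.log |= {r for r in records if not has_rec(self.table, r, self.id)}
        # This node now knows everything the sender knew.
        for u in range(self.n):
            self.table[self.id][u] = max(self.table[self.id][u],
                                         their_table[sender][u])
        # Merge knowledge about all other nodes.
        for r in range(self.n):
            for u in range(self.n):
                self.table[r][u] = max(self.table[r][u], their_table[r][u])


a, b, c = Node(0, 3), Node(1, 3), Node(2, 3)
rec = a.local_event("insert(x)")
b.receive(a.send(1))   # b learns of the event directly
c.receive(b.send(2))   # c learns of it indirectly, through b
a.receive(b.send(0))   # a learns that b already has the record
```

After the last step, `a.send(1)` carries no records, because a's table shows that b has already received everything a knows about.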

An efficient Solution to the Dictionary Problem Developed to overcome the excessive computational and storage costs. Each node separately maintains its view of the dictionary, as well as a partial log which records some of the events that have happened in the network. Only this partial log, not the full log, is kept at each node.

An efficient Solution to the Dictionary Problem (Cont’) If the communication link between two nodes breaks or a message is lost, a node can still learn of a new event indirectly, using information passed through other nodes. We assume that the state of a node is maintained in stable storage and that data is not lost when a crash occurs.
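The following sketch combines the dictionary view, the partial log, and the time table. It is a reconstruction under stated assumptions, not the authors' exact algorithm: a record is dropped from the partial log once the time table shows that every node has it, item strings are assumed unique (never re-inserted), node ids are 0-based, and all class and method names are illustrative.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    kind: str  # "insert" or "delete"
    item: str
    time: int
    node: int


def has_rec(table, rec, k):
    return table[k][rec.node] >= rec.time


class DictNode:
    def __init__(self, node_id, n):
        self.id, self.n = node_id, n
        self.clock = 0
        self.table = [[0] * n for _ in range(n)]
        self.view = set()         # local view of the dictionary
        self.partial_log = set()  # records some node may still need

    def _record(self, kind, item):
        self.clock += 1
        self.table[self.id][self.id] = self.clock
        self.partial_log.add(Rec(kind, item, self.clock, self.id))

    def insert(self, item):
        self._record("insert", item)
        self.view.add(item)

    def delete(self, item):
        if item not in self.view:  # can only delete what this node can see
            raise KeyError(item)
        self._record("delete", item)
        self.view.discard(item)

    def send(self, k):
        recs = {r for r in self.partial_log if not has_rec(self.table, r, k)}
        return recs, [row[:] for row in self.table], self.id

    def receive(self, message):
        recs, their_table, sender = message
        new = {r for r in recs if not has_rec(self.table, r, self.id)}
        inserted = {r.item for r in new if r.kind == "insert"}
        deleted = {r.item for r in new if r.kind == "delete"}
        self.view = (self.view | inserted) - deleted
        for u in range(self.n):
            self.table[self.id][u] = max(self.table[self.id][u],
                                         their_table[sender][u])
        for r in range(self.n):
            for u in range(self.n):
                self.table[r][u] = max(self.table[r][u], their_table[r][u])
        # Keep a record only while some node is not known to have it.
        self.partial_log = {
            r for r in self.partial_log | new
            if any(not has_rec(self.table, r, k) for k in range(self.n))
        }


a, b = DictNode(0, 2), DictNode(1, 2)
a.insert("x")
b.receive(a.send(1))
view_b_after_insert = set(b.view)
b.delete("x")
a.receive(b.send(0))
```

In this two-node run the insert record is discarded from b's partial log as soon as b receives it, because b's table then shows that both nodes have it. The view is updated incrementally from the newly received records only, so the log is never rescanned.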

Comparison with Other Work 1. Fischer and Michael: the solution requires a node to send its entire copy of the dictionary in each message, which is expensive. 2. One-dimensional time table: each node maintains a synchronization set (SS), and a node may send an event record to another node even though that node has already learnt about the event. Thus the SS grows unboundedly.

Comparison with Other Work (Cont’) 2DTT: A deficiency of the approach is that the time table is sent as part of every message and has size O(n^2), causing excessive storage and communication overhead. Modified algorithms: 1. Each node stores the complete 2DTT but sends only its own row. 2. Each node stores only its own row and a row for each of its neighbors; it sends only those rows which correspond to neighbors of the target node. 3. Each node stores only its neighbors' rows and neighbors' columns; it sends only those rows which correspond to neighbors of the target node.

Some applications of the log solution Replicated numeric data with add-to and subtract-from operations. Detection of failure: the log is used to collect records of communication events occurring in the network.
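For the numeric-data application, a node could update its copy by applying each newly received log record exactly once. The sketch below assumes records are simple (kind, amount) pairs and that duplicates have already been filtered out (for example with HasRec); the function name `apply_new_records` is hypothetical.

```python
def apply_new_records(value, new_records):
    """Apply add-to / subtract-from records to a replicated number."""
    for kind, amount in new_records:
        if kind == "add":
            value += amount
        elif kind == "subtract":
            value -= amount
        else:
            raise ValueError(f"unknown operation: {kind}")
    return value


balance = apply_new_records(10, [("add", 5), ("subtract", 3)])
```

Addition and subtraction commute, so replicas that apply the same set of records in different orders arrive at the same value.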
