Fault Tolerance and Replication


1 Fault Tolerance and Replication
This PowerPoint presentation has been adapted from: (1) web.njit.edu/~gblank/cis633/Lectures/Replication.ppt

2 Content
Introduction
System model and the role of group communication
Fault tolerant services
Case study: Bayou and Coda
Transactions with replicated data

3 Content
Introduction
System model and the role of group communication
Fault tolerant services
Case study: Bayou and Coda
Transactions with replicated data

4 Introduction
Replication: duplicating limited or heavily loaded resources to provide access, and to ensure continued access after failures.
Replication is important for performance enhancement, increased availability, and fault tolerance.

5 Introduction
Replication for performance enhancement
Data are replicated between several originating servers in the same domain.
The workload is shared between the servers by binding all the server IP addresses to the site's DNS name.
This increases performance at little cost to the system.

6 Introduction
Replication for increased availability
Replication is a technique for automatically maintaining the availability of data despite server failures.
If data are replicated at two or more failure-independent servers, client software may be able to access data at an alternative server should the default server fail or become unreachable.

7 Introduction
Replication for fault tolerance
Highly available data is not necessarily correct data (it may be out of date).
A fault-tolerant service, by contrast, guarantees the freshness of the data supplied to the client and the correctness of the effects of the client's operations upon the data.

8 Introduction
Replication requirements:
Transparency: users should not need to be aware that data is replicated, and the performance and utility of information retrieval should not be noticeably different from unreplicated data.
Consistency: different copies of replicated data should be the same; when data are changed, the change is distributed to all replicated servers.

9 Content
Introduction
System model and the role of group communication
Fault tolerant services
Case study: Bayou and Coda
Transactions with replicated data

10 System Model & The Role of Group Communication
Introduction
The data in the system are composed of objects (e.g., files, components, Java objects).
Each logical object is implemented by a collection of physical objects called replicas, each stored on a computer.
The replicas of a given object are not necessarily identical, at least not at any particular point in time: some replicas may have received updates that others have not.

11 System Model & The Role of Group Communication

12 System Model & The Role of Group Communication
Replica managers (RMs): components that contain the replicas on a particular computer and perform operations on them.
Front ends (FEs): components that handle clients' requests and communicate with one or more replica managers by message passing. A front end may be implemented in the client's address space, or it may be a separate process.

13 System Model & The Role of Group Communication
Five phases in a request upon replicated objects [Wiesmann et al. 2000] (a sketch follows this slide):
Request: the front end requests service from one or more RMs, either through a single RM or by multicasting to all of them.
Coordination: the RMs coordinate to prepare to execute the request; this may require ordering the operations.
Execution: the RMs execute the request (the effect may be reversed later).
Agreement: the RMs reach agreement on the effect of the request.
Response: one or more RMs pass a response back to the front end.
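The following minimal Python sketch (not from the original deck) walks one request through the five phases. All names (FrontEnd, ReplicaManager, coordinate, execute) are illustrative assumptions, and ordering and agreement are trivialized to keep the sketch short.

```python
# Hypothetical sketch of the five request phases on replicated objects.

class ReplicaManager:
    def __init__(self, rm_id):
        self.rm_id = rm_id
        self.state = {}

    def coordinate(self, request, peers):
        # Coordination: agree with peers on an execution order (trivial here).
        return sorted([request], key=lambda r: r["id"])

    def execute(self, request):
        # Execution: apply the operation (reversible in a real system).
        self.state[request["key"]] = request["value"]
        return ("ok", self.rm_id)

class FrontEnd:
    def __init__(self, rms):
        self.rms = rms

    def invoke(self, request):
        # Request: multicast the request to all replica managers.
        for rm in self.rms:
            rm.coordinate(request, self.rms)              # Coordination
        responses = [rm.execute(request) for rm in self.rms]  # Execution
        # Agreement is implicit here: all RMs executed identically.
        return responses[0]                               # Response

rms = [ReplicaManager(i) for i in range(3)]
fe = FrontEnd(rms)
print(fe.invoke({"id": 1, "key": "x", "value": 42}))
```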

14 System Model & The Role of Group Communication
Managing RMs through group communication is complex, especially in the case of dynamic groups.
A group membership service may be used to manage the addition and removal of replica managers, and to detect and recover from crashes and faults.

15 System Model & The Role of Group Communication
Tasks of a group membership service (a sketch follows this slide):
Provide an interface for group membership changes.
Implement a failure detector.
Notify members of group membership changes.
Perform group address expansion for multicast delivery of messages.
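The sketch below (illustrative only; the method names are assumptions, not the API of any specific toolkit) shows how the interface of such a service might look, covering the four tasks above.

```python
# Illustrative group membership service interface.

class GroupMembershipService:
    def __init__(self):
        self.members = set()
        self.listeners = []   # callbacks interested in membership changes

    def join(self, process_id):
        # Interface for membership changes.
        self.members.add(process_id)
        self._notify()

    def leave(self, process_id):
        self.members.discard(process_id)
        self._notify()

    def suspect_failed(self, process_id):
        # A failure detector would call this when a member stops responding.
        self.leave(process_id)

    def expand(self, group_address):
        # Group address expansion: map a group address to the current
        # members so a multicast can be delivered to each of them.
        return list(self.members)

    def _notify(self):
        # Notify members of the new view (membership change).
        for callback in self.listeners:
            callback(frozenset(self.members))
```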

16 System Model & The Role of Group Communication
[Figure: group membership management for a process group: processes Join, Leave, or Fail; group address expansion maps a send onto multicast communication to the current members.]

17 Content
Introduction
System model and the role of group communication
Fault tolerant services
Case study: Bayou and Coda
Transactions with replicated data

18 Fault Tolerant Services
Introduction
Replicating data and functionality at replica managers can provide a service that remains correct despite process failures.
A replicated service is correct if it keeps responding despite faults, and clients cannot see the difference between the replicated service and one with a single copy of the data.

19 Fault Tolerant Services
Introduction
One correctness criterion for replicated objects is linearizability. In a linearizable service every operation is synchronous: clients wait for one operation to complete before starting another.
A weaker criterion is sequential consistency: a replicated shared object is sequentially consistent if, for any execution, there is some interleaving of the clients' operations that produces a single correct copy and whose order is consistent with the program order in which each client performed its operations.

20 Fault Tolerant Services
The update process
Read-only requests have no impact on the replicated object.
Update requests must be managed properly to avoid inconsistency.
One strategy to avoid inconsistency: make all updates to a primary copy of the data and copy them to the other replicas (passive replication). If the primary fails, one of the backups is promoted to act as primary.

21 Fault Tolerant Services
Passive (primary-backup) replication

22 Fault Tolerant Services
Passive (primary-backup) replication: the sequence of events when a client requests an operation (a sketch follows this slide):
Request: the front end issues a request with a unique identifier to the primary replica manager.
Coordination: the primary processes the request atomically, checking the identifier to discard duplicate requests.
Execution: the request is processed and the response stored.
Agreement: if the request is an update, the primary sends the update to the backups, which apply it and acknowledge.
Response: the primary notifies the front end, which passes the response to the client.
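A minimal sketch of the primary's handling under these phases, assuming reliable point-to-point links. Class and method names are illustrative; the cached-response table implements the duplicate-identifier check.

```python
# Minimal sketch of passive (primary-backup) replication.

class Backup:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        # Agreement phase on the backup: install the primary's update.
        self.state[key] = value
        return "ack"

class Primary:
    def __init__(self, backups):
        self.state = {}
        self.responses = {}   # request id -> cached response (duplicate check)
        self.backups = backups

    def handle(self, req_id, key, value):
        # Coordination: re-send the stored response for duplicate requests.
        if req_id in self.responses:
            return self.responses[req_id]
        # Execution.
        self.state[key] = value
        # Agreement: propagate to every backup and wait for acknowledgements.
        acks = [b.apply(key, value) for b in self.backups]
        assert all(a == "ack" for a in acks)
        # Response: store it so retransmissions get the same answer.
        self.responses[req_id] = ("ok", key, value)
        return self.responses[req_id]

primary = Primary([Backup(), Backup()])
print(primary.handle("req-1", "x", 10))
print(primary.handle("req-1", "x", 10))  # duplicate: same cached response
```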

23 Fault Tolerant Services
Passive (primary-backup) replication gives fault tolerance at a cost in performance: the overhead of updating the replicas makes it slower than non-replicated objects.
To mitigate this, allow read-only requests to be made to backup RMs, but send all updates to the primary.
This is of limited value for transaction processing systems, but very effective for decision support systems, where most requests are read-only.

24 Fault Tolerant Services
Active Replication

25 Fault Tolerant Services
Active replication steps (a sketch follows this slide):
Request: the front end attaches a unique identifier to the request and multicasts it (totally ordered, reliable) to the RMs. The front end is assumed to fail only by crashing.
Coordination: every correct RM receives the request in the same total order.
Execution: every RM executes the request.
Agreement: not required, because of the ordered multicast.
Response: each RM sends its response to the front end, which handles the responses depending on the failure assumptions and the multicast algorithm.
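A minimal sketch in which a single sequencer stands in for the totally ordered, reliable multicast (an assumption made for brevity; real systems use a group communication protocol). Because every replica delivers requests in the same order and executes deterministically, all replicas stay identical.

```python
# Illustrative sketch of active replication over a sequencer-based total order.

class Sequencer:
    """Assigns a global sequence number to every request (total order)."""
    def __init__(self):
        self.seq = 0

    def order(self, request):
        self.seq += 1
        return (self.seq, request)

class ActiveReplica:
    def __init__(self):
        self.state = {}
        self.next_seq = 1

    def deliver(self, seq, request):
        # Every correct replica delivers in the same total order, so
        # identical deterministic execution yields identical state.
        assert seq == self.next_seq, "delivery must follow the total order"
        self.next_seq += 1
        self.state[request["key"]] = request["value"]
        return ("ok", request["id"])

sequencer = Sequencer()
replicas = [ActiveReplica() for _ in range(3)]

def front_end_invoke(request):
    seq, req = sequencer.order(request)                   # Request + Coordination
    responses = [r.deliver(seq, req) for r in replicas]   # Execution
    return responses[0]                                   # Response

print(front_end_invoke({"id": "r1", "key": "x", "value": 7}))
```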

26 Fault Tolerant Services
Active replication assumes totally ordered and reliable multicast. Achieving this is equivalent to solving consensus, which requires either a synchronous system or a technique such as failure detectors in an asynchronous system.
The model can be simplified if updates are assumed to be commutative, so that the effect of two operations is the same in either order.
Example: daily deposits and withdrawals on a bank account can be applied in any order, provided the balance never goes below zero. If a process avoids overdrafts, the effects are commutative (a sketch follows this slide).
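As a quick illustration of the commutativity claim (a toy sketch, not part of the deck), the following shows two replicas applying the same deposits and withdrawals in different orders and reaching the same balance, as long as no overdraft occurs.

```python
# Deposits and withdrawals commute when no overdraft occurs, so replicas
# may apply them in different orders and still converge.

def apply_ops(balance, ops):
    for op, amount in ops:
        balance = balance + amount if op == "deposit" else balance - amount
        assert balance >= 0, "an overdraft would break commutativity"
    return balance

ops = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]
reordered = [ops[2], ops[0], ops[1]]

# Two replicas applying the same operations in different orders converge.
assert apply_ops(200, ops) == apply_ops(200, reordered) == 320
print("replicas converge:", apply_ops(200, ops))
```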

27 Content
Introduction
System model and the role of group communication
Fault tolerant services
Case study: Bayou and Coda
Transactions with replicated data

28 Case study: Bayou and Coda
Introduction
These systems implement replication techniques to make services highly available, giving clients access to the service with reasonable response times.
Fault-tolerant systems send updates so that all correct RMs receive them as soon as possible, but this may be unacceptable for high availability systems.
It may be desirable to increase performance by providing slower (but still acceptable) updates through a minimal set of RMs.
Weaker consistency tends to require less agreement and provides more availability.

29 Case study: Bayou and Coda
Bayou is an approach to high availability.
Users working in a disconnected fashion can make updates in any partition at any time, with the updates recorded at any replica manager.
The replica managers are required to detect and manage conflicts when two partitions are rejoined and their updates are merged.
Domain-specific policies, called operational transformations, are used to resolve conflicts by giving priority to some partitions.

30 Case study: Bayou and Coda
Bayou holds state values in a database to support queries and updates.
Each update is a special case of a transaction, using the equivalent of a stored procedure to guarantee the ACID properties.
Eventually every RM receives the same set of updates and applies them so that their databases are identical.
However, since propagation is delayed, in an active system with a steady stream of updates the databases may never actually be identical at any given instant.

31 Case study: Bayou and Coda
Bayou update resolution
Updates are marked as tentative when they are first applied to a database.
Once coordination with the other RMs makes it possible to resolve conflicts and place the updates in a canonical order, they are committed; committed updates remain applied in their allotted order. Usually this is achieved by designating a primary RM.
Every update includes a dependency check and a merge procedure (a sketch follows this slide).
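A hedged sketch of the write structure just described: the shape (dependency check, update, merge procedure) follows Bayou's published design, but the toy meeting-room database and all helper names are assumptions for illustration.

```python
# Bayou-style write: apply the update if its expectations hold, otherwise
# run the application-supplied merge procedure to resolve the conflict.

def bayou_write(db, update, dependency_check, merge_procedure):
    if dependency_check(db):
        update(db)            # expected case: no conflict
        return "applied"
    else:
        merge_procedure(db)   # application-specific conflict resolution
        return "merged"

# Example: reserve meeting room "A" at 10:00, falling back to room "B".
db = {("A", "10:00"): None, ("B", "10:00"): None}

result = bayou_write(
    db,
    update=lambda d: d.__setitem__(("A", "10:00"), "alice"),
    dependency_check=lambda d: d[("A", "10:00")] is None,
    merge_procedure=lambda d: d.__setitem__(("B", "10:00"), "alice"),
)
print(result, db)
```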

32 Case study: Bayou and Coda

33 Case study: Bayou and Coda
In Bayou, replication is not transparent to the application: knowledge of the application's semantics is required to increase data availability while maintaining a replication state that can be called eventually sequentially consistent.
Disadvantages include increased complexity for application programmers and users.
The operational transformation approach is particularly suited to groupware, where workers access documents remotely.

34 Case study: Bayou and Coda
The Coda file system is a descendant of the Andrew File System (AFS), developed in a research project at Carnegie Mellon University.
It addresses several requirements that AFS does not meet, particularly high availability despite disconnected operation.
The growing number of AFS users with laptops created a need to support disconnected use of replicated data and to increase performance and availability.

35 Case study: Bayou and Coda
The Coda architecture: Coda has Venus processes at the client computers and Vice processes at the file servers; the Vice processes are the replica managers.
The set of servers holding replicas of a file volume is its volume storage group (VSG). Clients access a subset known as the available volume storage group (AVSG), which varies as servers become connected or disconnected.
Updates are distributed by broadcasting to the AVSG after a file is closed. If the AVSG is empty (disconnected operation), updated files are cached until reconnection.

36 Case study: Bayou and Coda
Coda uses an optimistic replication strategy: files can be updated while the network is partitioned or during disconnected operation.
A Coda version vector (CVV) is a vector timestamp, maintained at each site, used to determine whether there are any conflicts among updates at the time of reconnection. If there is no conflict, the updates are applied.
Coda does not attempt to resolve conflicts: if there is a conflict, the file is marked inoperable and the owner of the file is notified. This check is done at the AVSG level, so conflicts may recur at the VSG level (a sketch of version-vector comparison follows this slide).
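The following illustrates version-vector comparison in the spirit of Coda's CVVs (the helper names are assumptions): a vector dominates another if it is greater or equal component-wise, and if neither dominates, the updates were concurrent and conflict.

```python
# Version-vector conflict detection, one counter per replication site.

def dominates(v1, v2):
    return all(a >= b for a, b in zip(v1, v2))

def compare(v1, v2):
    if dominates(v1, v2) and dominates(v2, v1):
        return "identical"     # both copies have seen the same updates
    if dominates(v1, v2):
        return "v1 newer"      # v1's copy has a superset of the updates
    if dominates(v2, v1):
        return "v2 newer"
    return "conflict"          # concurrent updates: mark file, notify owner

print(compare([2, 2, 1], [2, 1, 1]))   # v1 newer: propagate v1's copy
print(compare([2, 1, 1], [1, 2, 1]))   # conflict: Coda does not auto-resolve
```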

37 Content
Introduction
System model and the role of group communication
Fault tolerant services
Case study: Bayou and Coda
Transactions with replicated data

38 Transactions with Replicated Data
Introduction
From the client's viewpoint, transactions on replicated objects should appear the same as transactions on non-replicated objects: client transactions are interleaved in a serially equivalent manner.
One-copy serializability: the effect of transactions performed on replicated objects is the same as if they had been performed one at a time on a single set of objects.

39 Transactions with Replicated Data
Introduction
Three replication schemes for operating across a network partition:
Available copies with validation: available copies replication is applied in each partition; when a partition is repaired, a validation procedure is applied and any inconsistencies are dealt with.
Quorum consensus: a subgroup must have a quorum (sufficient members) to be allowed to continue providing service in the presence of a partition; when a partition is repaired (and when a replica manager restarts after a failure), replica managers bring their objects up to date by means of recovery procedures.
Virtual partition: a combination of quorum consensus and available copies; if a virtual partition has a quorum, it can use available copies replication.

40 Transactions with Replicated Data
Available copies
Allows some RMs to be unavailable: updates must be made to all available replicas of the data, with provisions to restore and update an RM that has crashed (a sketch follows this slide).
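A minimal sketch, assuming reliable communication, of writing to all available replicas and bringing a recovered RM up to date; all names and the version counter are illustrative assumptions.

```python
# Available copies: writes go to every reachable RM; a recovering RM
# copies the freshest available state before serving again.

class RM:
    def __init__(self):
        self.value, self.version, self.up = None, 0, True

def write_all_available(rms, value, version):
    available = [rm for rm in rms if rm.up]
    if not available:
        raise RuntimeError("no available copies")
    for rm in available:
        rm.value, rm.version = value, version

def recover(rm, rms):
    # Restore a crashed RM from the most up-to-date available replica.
    freshest = max((r for r in rms if r.up), key=lambda r: r.version)
    rm.value, rm.version, rm.up = freshest.value, freshest.version, True

rms = [RM(), RM(), RM()]
rms[2].up = False                      # one RM has crashed
write_all_available(rms, "v1", 1)      # update reaches the two live copies
recover(rms[2], rms)                   # crashed RM catches up on restart
print([rm.value for rm in rms])        # ['v1', 'v1', 'v1']
```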

41 Transactions with Replicated Data
Available copies

42 Transactions with Replicated Data
Available copies with validation
An optimistic approach that allows updates in different partitions of a network.
When the partition is repaired, conflicts must be detected and compensating actions taken. This approach is limited to situations in which such compensation is possible.

43 Transactions with Replicated Data
Quorum consensus
A pessimistic approach to replicated transactions: a quorum is a subgroup of RMs large enough to have the right to carry out transactions even if some RMs are unavailable. This limits updates to a single subset of the RMs, which bring the other RMs up to date after a partition is repaired.
Gifford's file replication: a quorum scheme in which a number of votes is assigned to each copy of a replicated file. A certain number of votes is required for a read or update operation, with writes limited to subsets holding more than half of the votes; the remaining RMs are updated as a background task when they become available (a sketch of the vote counting follows this slide).
Copies without enough read votes are considered weak copies and may be read locally, with limits assumed on their currency and quality.
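A small sketch of Gifford-style weighted voting under the standard constraints that a read quorum R plus a write quorum W exceed the total votes, and that W exceeds half the votes; the vote assignments below are invented for illustration.

```python
# Weighted voting: an operation proceeds only if the reachable copies
# hold enough votes to form the required quorum.

votes = {"rm1": 2, "rm2": 1, "rm3": 1}   # per-copy vote weights (made up)
TOTAL = sum(votes.values())               # 4 votes in all
R, W = 2, 3                               # read and write quorum thresholds

assert R + W > TOTAL       # every read quorum overlaps every write quorum
assert 2 * W > TOTAL       # no two write quorums can be disjoint

def has_quorum(reachable, threshold):
    """Can the reachable copies gather enough votes for the operation?"""
    return sum(votes[rm] for rm in reachable) >= threshold

print(has_quorum({"rm1", "rm2"}, R))   # True: 3 votes, reads may proceed
print(has_quorum({"rm2", "rm3"}, W))   # False: 2 votes, writes must wait
```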

44 Transactions with Replicated Data
Virtual partition algorithm
This approach combines quorum consensus (to handle partitions) with available copies (for faster read operations).
A virtual partition is an abstraction of a real partition and contains a set of replica managers.

45 Transactions with Replicated Data
Virtual Partition Algorithm

46 Transactions with Replicated Data
Virtual Partition Algorithm

47 Transactions with Replicated Data
Virtual partition algorithm: issues
If network partitions are intermittent, different virtual partitions can form, and overlapping virtual partitions violate one-copy serializability.
Virtual partitions carry logical timestamps; when virtual partitions overlap, the one with the higher logical timestamp is selected, which keeps operation consistent as long as partitions are uncommon (a sketch follows this slide).
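A speculative sketch of the timestamp rule (the rule itself is from the slide; the data structures and names are assumptions): an RM that finds itself in two overlapping virtual partitions sides with the higher-timestamped one, so the overlap cannot leave two virtual partitions both accepting updates.

```python
# Resolving overlapping virtual partitions by logical creation timestamp.

def choose_virtual_partition(candidates):
    """candidates: list of (logical_timestamp, member_set) tuples."""
    return max(candidates, key=lambda vp: vp[0])

vp_old = (7, {"rm1", "rm2"})
vp_new = (9, {"rm2", "rm3", "rm4"})

# rm2 appears in both; it abandons the lower-timestamped virtual partition.
print(choose_virtual_partition([vp_old, vp_new]))
```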

48 Transactions with Replicated Data
Virtual Partition Algorithm

49 End of the Chapter …

