Computing in the RAIN: A Reliable Array of Independent Nodes Group A3 Ka Hou Wong Jahanzeb Faizan Jonathan Sippel

Introduction Presenter: Ka Hou Wong

Introduction. RAIN is a research collaboration between Caltech and the Jet Propulsion Laboratory. Goal: identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components.

Hardware Platform: a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces through a network of switches. (Figure: compute nodes C0-C9 attached through switches S0-S3; C = computer, S = switch.)

Software Platform: a collection of software modules that run in conjunction with operating system services and standard network protocols. (Figure: software stack showing the Application and MPI/PVM layers, the RAIN modules, and TCP/IP over network connections such as Ethernet, Myrinet, ATM, and Servernet.)

Key Building Blocks for Distributed Computer Systems. Communication: fault-tolerant communication topologies and reliable communication protocols. Fault management: group membership techniques. Storage: distributed data storage schemes based on error-control codes.

Features of RAIN: Communication. Provides fault tolerance in the network via the following mechanisms: bundled interfaces, link monitoring, and fault-tolerant interconnect topologies.

Features of RAIN (cont’d): Group membership identifies the healthy nodes that are participating in the cluster. Data storage uses redundant storage schemes over multiple disks for fault tolerance.

Communication Presenter: Jahanzeb Faizan

Communication: fault-tolerant interconnect topologies and network interfaces.

Fault-tolerant Interconnect Topologies. Goal: to connect computer nodes to a network of switches so as to maximize the network’s resistance to partitioning. (Figure: compute nodes attached to a ring of switches.) How do you connect n nodes to a ring of n switches?

Naïve Approach: connect the computer nodes to the nearest switches in a regular fashion. (Figure: nodes attached to adjacent switches on the ring.) This is only 1-fault-tolerant: the network is easily partitioned by two switch failures.

Diameter Construction Approach: connect computer nodes to the switching network in the most non-local way possible. Computer nodes are connected to maximally distant switches; the nodes of degree 2 connected between switches should form a diameter.

Diameter Construction Approach (cont’d). Construction (Diameters): let d_s = 4 and d_c = 2. For all i, 0 ≤ i < n, label the compute nodes c_i and the switches s_i. Connect switch s_i to s_((i+1) mod n), i.e., the switches form a ring. Connect node c_i to switches s_i and s_((i + ⌊n/2⌋ + 1) mod n). (Figures: the resulting networks for n = 7 and n = 8.) The construction can tolerate 3 faults of any kind without partitioning the network.
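
A minimal sketch of this construction in Python (the function name and link-list representation are illustrative additions, not part of the original system):

    def diameter_topology(n):
        """Diameters construction: n compute nodes attached to a ring of n switches."""
        links = []
        for i in range(n):
            # Switches form a ring: s_i -- s_{(i+1) mod n}
            links.append((f"s{i}", f"s{(i + 1) % n}"))
            # Each compute node connects to its own switch and to the
            # maximally distant switch s_{(i + floor(n/2) + 1) mod n}
            links.append((f"c{i}", f"s{i}"))
            links.append((f"c{i}", f"s{(i + n // 2 + 1) % n}"))
        return links

    # Example: the n = 8 network from the slide
    for link in diameter_topology(8):
        print(link)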

Protocol for Link Failure. Goal: monitoring of the available paths. Requirements: correctness, bounded slack, and stability.

Correctness: the protocol must correctly reflect the true state of the channel. (Figure: bidirectional communication between nodes A and B.) If one side sees time-outs, both sides should mark the channel as being down.

Bounded Slack: ensure that both endpoints have a maximum slack of n transitions. (Figure: link histories over time, with U = link up and D = link down. Without bounded slack, node A sees many more transitions than node B; with it, nodes A and B see tightly coupled views of the channel.)

Stability: each real channel event (i.e., a time-out) should cause at most some bounded number of state transitions at each endpoint.

Consistent-History Protocol for Link Failures: monitor the available paths in the network for proper functioning. A modified ping protocol guarantees that each side of the communication channel sees the same history (bounded slack).
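
A sketch of the basic monitoring layer only (not the full consistent-history protocol); send_ping and on_pong are hypothetical hooks into the unreliable ping transport:

    import time

    class LinkMonitor:
        """Marks a path up when ping replies arrive and down after a time-out."""
        def __init__(self, send_ping, timeout=1.0):
            self.send_ping = send_ping
            self.timeout = timeout
            self.up = False
            self.last_reply = 0.0

        def tick(self):
            # Called periodically: send a ping and check for a time-out.
            self.send_ping()
            if self.up and time.monotonic() - self.last_reply > self.timeout:
                self.up = False      # time-out: mark the channel down

        def on_pong(self):
            # A reply arrived on the path: mark the channel up.
            self.last_reply = time.monotonic()
            self.up = True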

The Protocol: reliable message passing. Implementation: a sliding-window protocol. An existing reliable communication layer is not needed; reliable messaging is built on top of the ping messages.
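
A generic sender-side sliding-window sketch (not the RAIN implementation; send_raw stands in for the unreliable ping-based transport):

    class SlidingWindowSender:
        """Keeps at most w unacknowledged messages in flight."""
        def __init__(self, w, send_raw):
            self.w = w
            self.send_raw = send_raw
            self.base = 0        # oldest unacknowledged sequence number
            self.next_seq = 0    # next sequence number to assign
            self.unacked = {}    # seq -> payload, kept for retransmission

        def send(self, payload):
            if self.next_seq - self.base >= self.w:
                return False     # window full; caller retries later
            self.unacked[self.next_seq] = payload
            self.send_raw(self.next_seq, payload)
            self.next_seq += 1
            return True

        def on_ack(self, ack_seq):
            # Cumulative acknowledgement: everything below ack_seq is delivered.
            for seq in list(self.unacked):
                if seq < ack_seq:
                    del self.unacked[seq]
            self.base = max(self.base, ack_seq)

        def on_timeout(self):
            # Retransmit everything still outstanding.
            for seq, payload in sorted(self.unacked.items()):
                self.send_raw(seq, payload)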

The Protocol (cont’d). Sending and receiving of the token uses reliable messaging: tokens are sent on request, and a consistent history is maintained. Sending and receiving of ping messages uses unreliable messaging to detect when the link is up or down, implemented by pings or hardware feedback.

Demonstration. (Figure: protocol state machine; states are labelled Up or Down together with a token count t, starting from Down, t = 0. Transition labels read trigger event / tokens sent, where t is the token count, T is a token arrival event, and tout is a time-out event.)

Group Membership Presenter: Jonathan Sippel

Group Membership: provides a level of agreement between non-faulty processes in a distributed application, and tolerates permanent and transient failures in both nodes and links. It is based on two mechanisms: the token mechanism and the 911 mechanism.

Token Mechanism: nodes in the membership are ordered in a logical ring, and a token is passed at a regular interval from one node to the next. The token carries the authoritative knowledge of the membership; a node updates its local membership information according to the received token.
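
A sketch of the per-node bookkeeping (hypothetical names; the token is represented as a dict carrying the membership list and a sequence number):

    class MemberNode:
        """Per-node state for the token-based membership mechanism."""
        def __init__(self, node_id):
            self.node_id = node_id
            self.membership = [node_id]   # local view of the live nodes
            self.token_seq = -1           # sequence number of the last token seen

        def on_token(self, token, send):
            # The token carries the authoritative membership; adopt it locally.
            self.membership = list(token["members"])
            self.token_seq = token["seq"]
            # Pass the token along the logical ring, bumping its sequence number.
            i = self.membership.index(self.node_id)
            successor = self.membership[(i + 1) % len(self.membership)]
            token["seq"] += 1
            send(successor, token)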

Token Mechanism (cont’d): aggressive failure detection. (Figure: aggressive failure detection illustrated on a four-node ring A, B, C, D.)

Token Mechanism (cont’d): conservative failure detection. (Figure: conservative failure detection illustrated on the same four-node ring.)

911 Mechanism. When is the 911 mechanism used? Token regeneration: to regenerate a token that is lost when a node or a link fails. Dynamic scalability: to add a new node to the system. What is a 911 message? A request for the right to regenerate the lost token; it must be approved by all the live nodes in the membership.

Token Regeneration: only one node is allowed to regenerate the token. The token sequence number is used to guarantee mutual exclusivity and is incremented every time the token is passed from one node to the next. Each node makes a local copy of the token on receipt. The sequence number on the node’s local copy of the token is added to the 911 message and compared to the sequence numbers on the local copies held by the other live nodes; the 911 request is denied by any node with a more recent copy of the token.
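
A sketch of the approval check a live node might apply to a 911 request (hypothetical message format; the request carries the sequence number of the requester’s local copy of the token):

    def handle_911(local_token_seq, request):
        """Approve or deny a 911 token-regeneration request."""
        if local_token_seq > request["token_seq"]:
            return "DENY"      # this node holds a more recent copy of the token
        return "APPROVE"       # requester may regenerate only if every live node approves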

Dynamic Scalability: a 911 message is sent by a new node to join the group. The receiving node treats the message as a join request because the originating node is not in the membership; it updates the membership the next time it receives the token and sends the token to the new node.

Data Storage: the RAIN system provides a distributed storage system based on a class of erasure-correcting codes called array codes, which provide a mathematical means of representing data so that lost information can be recovered.

Data Storage (cont’d): array codes. With an (n, k) erasure-correcting code, k symbols of original data are represented with n symbols of encoded data. With an m-erasure-correcting code, the original data can be recovered even if m symbols of the encoded data are lost. A code is said to be Maximum Distance Separable (MDS) if m = n - k. The only operations necessary to encode/decode an array code are simple binary XOR operations.
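
A toy illustration of these definitions (not the RAIN array code): a (3, 2) single-parity code, i.e. m = n - k = 1, built from bytewise XOR.

    def encode_parity(a, b):
        # Two data symbols plus their XOR parity; symbols are equal-length bytes.
        p = bytes(x ^ y for x, y in zip(a, b))
        return [a, b, p]

    def recover_one(symbols):
        # Recover the single erased symbol (marked None) by XORing the others.
        missing = symbols.index(None)
        present = [s for s in symbols if s is not None]
        symbols[missing] = bytes(x ^ y for x, y in zip(*present))
        return symbols

    a, b, p = encode_parity(b"\x0f\x0f", b"\xf0\x01")
    print(recover_one([a, None, p]))   # reconstructs b = b"\xf0\x01"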

Data Storage (cont’d). Data placement scheme for a (6, 4) array code (each column is one node; each node stores two data symbols and one parity symbol):

Node:    1          2          3          4          5          6
Parity:  A+C+d+e    F+B+c+d    E+A+b+c    D+F+a+b    C+E+f+a    B+D+e+f
Data:    F          E          D          C          B          A
Data:    f          e          d          c          b          a
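
A sketch of the corresponding encoder (the xor helper, byte-string symbols, and node numbering are illustrative additions):

    def xor(*symbols):
        # Bytewise XOR of equal-length byte strings.
        out = bytearray(symbols[0])
        for s in symbols[1:]:
            for i, byte in enumerate(s):
                out[i] ^= byte
        return bytes(out)

    def encode_6_4(a, b, c, d, e, f, A, B, C, D, E, F):
        # Twelve data symbols placed on six nodes, each with one parity symbol.
        return [
            (f, F, xor(A, C, d, e)),   # node 1
            (e, E, xor(F, B, c, d)),   # node 2
            (d, D, xor(E, A, b, c)),   # node 3
            (c, C, xor(D, F, a, b)),   # node 4
            (b, B, xor(C, E, f, a)),   # node 5
            (a, A, xor(B, D, e, f)),   # node 6
        ]

    # Example: twelve one-byte symbols
    nodes = encode_6_4(*[bytes([i]) for i in range(12)])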

Data Storage (cont’d). The same placement after the loss of nodes 5 and 6 (? = erased symbol):

Node:    1          2          3          4          5          6
Parity:  A+C+d+e    F+B+c+d    E+A+b+c    D+F+a+b    ?          ?
Data:    F          E          D          C          ?          ?
Data:    f          e          d          c          ?          ?

The lost data symbols are recovered by XOR from the surviving nodes:
A = C + d + e + (A + C + d + e)
b = A + (E + A + b + c) + c + E
a = b + (D + F + a + b) + D + F
B = F + c + (F + B + c + d) + d
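
The same recovery written against the sketch above (each node_i is the (data, data, parity) triple produced by encode_6_4; the xor helper is reused):

    def recover_nodes_5_and_6(node1, node2, node3, node4):
        f, F, p1 = node1    # p1 = A + C + d + e
        e, E, p2 = node2    # p2 = F + B + c + d
        d, D, p3 = node3    # p3 = E + A + b + c
        c, C, p4 = node4    # p4 = D + F + a + b
        A = xor(C, d, e, p1)
        b = xor(A, p3, c, E)
        a = xor(b, p4, D, F)
        B = xor(F, c, p2, d)
        return a, b, A, B

    # Recover the symbols stored on nodes 5 and 6 from nodes 1-4:
    a, b, A, B = recover_nodes_5_and_6(*nodes[:4])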

Data Storage (cont’d): distributed store/retrieve operations. For a store operation, a block of data of size d is encoded into n symbols, each of size d/k, using an (n, k) MDS array code. For a retrieve operation, symbols are collected from any k nodes and decoded. The original data can be recovered with up to n - k node failures. The encoding scheme provides for dynamic reconfigurability and load balancing.

RAIN Contributions to Distributed Computing Systems: fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures; fault management techniques based on group membership; and data storage schemes based on computationally efficient error-control codes.

References: Vasken Bohossian, Chenggong C. Fan, Paul S. LeMahieu, Marc D. Riedel, Lihao Xu, and Jehoshua Bruck, “Computing in the RAIN: A Reliable Array of Independent Nodes,” IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 2, February 2001.