Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.1 FAULT TOLERANT SYSTEMS Part 12 - Networks.

Slides:



Advertisements
Similar presentations
Nilesh Agre Wedashree Jalukar Neelima Shahi Group Members.
Advertisements

Routing in a Parallel Computer. A network of processors is represented by graph G=(V,E), where |V| = N. Each processor has unique ID between 1 and N.
1 Fault-Tolerant Computing Systems #6 Network Reliability Pattara Leelaprute Computer Engineering Department Kasetsart University
Calculating Subnet Masks
Copyright 2004 Koren & Krishna ECE655/DataRepl.1 Fall 2006 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing.
1 Wide-Sense Nonblocking Multicast in a Class of Regular Optical Networks From: C. Zhou and Y. Yang, IEEE Transactions on communications, vol. 50, No.
Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony L. T.
1 Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E)
Advanced Topics in Algorithms and Data Structures An overview of the lecture 2 Models of parallel computation Characteristics of SIMD models Design issue.
Interconnection Networks 1 Interconnection Networks (Chapter 6) References: [1,Wilkenson and Allyn, Ch. 1] [2, Akl, Chapter 2] [3, Quinn, Chapter 2-3]
Path Protection in MPLS Networks Ashish Gupta Design and Evaluation of Fault Tolerance Algorithms with Performance Constraints.
Parallel Routing Bruce, Chiu-Wing Sham. Overview Background Routing in parallel computers Routing in hypercube network –Bit-fixing routing algorithm –Randomized.
Interconnection Network PRAM Model is too simple Physically, PEs communicate through the network (either buses or switching networks) Cost depends on network.
Communication operations Efficient Parallel Algorithms COMP308.
Architecture and Real Time Systems Lab University of Massachusetts, Amherst An Application Driven Reliability Measures and Evaluation Tool for Fault Tolerant.
7. Fault Tolerance Through Dynamic or Standby Redundancy 7.6 Reconfiguration in Multiprocessors Focused on permanent and transient faults detection. Three.
Models of Parallel Computation Advanced Algorithms & Data Structures Lecture Theme 12 Prof. Dr. Th. Ottmann Summer Semester 2006.
Examples Broadcasting and Gossiping. Broadcast on ring (Safe and Forward)
Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.2.1 FAULT TOLERANT SYSTEMS Part 2 – Canonical.
1 Computer Networks Switching Technologies. 2 Switched Network Long distance transmission typically done over a network of switched nodes End devices.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
Chapter 13: WAN Technologies and Routing 1. LAN vs. WAN 2. Packet switch 3. Forming a WAN 4. Addressing in WAN 5. Routing in WAN 6. Modeling WAN using.
CECS 474 Computer Network Interoperability WAN Technologies & Routing
Interconnection Networks. Applications of Interconnection Nets Interconnection networks are used everywhere! ◦ Supercomputers – connecting the processors.
Network Topologies Topology – how nodes are connected – where there is a wire between 2 nodes. Routing – the path a message takes to get from one node.
Part.7.1 Copyright 2007 Koren & Krishna, Morgan-Kaufman FAULT TOLERANT SYSTEMS Part 7 - Coding.
All that remains is to connect the edges in the variable-setters to the appropriate clause-checkers in the way that we require. This is done by the convey.
CSE Advanced Computer Architecture Week-11 April 1, 2004 engr.smu.edu/~rewini/8383.
FAULT-TOLERANT NETWORKS AND FAULT-TOLERANT ROUTING SONER DEDEOĞLU 10/12/
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Chinese postman problem
Lecture 3 Innerconnection Networks for Parallel Computers
Dr. Clincy1 Chapter 6 Delivery & Forwarding of IP Packets Lecture #4 Items you should understand by now – before routing Physical Addressing – with in.
A Lightweight Fault-Tolerant Mechanism for Network-on-Chip
1 Nasser Alsaedi. The ultimate goal for any computer system design are reliable execution of task and on time delivery of service. To increase system.
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Interconnection Mechanisms Performance Models 20 th April, 2006.
Delivery, Forwarding, and Routing of IP Packets
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Shared versus Switched Media.
1 Interconnection Networks. 2 Interconnection Networks Interconnection Network (for SIMD/MIMD) can be used for internal connections among: Processors,
Copyright 2004 Koren & Krishna ECE655/Koren Part.8.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture.
Pancyclicity of M ö bius cubes with faulty nodes Xiaofan Yang, Graham M. Megson, David J. Evans Microprocessors and Microsystems 30 (2006) 165–172 指導老師.
Data Communications and Networks Chapter 1 - Classification of network topologies Data Communications and Network.
Super computers Parallel Processing
A Framework for Reliable Routing in Mobile Ad Hoc Networks Zhenqiang Ye Srikanth V. Krishnamurthy Satish K. Tripathi.
Chapter 9: Graphs.
Basic Communication Operations Carl Tropper Department of Computer Science.
18-WAN Technologies and Dynamic routing Dr. John P. Abraham Professor UTPA.
1 Switching and Forwarding Sections Connecting More Than Two Hosts Multi-access link: Ethernet, wireless –Single physical link, shared by multiple.
1 Lecture 14: Interconnection Networks Topics: dimension vs. arity, deadlock.
Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.1.1 FAULT TOLERANT SYSTEMS Fault tolerant Measures.
Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.4.1 FAULT TOLERANT SYSTEMS Part 4 – Analysis Methods Chapter 2 – HW Fault Tolerance.
Overview Parallel Processing Pipelining
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Interconnection Networks (Part 2) Dr.
Ch 13 WAN Technologies and Routing
Interconnection topologies
Chapter 6 Delivery & Forwarding of IP Packets
Butterfly Network A butterfly network consists of (K+1)2^k nodes divided into K+1 Rows, or Ranks. Let node (i,j) refer to the jth node in the ith Rank.
Lecture 14: Interconnection Networks
A Fault-Tolerant Routing Strategy for Fibonacci-Class Cubes
Graphs Chapter 11 Objectives Upon completion you will be able to:
Mesh-Connected Illiac Networks
Communication operations
High Performance Computing & Bioinformatics Part 2 Dr. Imad Mahgoub
Graphs.
FAULT TOLERANT SYSTEMS ecs. umass
ECE 753: FAULT-TOLERANT COMPUTING
COMPUTER NETWORKS CS610 Lecture-16 Hammad Khalid Khan.
Copyright 2004 Koren & Krishna ECE655/DataRepl.1 Fall 2006 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing.
Presentation transcript:

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.1 FAULT TOLERANT SYSTEMS Part 12 - Networks – 2 Chapter 4: Network Fault Tolerance (Modified for ECE753 – Saluja)

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.2 Hypercube Networks  H n - An n-dimensional hypercube network - nodes  A 0-dimensional hypercube H 0 - a single node  H n constructed by connecting the corresponding nodes of two H n- 1 networks  The edges added to connect corresponding nodes are called dimension-(n-1) edges  Each node in an n-dimensional hypercube has n edges incident upon it Dimension-0 edge H 1

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.3 Hypercube - Examples Dimension-0 edges Dimension-1 edges Dimension-2 edges Dimension-3 edges H 4 H 1 H 2 H 3 H 3

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.4 Routing in Hypercubes  Specific numbering of nodes to simplify routing  Number expressed in binary - if nodes i and j are connected by a dimension-k edge, the names of i and j differ in only the k-th bit position  Example - nodes 0000 and 0010 differ in only the bit position - connected by a dimension-1 edge  Example - a packet needs to travel from node 14= to node 2= in an H 4 network  Possible routings -  1110  0110 (dimension 3)  0010 (dimension 2)  1110  1010 (dimension 2)  0010 (dimension 3) Dimension-0 edges Dimension-1 edges Dimension-2 edges Dimension-3 edges

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.5 Routing in Hypercubes - General  In general - distance between source and destination is the number of different bits in binary addresses  Going from X to Y can be accomplished by traveling once along each dimension in which they differ Define where  is the exclusive-or operator  Packet must traverse an edge in every dimension i for which

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.6 Adding Fault Tolerance to Hypercubes  Hn (for n  2) can tolerate link failures - multiple paths from any source to any destination  Node failures can disrupt the operation  Adding fault tolerance:  Adding one or more spare nodes  increasing number of communication ports of each original node from n to n+1  connecting the extra ports through additional links to spare nodes  Example - two spare nodes - each a spare for nodes of an H n-1 sub-cube  Spare nodes may require ports  Using crossbar switches with outputs connected to spare node reduces number of ports of spare node to n+1 - same as all other nodes

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.7 An H 4 Hypercube with Two Spare Nodes S S Each node is of degree 5

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.8 Different Method of Fault-Tolerance  Duplicating the processor in a few selected nodes  Each additional processor - spare also for any of the processors in the neighboring nodes  Example - nodes 0, 7, 8, 15 in H 4 - modified to duplex nodes  Every node now has a spare at a distance no larger than 1  Replacing a faulty processor by a spare results in an additional communication delay

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.12.9 Reliability of A Hypercube  Assumption: nodes and links all fail independently  Reliability of Hn is the product of  Reliability of nodes, and  Probability that every node can communicate with every other node  Exact evaluation of this probability difficult - every source-destination pair connected by multiple paths  Instead – we obtain a good lower bound on the reliability  Exploiting recursive nature of hypercube – we add probabilities of three mutually exclusive cases for which the network is connected  This is a lower bound – there may be other cases where Hn is connected

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part Three Mutually Exclusive Cases  Decompose Hn into two H n- 1 hypercubes, A and B, and the dimension-(n-1) links connecting them  Case 1: Both A and B are operational and at least one dimension-(n-1) link is functional  Case 2: One of A,B is operational and the other is not, and all dimension-(n-1) links are functional  Case 3: Only one of A,B is operational, exactly one dimension-(n-1) link is faulty and is connected in the nonoperational H n- 1 to a node that has at least one functional link to another node  Exercise: show that each of these cases results in a connected Hn and that they are mutually exclusive

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part Probabilities of the Three Cases  Notations: q c - probability of a node failure q l - probability of a link failure NR(Hn, q l, q c ) – reliability of hypercube Hn  t omitted for simplicity  Assumption: Nodes are perfectly reliable ( q c =0)  This assumption will be modified later  Case 1: Both A and B are operational and at least one dimension-(n-1) link is functional  Case 2: One of A,B is operational and the other is not and all dimension-(n-1) links are functional  Case 3: Only one of A,B is operational, exactly one dimension- (n-1) link is faulty and is connected in the nonoperational Hn-1 to a node that has at least one functional link to another node

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part Reliability of Hn  Initial case :  either hypercube of dimension 1: two nodes and one link  or, hypercube of dimension 2  The results will be different in both cases  If the nodes are not perfect (q c  0)