UC Santa Cruz Impact of Failure on Interconnection Networks for Large Storage Systems Qin Xin, Ethan L. Miller, Thomas J. E. Schwarz*, Darrell D. E. Long.

Slides:



Advertisements
Similar presentations
EdgeNet2006 Summit1 Virtual LAN as A Network Control Mechanism Tzi-cker Chiueh Computer Science Department Stony Brook University.
Advertisements

Cross-layer Design in Wireless Mesh Networks Hu Wenjie Computer Network and Protocol Testing Laboratory, Dept. of Computer Science & Technology, Tsinghua.
Data Communications and Networking
Interconnection Networks: Flow Control and Microarchitecture.
Chapter 9 Introduction to MAN and WAN
© 2007 Cisco Systems, Inc. All rights reserved.ICND1 v1.0—1-1 Building a Simple Network Exploring the Functions of Networking.
Today’s topics Single processors and the Memory Hierarchy
The Connectivity and Fault-Tolerance of the Internet Topology
The TickerTAIP Parallel RAID Architecture P. Cao, S. B. Lim S. Venkatraman, J. Wilkes HP Labs.
Improving TCP Performance over Mobile Ad Hoc Networks by Exploiting Cross- Layer Information Awareness Xin Yu Department Of Computer Science New York University,
Ceph: A Scalable, High-Performance Distributed File System Sage Weil Scott Brandt Ethan Miller Darrell Long Carlos Maltzahn University of California, Santa.
Disk Scrubbing in Large Archival Storage Systems Thomas Schwarz, S.J. 1,2 Qin Xin 1,3, Ethan Miller 1, Darrell Long 1, Andy Hospodor 1,2, Spencer Ng 3.
Availability in Global Peer-to-Peer Systems Qin (Chris) Xin, Ethan L. Miller Storage Systems Research Center University of California, Santa Cruz Thomas.
Dark and Panic Lab Computer Science, Rutgers University1 Evaluating the Impact of Communication Architecture on Performability of Cluster-Based Services.
T H E O H I O S T A T E U N I V E R S I T Y Computer Science and Engineering 1 Wenjun Gu, Xiaole Bai, Sriram Chellappan and Dong Xuan Presented by Wenjun.
Architecture and Real Time Systems Lab University of Massachusetts, Amherst An Application Driven Reliability Measures and Evaluation Tool for Fault Tolerant.
7. Fault Tolerance Through Dynamic or Standby Redundancy 7.6 Reconfiguration in Multiprocessors Focused on permanent and transient faults detection. Three.
Towards a Connectivity-Based, Reliable Routing Framework Alec Woo Winter NEST Retreat 2004 UC Berkeley.
Interconnection Network Topologies
Architecture and Real Time Systems Lab University of Massachusetts, Amherst I Koren and C M Krishna Electrical and Computer Engineering University of Massachusetts.
ECE669 L16: Interconnection Topology March 30, 2004 ECE 669 Parallel Computer Architecture Lecture 16 Interconnection Topology.
Understanding Network Failures in Data Centers: Measurement, Analysis and Implications Phillipa Gill University of Toronto Navendu Jain & Nachiappan Nagappan.
1 © 2004, Cisco Systems, Inc. All rights reserved. Chapter 5 WANs and Routers/ Introduction to Routers.
“The Architecture of Massively Parallel Processor CP-PACS” Taisuke Boku, Hiroshi Nakamura, et al. University of Tsukuba, Japan by Emre Tapcı.
07/21/2005 Senmetrics1 Xin Liu Computer Science Department University of California, Davis Joint work with P. Mohapatra On the Deployment of Wireless Sensor.
Introducing Reliability and Load Balancing in Home Link of Mobile IPv6 based Networks Jahanzeb Faizan, Mohamed Khalil, and Hesham El-Rewini Parallel, Distributed,
Report Advisor: Dr. Vishwani D. Agrawal Report Committee: Dr. Shiwen Mao and Dr. Jitendra Tugnait Survey of Wireless Network-on-Chip Systems Master’s Project.
Security in High Performance Networks A Practical View Tony Cataldo 5/19/04.
Lec4: TCP/IP, Network management model, Agent architectures
Déjà Vu Switching for Multiplane NoCs NOCS’12 University of Pittsburgh Ahmed Abousamra Rami MelhemAlex Jones.
Resisting Denial-of-Service Attacks Using Overlay Networks Ju Wang Advisor: Andrew A. Chien Department of Computer Science and Engineering, University.
Improving Capacity and Flexibility of Wireless Mesh Networks by Interface Switching Yunxia Feng, Minglu Li and Min-You Wu Presented by: Yunxia Feng Dept.
Mar 1, 2004 Multi-path Routing CSE 525 Course Presentation Dhanashri Kelkar Department of Computer Science and Engineering OGI School of Science and Engineering.
Network Survivability Against Region Failure Signal Processing, Communications and Computing (ICSPCC), 2011 IEEE International Conference on Ran Li, Xiaoliang.
Positioning in Ad-Hoc Networks - Directions and Results Jan Beutel Computer Engineering and Networks Lab Swiss Federal Institute of Technology Zurich August.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 January Session 4.
Using Virtual Links to Discover Network Topology Brett Holbert, Thomas F. La Porta Topology Discovery -Network topology may only be partially known -Want.
Distributed Monitoring of Mesh Networks Elizabeth Belding-Royer Mobility Management and Networking (MOMENT) Lab Dept. of Computer Science University of.
1 Computer Networking Dr. Mohammad Alhihi Communication and Electronic Engineering Department Philadelphia University Faculty of Engineering.
S Master’s thesis seminar 8th August 2006 QUALITY OF SERVICE AWARE ROUTING PROTOCOLS IN MOBILE AD HOC NETWORKS Thesis Author: Shan Gong Supervisor:Sven-Gustav.
Services provided by CERN’s IT Division Ethernet in controls applications European Organisation for Nuclear Research European Laboratory for Particle.
Birds Eye View of Interconnection Networks
1 Presented by Jing Sun Computer Science and Engineering Department University of Conneticut.
Super computers Parallel Processing
Evaluating Mobility Support in ZigBee Networks
Two Peer-to-Peer Networking Approaches Ken Calvert Net Seminar, 23 October 2001 Note: Many slides “borrowed” from S. Ratnasamy’s Qualifying Exam talk.
Click to edit Master title style Literature Review Interconnection Architectures for Petabye-Scale High-Performance Storage Systems Andy D. Hospodor, Ethan.
Network Dynamics and Simulation Science Laboratory Structural Analysis of Electrical Networks Jiangzhuo Chen Joint work with Karla Atkins, V. S. Anil Kumar,
1 Roie Melamed, Technion AT&T Labs Araneola: A Scalable Reliable Multicast System for Dynamic Wide Area Environments Roie Melamed, Idit Keidar Technion.
Seminar On Rain Technology
© ExplorNet’s Centers for Quality Teaching and Learning 1 Classify various types of networks. Objective Course Weight 2%
SEMINAR TOPIC ON “RAIN TECHNOLOGY”
Using Pattern-Models to Guide SSD Deployment for Big Data in HPC Systems Junjie Chen 1, Philip C. Roth 2, Yong Chen 1 1 Data-Intensive Scalable Computing.
Mesh-Network VoIP Call Capacity
HYBRID MICRO-GRID.
Exploring the Functions of Networking
Computer Network Topology
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Interconnection Networks (Part 2) Dr.
DATA COMMUNICATION Lecture-5.
Configuration of Cisco Routers in GNS3
Babak Sorkhpour, Prof. Roman Obermaisser, Ayman Murshed
                                                                                                            Network Decoupling for Secure Communications.
                                                                                                            Network Decoupling for Secure Communications.
Presented by Jason L.Y. Lin
Department of Computer Science University of York
Birds Eye View of Interconnection Networks
ECE 753: FAULT-TOLERANT COMPUTING
CAN (Campus Area Network)
Lec 2 Topology Computer Networks Al-Mustansiryah University
In-network computation
Presentation transcript:

UC Santa Cruz Impact of Failure on Interconnection Networks for Large Storage Systems Qin Xin, Ethan L. Miller, Thomas J. E. Schwarz*, Darrell D. E. Long Storage Systems Research Center University of California, Santa Cruz *Department of Computer Engineering Santa Clara University

MSST'05 2 4/13/05 Problems  Reliability is a real concern for large storage systems A large number of components Complex interconnections Human errors  Robust network interconnection is desired Failures of switches, routers, and network links are common Organization of nodes and links is crucial Scale makes things complicated: petabyte storage system -- 10,000s disks, 100s routers,100,000s links Robust network topology Tolerant multi-points of failures without noticeable performance degradation

MSST'05 3 4/13/05 Contributions  Examine various network topologies Butterfly networks Mesh structure Hypercube Tradeoffs among them: cost, robustness, performance.  Estimate the impacts of link and node failures by simulation I/O path connectivity Number of hops in varied degraded modes Failure impact to its neighborhood

MSST'05 4 4/13/05 Network Topology for Storage Systems  Butterfly networks Reasonable cost Poor robustness  Mesh Brick-structured, good robustness Relatively long path  Hypercube Superior robustness Complex, costly

MSST'05 5 4/13/05 A case study: Failure Scenarios

MSST'05 6 4/13/05 Simulated Results: Connectivity  I/O path connectivity Measured in “nines”: -log 10 (1-P), P: m/n, m: # of good requests, n: # of total requests. (3 nines = 99.9%) Poor connectively when the connection nodes are lost. Hypercube and mesh are more robust in connectivity than butterfly networks.

MSST'05 7 4/13/05 Simulation Results: Neighborhood mesh hypercube  Failure impact on network neighborhood Neighbors suffer in presence of failures. Hypercube outperforms butterfly & mesh structure in avg. I/O load. butterfly

MSST'05 8 4/13/05 Summary  A well-chosen topology can provide robust interconnection.  Neighbors around failures suffer much more than average.  Hypercube structure provides better interconnection in degraded modes than butterfly and mesh structure.

MSST'05 9 4/13/05 Questions?  For further information, please visit:  Thanks to SSRC faculty, students, affiliates, and alumni  Thanks to our sponsors Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Sandia National Laboratory Engenio Information Technologies, HP Lab, Hitachi Global Storage Technologies, IBM Research, Intel, LSI Logic, Microsoft Research, Network Appliance, and Veritas NSF, USENIX

MSST' /13/05 Big Picture: A Petabyte-Scale Storage System  Potential for data loss Thousands of disks Petabytes of data