HP and Carrier Network System Availability
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Presentation transcript:

HP and Carrier Network System Availability
Lee Hines, Hewlett Packard Software Division

13 February 2014
Availability, outages and the impacts of reliable networks

Measuring availability
Based on 24x7 operations, planned and unplanned outages.

Percent of availability*   99%         99.9%     99.99%    99.999%   99.9999%   99.99999%
Outage minutes/year        ~5,000      ~500      ~50       ~5        ~0.5       ~0.05
Outage to users            3.65 days   8.8 hrs   ~50 min   5 min     30 sec     3 sec
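
The outage figures in the table follow from simple arithmetic on the availability percentage. The sketch below is illustrative only and is not part of the original slides; it assumes a 365-day year of 24x7 operation, and the function name `outage_minutes_per_year` is a hypothetical helper chosen for this example.

```python
# Assumption: a 365-day year of continuous (24x7) operation.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def outage_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly outage budget, in minutes, for a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # The six availability levels from the table, two 9s through seven 9s.
    for pct in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        minutes = outage_minutes_per_year(pct)
        print(f"{pct}% available: about {minutes:.2f} outage minutes per year")
```

Each additional 9 divides the yearly outage budget by ten, which is why the table drops from days at 99% to seconds at seven 9s.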

Carrier network impacts from availability

HP NonStop server availability

HP NonStop availability and location based services

Increasing the availability – toward Seven, Eight & Nine 9s

The New NonStop Advanced Architecture
DMR: Dual Modular Redundancy
TMR: Triple Modular Redundancy (HW availability: seven 9s)
Loose synchronization (lock-step): each server runs on its own clock. Each can perform soft error corrections without causing a miscompare.
Self-checked, shared-nothing, transparent takeover
Fault masking: HW processor failures are masked and are invisible to all SW except the lowest level of the OS. For example, an uncorrectable memory error doesn't stop the logical processor; it simply stops one of the processor elements that make up the logical processor.
Memory has one of the highest rates of failure. NSAA masks all memory failures. Repairs don't result in SW disruption either.
Fault-tolerant parallel database
Application server transaction processing monitors

Dual to Triple-Mode Redundancy
Dual-Mode Redundancy = Five 9s Availability
Triple-Mode Redundancy = Seven 9s Availability
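
The difference between dual and triple redundancy can be illustrated with a generic output comparison. The sketch below is a textbook majority voter and makes no claim about how NonStop hardware actually compares results; the function name `vote` and the integer outputs are assumptions made for this example. With two elements a disagreement can only be detected, while with three a single faulty element can be outvoted.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the value a strict majority of elements agree on, else None.

    Hypothetical helper: each entry in `outputs` stands for the result
    produced by one redundant processor element for the same operation.
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


if __name__ == "__main__":
    print(vote([7, 7]))     # two elements agree
    print(vote([7, 9]))     # two elements disagree: detected, not resolved by voting
    print(vote([7, 7, 9]))  # three elements: the odd one out is outvoted
```

In this simplified model a two-element system needs some other mechanism, such as self-checking, to decide which element to stop after a miscompare.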

Reliability, Availability, Scalability