1 Making Services Fault Tolerant Pat Chan, Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Miroslaw Malek.

Presentation transcript:

1 Making Services Fault Tolerant. Pat Chan, Michael R. Lyu (Department of Computer Science and Engineering, The Chinese University of Hong Kong); Miroslaw Malek (Department of Computer Science and Engineering, Humboldt University Berlin).

2 Outline: Introduction; Problem Statement; Methodologies for Web Service Reliability; New Reliable Web Service Paradigm; Road Map for Experiment; Experimental Results and Discussion; Conclusion.

3 Introduction Service-oriented computing is becoming a reality. Service-oriented Architectures (SOA) are based on a simple model of roles. The problems of service dependability, security and timeliness are becoming critical. We propose experimental settings and offer a roadmap to dependable Web services.

4 Problem Statement. Two fault-tolerant techniques are considered: replication and diversity. Replication is an efficient way of building reliable systems through time or space redundancy: it increases the availability of distributed systems by re-executing or replicating key components, and it protects against hardware malfunctions and transient system faults. Design diversity is another efficient technique: software systems or services are designed independently by different programming teams, which defends against permanent software design faults. We focus on analyzing replication techniques applied to Web services. A generic Web service system with both spatial and temporal replication is proposed and investigated.

5 Methodologies for reliable Web services -- Redundancy. Spatial redundancy: with static redundancy, all replicas are active at the same time and voting takes place to obtain a correct result; dynamic redundancy engages one active replica at a time while the others are kept in an active or standby state. Temporal redundancy: redundancy in time.
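The two forms of redundancy on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names `majority_vote` and `call_with_retry`, the strict-majority rule, and the default of three attempts are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Static spatial redundancy: return the value reported by a strict
    majority of replicas, or None when no such majority exists."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def call_with_retry(operation: Callable[[], object], attempts: int = 3) -> object:
    """Temporal redundancy: re-execute the same operation until it succeeds
    or the (assumed) attempt budget is exhausted."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as error:  # the fault is assumed to be transient
            last_error = error
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Voting masks a faulty replica without any detection step, at the cost of running every replica on every request; retry costs nothing extra in hardware but only helps against transient faults.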

6 Methodologies for reliable Web services -- Diversity. Diversity protects redundant systems against common-mode failures: with different designs and implementations, common failure modes will probably cause different error effects. Examples: N-version programming, recovery blocks…
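As a rough illustration of one of the diversity techniques named on this slide, the sketch below shows the general shape of a recovery block: independently written alternates are tried in order and the first result that passes an acceptance test is returned. The function name `recovery_block` and its signature are hypothetical, and state rollback between alternates is omitted for brevity.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def recovery_block(
    alternates: Sequence[Callable[[], T]],
    acceptance_test: Callable[[T], bool],
) -> T:
    """Try each independently designed alternate in turn and return the
    first result that the acceptance test approves."""
    for alternate in alternates:
        try:
            result = alternate()
        except Exception:
            continue  # a crashing alternate counts as a failed attempt
        if acceptance_test(result):
            return result
    raise RuntimeError("no alternate passed the acceptance test")


# Usage: a deliberately faulty primary and a correct secondary square root of 9.
root = recovery_block(
    [lambda: -1.0, lambda: 9.0 ** 0.5],
    acceptance_test=lambda r: r >= 0 and abs(r * r - 9.0) < 1e-9,
)
```

N-version programming differs in that all versions run and a voter compares their outputs, whereas a recovery block runs alternates one after another and relies on the acceptance test.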

7 Failure Response Stages of Web Services: fault confinement, fault detection, diagnosis, fail-over, reconfiguration, recovery, restart, repair, reintegration.

8 [Figure: failure response stages. Labels: fault confinement, fault detection, failover, diagnosis (online, offline), reconfiguration, recovery, restart, repair, reintegration.]

9 Proposed Paradigm. [Architecture diagram. Labels: Replication Manager (Web service selection algorithm, WatchDog), UDDI Registry, WSDL, three Web Service replicas (each with IIS, Application, Database), Client, Port, Application, Database.] Steps: 1. Create Web services. 2. Select the primary Web service (PWS). 3. Register. 4. Look up. 5. Get WSDL. 6. Invoke the Web service. 7. Keep checking the availability of the PWS. 8. If the PWS fails, reselect the PWS. 9. Update the WSDL.

10 Work Flow of the Replication Manager. [Flowchart. The RM sends a message to the Web service. Branches: get reply; do not get reply, then reselect a primary Web service and map the new address to the WSDL; all services failed, then system fail.]
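The Replication Manager work flow on this slide can be sketched as a small Python class. This is an illustrative assumption, not the authors' implementation: the class name, the `is_alive` probe, and the "first replica that answers" selection rule are hypothetical, since the slides do not spell out the selection algorithm. Publishing the new address through UDDI and WSDL is reduced here to updating one attribute.

```python
from typing import Callable, List, Optional


class ReplicationManager:
    """Watchdog sketch: probe the primary and reselect it when it is silent."""

    def __init__(self, replicas: List[str], is_alive: Callable[[str], bool]) -> None:
        self.replicas = list(replicas)
        self.is_alive = is_alive  # hypothetical probe; True means a reply arrived
        self.wsdl_endpoint: Optional[str] = None  # address clients obtain via the WSDL
        self.select_primary()

    def select_primary(self) -> Optional[str]:
        """Pick the first replica that answers the probe (assumed policy).
        None means every replica failed, i.e. the system has failed."""
        self.wsdl_endpoint = next(
            (replica for replica in self.replicas if self.is_alive(replica)), None
        )
        return self.wsdl_endpoint

    def poll_once(self) -> Optional[str]:
        """One polling cycle: keep the primary if it replies, else reselect
        and map the new address to the WSDL."""
        if self.wsdl_endpoint is not None and self.is_alive(self.wsdl_endpoint):
            return self.wsdl_endpoint
        return self.select_primary()
```

In a deployment along the lines of the slides, `poll_once` would be driven by a timer at the polling frequency, and the probe would be a real request to the Web service.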

11 Road Map for Experiment Research. [Diagram. Labels: redundancy in time; redundancy in space; sequentially; parallel; majority voting using N-modular redundancy; diversified versions of different services.]

12 Experiments. A series of experiments is designed and performed to evaluate the reliability of the Web service: single service without replication, single service with retry or reboot, and service with spatial replication. We also perform retry or failover when the Web service is down.

13 Summary of the experiments
Columns: None | Retry/Reboot | Failover | Both (hybrid)
Single service, no retry: 0 - -
Single service with retry: - - 1
Single service with reboot: - - 2
Spatial replication: - - 3 4

14 Parameters of the Experiments
Parameter | Current setting/metric
Request frequency | 1 req/min
Polling frequency | 5 ms
Number of replicas | 5
Client timeout period for retry | 10 s
Failure rate λ | # failures/hour
Load (profile of the program) | % or load function
Reboot time | 10 min
Failover time | 1 s

15 Experimental Results. Experiments over 360-hour periods (43200 reqs). [Table: number of failures for five experiments (Exp) under three conditions: normal, server busy, server reboots periodically. Cell values were not preserved.] Retry: 11.97% to 4.93%. Reboot: 11.97% to 6.44%. Failover: 11.97% to 3.56%. Retry and failover: 11.97% to 2.59%.

16 Number of failures when the server is in a normal situation

17 Number of failures when the server is busy

18 Number of failures when the server reboots periodically

19 Reliability of the system over time

20 Reliability Model

21 Reliability Model Parameters
ID | Description | Value
λn | Network failure rate | 0.02
λ* | Web service failure rate | 0.228
λ1 | Resource problem rate | 0.142
λ2 | Entry point failure rate | 0.150
μ* | Web service repair rate | 0.286
μ1 | Resource problem repair rate | 0.979
μ2 | Entry point failure repair rate | 0.979
C1 | Probability that the RM responds on time | 0.9
C2 | Probability that the server reboots successfully | 0.9
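To give a feel for how a failure rate and a repair rate combine, the sketch below evaluates the textbook steady-state availability of a single repairable component, A = μ / (λ + μ), using λ* and μ* from the table. This two-state simplification is an assumption made for illustration only; it is not the reliability model of the presentation, which has more states and parameters and was solved with SHARPE.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time a two-state (up/down) component is up.

    Assumes exponentially distributed times to failure and to repair,
    expressed in the same time unit.
    """
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("rates must be non-negative, and repair_rate positive")
    return repair_rate / (failure_rate + repair_rate)


# Values of lambda* and mu* taken from the parameter table above.
web_service_availability = steady_state_availability(0.228, 0.286)
print(f"{web_service_availability:.4f}")
```

A single unreplicated service with these rates would be up only a little over half the time in this simplified view, which is the kind of gap that retry, reboot, and failover are meant to close.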

22 Outcome (SHARPE). [Plot. Labels: failure rate; reliability of the proposed system.]

23 Conclusion. Surveyed replication and design diversity techniques for reliable services. Proposed a hybrid approach to improving the availability of Web services. Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system. N-version programming may finally become commercially viable in service environments.