
Slide 1: Making Services Fault Tolerant

Pat Chan, Michael R. Lyu
Department of Computer Science and Engineering, The Chinese University of Hong Kong

Miroslaw Malek
Department of Computer Science and Engineering, Humboldt University Berlin

Slide 2: Outline
- Introduction
- Problem Statement
- Methodologies for Web Service Reliability
- New Reliable Web Service Paradigm
- Road Map for Experiment
- Experimental Results and Discussion
- Conclusion

Slide 3: Introduction
- Service-oriented computing is becoming a reality.
- Service-oriented architectures (SOA) are based on a simple model of roles: service provider, service registry and service requestor.
- Service dependability, security and timeliness are becoming critical problems.
- We propose experimental settings and offer a roadmap to dependable Web services.

Slide 4: Problem Statement
Fault-tolerant techniques: replication and diversity.
- Replication is an efficient way to build reliable systems through time or space redundancy:
  - it increases the availability of distributed systems;
  - key components are re-executed (time) or replicated (space);
  - it protects against hardware malfunctions and transient system faults.
- Design diversity is another efficient technique: software systems or services are designed independently by different programming teams, which helps defend against permanent software design faults.
- We focus on analyzing replication techniques applied to Web services, and we propose and investigate a generic Web service system with both spatial and temporal replication.

Slide 5: Methodologies for Reliable Web Services: Redundancy
- Spatial redundancy
  - Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result.
  - Dynamic redundancy: only one replica is active at a time, while the others are kept in an active or standby state.
- Temporal redundancy: redundancy in time, i.e., the same operation is repeated (for example, retried) instead of being executed on additional replicas.
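Static spatial redundancy with voting can be illustrated by an N-modular majority voter. This is a minimal sketch, assuming each replica's reply is a hashable value and that a strict majority (more than N/2 identical replies) is required to accept a result; the function name `majority_vote` is hypothetical and not taken from the authors' system.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(replies: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None.

    Hypothetical N-modular-redundancy voter: `replies` holds one result per
    active replica; None signals that no majority exists (the vote fails).
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    # A strict majority masks up to floor((N - 1) / 2) faulty replicas.
    return value if count > len(replies) // 2 else None


# Five active replicas, one of which returns a wrong answer.
print(majority_vote([42, 42, 41, 42, 42]))  # 42
print(majority_vote([1, 2, 3]))             # None
```

With N replicas, the voter tolerates at most floor((N - 1) / 2) disagreeing replies, which is why odd replica counts are the usual choice.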

Slide 6: Methodologies for Reliable Web Services: Diversity
- Diversity protects redundant systems against common-mode failures.
- With different designs and implementations, a common fault is likely to produce different error effects in each version, so the errors can be detected or masked.
- Examples: N-version programming, recovery blocks.
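Of the two diversity techniques named on this slide, the recovery block is the easier one to show compactly: a primary version runs first, an acceptance test checks its result, and independently designed alternates are tried in turn if the test fails or the version raises an error. The sketch below assumes versions are plain Python callables and the acceptance test is a predicate; `recovery_block`, the two square-root versions and the test are hypothetical illustrations.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RecoveryBlockFailure(Exception):
    """Raised when no version produces a result that passes the acceptance test."""


def recovery_block(versions: Sequence[Callable[[], T]],
                   acceptable: Callable[[T], bool]) -> T:
    """Run diverse versions in order until one passes the acceptance test."""
    for version in versions:
        try:
            result = version()
        except Exception:
            continue  # a crash counts as a failed attempt; try the next alternate
        if acceptable(result):
            return result
    raise RecoveryBlockFailure("all versions failed the acceptance test")


# Two hypothetical, independently written versions of a square-root service.
def buggy_sqrt() -> float:
    return -3.0  # design fault: wrong sign


def newton_sqrt(x: float = 9.0) -> float:
    guess = x
    for _ in range(30):
        guess = 0.5 * (guess + x / guess)  # Newton iteration for sqrt(x)
    return guess


result = recovery_block([buggy_sqrt, newton_sqrt],
                        acceptable=lambda r: r >= 0 and math.isclose(r * r, 9.0))
print(result)
```

N-version programming differs in that all versions run and their outputs are voted on, as in a majority voter, instead of being tried one after another.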

Slide 7: Failure Response Stages of Web Services
- Fault confinement
- Fault detection
- Diagnosis
- Fail-over
- Reconfiguration
- Recovery
- Restart
- Repair
- Reintegration

Slide 8: [Figure: failure response stages (fault confinement, fault detection, diagnosis, fail-over, reconfiguration, recovery, restart, repair, reintegration), grouped into online and offline phases]

Slide 9: Proposed Paradigm
[Figure: a Replication Manager (running a Web service selection algorithm and a WatchDog), a UDDI registry holding the WSDL, three replicated Web services each hosted on IIS with its own application and database, and a client (port, application, database)]
1. Create the Web services.
2. Select the primary Web service (PWS).
3. Register.
4. Look up.
5. Get the WSDL.
6. Invoke the Web service.
7. Continuously check the availability of the PWS.
8. If the PWS fails, reselect the PWS.
9. Update the WSDL.

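Slide 10: Work Flow of the Replication Manager
- The RM sends a message to the primary Web service.
- If it gets a reply, the primary keeps serving and the RM continues monitoring.
- If it gets no reply, the RM reselects a primary Web service and maps the new address to the WSDL.
- If all services have failed, the system fails.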
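The replication manager's workflow can be sketched as one watchdog round. This is a minimal sketch under stated assumptions: each replica is modeled as an address plus a hypothetical `ping` callable that returns True when the service replies, and `update_wsdl` stands in for mapping the new address into the WSDL. None of these names come from the authors' implementation, and since the slide does not specify the selection algorithm, the sketch simply promotes the next live replica in order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Replica:
    address: str                 # hypothetical endpoint address
    ping: Callable[[], bool]     # hypothetical health check: True if it replies


class SystemFailure(Exception):
    """Raised when every replica has failed."""


class ReplicationManager:
    def __init__(self, replicas: List[Replica],
                 update_wsdl: Callable[[str], None]) -> None:
        self.replicas = replicas
        self.update_wsdl = update_wsdl   # hypothetical: rewrites the WSDL address
        self.primary = 0                 # index of the primary Web service (PWS)
        self.update_wsdl(self.replicas[self.primary].address)

    def check_primary(self) -> str:
        """One watchdog round: message the PWS and fail over if it does not reply."""
        if self.replicas[self.primary].ping():
            return self.replicas[self.primary].address    # got reply: keep the PWS
        # No reply: reselect a primary among the remaining replicas, in order.
        n = len(self.replicas)
        for step in range(1, n):
            candidate = (self.primary + step) % n
            if self.replicas[candidate].ping():
                self.primary = candidate
                self.update_wsdl(self.replicas[candidate].address)  # map new address
                return self.replicas[candidate].address
        raise SystemFailure("all services failed")


# Usage: the first replica is down, so the RM promotes the second one.
wsdl_log: List[str] = []
rm = ReplicationManager(
    [Replica("http://ws1.example/svc", lambda: False),
     Replica("http://ws2.example/svc", lambda: True)],
    update_wsdl=wsdl_log.append,
)
print(rm.check_primary())  # http://ws2.example/svc
```

In a running system `check_primary` would be called on every polling tick; keeping a single round as a method makes the failover decision easy to test in isolation.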

Slide 11: Road Map for Experiment
Research
- Redundancy in time
- Redundancy in space
  - Sequentially
  - Parallel
    - Majority voting using N-modular redundancy
    - Diversified versions of different services
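The sequential branch of this road map corresponds to failing over from one replica to the next; the parallel branch can be illustrated by sending the same request to all replicas at once and accepting the first reply that succeeds. The sketch below uses Python's `concurrent.futures`; the replica callables are hypothetical stand-ins for Web service calls, and first-success selection is only one parallel policy (majority voting over all replies is the other one named on the slide).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_successful(replicas: Sequence[Callable[[], T]], timeout: float = 10.0) -> T:
    """Send the same request to every replica at once; return the first good reply.

    Raises concurrent.futures.TimeoutError if no reply arrives within `timeout`
    seconds, and RuntimeError if every replica fails.
    """
    errors: List[BaseException] = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(call) for call in replicas]
        for future in as_completed(futures, timeout=timeout):
            try:
                # Note: leaving the `with` block still waits for slower replicas.
                return future.result()
            except Exception as exc:  # this replica failed; wait for the others
                errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors!r}")


# Hypothetical replicas: one is down, two answer.
def down() -> str:
    raise ConnectionError("replica down")


def up() -> str:
    return "ok"


print(first_successful([down, up, up]))  # ok
```

Parallel invocation trades extra load on every replica for lower failover latency, whereas the sequential approach only contacts a backup after the primary has failed.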

Slide 12: Experiments
A series of experiments was designed and performed to evaluate the reliability of the Web service in three configurations: a single service without replication, a single service with retry or reboot, and a service with spatial replication. We also perform retry or failover when the Web service is down.

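Slide 13: Summary of the Experiments (cells give the experiment number)

Configuration                | None | Retry/Reboot | Failover | Both (hybrid)
Single service, no retry     | 0    | -            | -        | -
Single service with retry    | -    | 1            | -        | -
Single service with reboot   | -    | 2            | -        | -
Spatial replication          | -    | -            | 3        | 4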
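The hybrid configuration (Experiment 4) combines temporal and spatial redundancy. One plausible client-side reading, sketched below as an assumption rather than the authors' implementation, is to retry the current replica a fixed number of times and then fail over to the next one. The replica callables, the `max_retries` parameter and the use of `TimeoutError` to signal an expired client timeout are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_retry_and_failover(replicas: Sequence[Callable[[], T]],
                                 max_retries: int = 2) -> Tuple[T, List[str]]:
    """Try each replica up to 1 + max_retries times before failing over.

    Returns the result and a log of attempts, so the policy is easy to inspect.
    """
    log: List[str] = []
    for index, call in enumerate(replicas):
        for attempt in range(1 + max_retries):
            try:
                result = call()
                log.append(f"replica {index} attempt {attempt}: ok")
                return result, log
            except TimeoutError:        # temporal redundancy: retry the same replica
                log.append(f"replica {index} attempt {attempt}: timeout")
        log.append(f"replica {index}: failover")  # spatial redundancy: next replica
    raise RuntimeError("all replicas exhausted")


def make_flaky(failures: int) -> Callable[[], str]:
    """Hypothetical replica that times out `failures` times, then answers."""
    state = {"left": failures}

    def call() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("no reply within client timeout")
        return "reply"
    return call


def dead() -> str:
    raise TimeoutError("no reply within client timeout")


result, log = call_with_retry_and_failover([dead, make_flaky(1)], max_retries=2)
print(result)  # reply
print(log)
```

Retrying first is cheap when faults are transient; failing over only after the retries are exhausted avoids switching replicas on a single lost reply.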

Slide 14: Parameters of the Experiments

Parameter                          | Current setting/metric
Request frequency                  | 1 request/min
Polling frequency                  | once every 5 ms
Number of replicas                 | 5
Client timeout period for retry    | 10 s
Failure rate λ                     | number of failures/hour
Load (profile of the program)      | % or load function
Reboot time                        | 10 min
Failover time                      | 1 s

Slide 15: Experimental Results
Experiments were run over 360-hour periods (43,200 requests).

Number of failures:

Experiment | Normal | Server busy | Server reboots periodically
Exp 0      | 4928   | 6130        | 6492
Exp 1      | 2210   | 2327        | 2658
Exp 2      | 2561   | 3160        | 3323
Exp 3      | 1324   | 1711        | 1658
Exp 4      | 1089   | 1148        | 1325

Failure rate improvements:
- Retry: 11.97% to 4.93%
- Reboot: 11.97% to 6.44%
- Failover: 11.97% to 3.56%
- Retry and failover: 11.97% to 2.59%
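The reported improvements can also be read as relative reductions of the failure rate, computed as (baseline - new) / baseline, taking the 11.97% starting point shared by all four pairs as the baseline. The short calculation below uses only those four reported pairs; it does not recompute the percentages from the raw failure counts.

```python
BASELINE = 11.97  # reported starting failure rate (%), shared by all four pairs

reported = {
    "Retry": 4.93,
    "Reboot": 6.44,
    "Failover": 3.56,
    "Retry and Failover": 2.59,
}

# Fraction of the baseline failures removed by each technique.
reductions = {name: (BASELINE - rate) / BASELINE for name, rate in reported.items()}

for name, r in sorted(reductions.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {r:6.1%} fewer failures than the baseline")
```

By this measure the hybrid policy removes roughly 78% of the baseline failures, against about 59% for retry alone and about 46% for reboot.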

Slide 16: Number of failures when the server is in a normal situation

Slide 17: Number of failures when the server is busy

Slide 18: Number of failures when the server reboots periodically

Slide 19: Reliability of the system over time

Slide 20: Reliability Model

Slide 21: Reliability Model Parameters

ID  | Description                                    | Value
λn  | Network failure rate                           | 0.02
λ*  | Web service failure rate                       | 0.228
λ1  | Resource problem rate                          | 0.142
λ2  | Entry point failure rate                       | 0.150
μ*  | Web service repair rate                        | 0.286
μ1  | Resource problem repair rate                   | 0.979
μ2  | Entry point failure repair rate                | 0.979
C1  | Probability that the RM responds on time       | 0.9
C2  | Probability that the server reboots successfully | 0.9
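The full model behind these parameters is not given here, so the sketch below does not reproduce it. As a much simpler illustration of how a failure rate and a repair rate combine, it treats each failure/repair pair as an independent two-state (up/down) continuous-time Markov chain, whose steady-state availability is μ / (λ + μ). The two-state simplification is an assumption, and the time unit is whatever unit the table's rates use.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run probability of being up in a two-state up/down CTMC.

    Balance equation: P_up * lambda = P_down * mu, with P_up + P_down = 1,
    which gives P_up = mu / (lambda + mu).
    """
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("rates must be non-negative and the repair rate positive")
    return repair_rate / (failure_rate + repair_rate)


# Parameter pairs from the table: (λ*, μ*), (λ1, μ1), (λ2, μ2).
pairs = {
    "Web service": (0.228, 0.286),
    "Resource problem": (0.142, 0.979),
    "Entry point failure": (0.150, 0.979),
}
for name, (lam, mu) in pairs.items():
    print(f"{name:<20} A = {steady_state_availability(lam, mu):.3f}")
```

These per-pair figures are illustrative only: the authors' model also uses the network failure rate λn and the coverage probabilities C1 and C2, which this two-state sketch ignores.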

Slide 22: Outcome (SHARPE)
[Figure: reliability of the proposed system computed with SHARPE, for Web service failure rates of 0.228, 0.114 and 0.057]
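The three failure rates compared here each halve the previous one (0.228, 0.114, 0.057). For intuition only, the sketch below assumes a single non-repairable component with a constant failure rate λ, for which reliability is R(t) = exp(-λt) and the mean time to failure is 1/λ. This assumed single-component model is not the SHARPE result for the full system.

```python
import math

RATES = (0.228, 0.114, 0.057)  # Web service failure rates compared on the slide


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for one component with a constant failure rate."""
    return math.exp(-failure_rate * t)


def mttf(failure_rate: float) -> float:
    """Mean time to failure of the same exponential model: 1 / lambda."""
    return 1.0 / failure_rate


for lam in RATES:
    samples = ", ".join(f"R({t})={reliability(lam, t):.3f}" for t in (1, 5, 10))
    print(f"lambda={lam}: {samples}, MTTF={mttf(lam):.2f}")
```

In this simple model, halving λ doubles the MTTF and stretches the reliability curve proportionally along the time axis.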

Slide 23: Conclusion
- Surveyed replication and design diversity techniques for reliable services.
- Proposed a hybrid approach to improving the availability of Web services.
- Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system.
- N-version programming may finally become commercially viable in service environments.

