
R. Barret, P. Maglio, E. Kandogan, J. Bailey, Usable Autonomic Computing Systems: the Administrators' Perspective, ICAC 2004


1 R. Barret, P. Maglio, E. Kandogan, J. Bailey, Usable Autonomic Computing Systems: the Administrators' Perspective, ICAC 2004
Brown and J. Hellerstein, Reducing the Cost of IT Operations - Is Automation Always the Answer?, HotOS 2005
Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang, Automatic Misconfiguration Troubleshooting with PeerPressure, OSDI ’04

2 R. Barret, P. Maglio, E. Kandogan, J. Bailey, Usable Autonomic Computing Systems: the Administrators' Perspective, ICAC 2004

3 Motivation
– the problem of administering highly complex systems
– managing complexity through automation, from low-level configuration settings to high-level business-oriented policies
– the risk of making management harder:
  – systems change more rapidly
  – administrator controls affect more systems
So administrator controls will be both more powerful and more dangerous.
Goal: inform the design of autonomic computing (AC) systems
Methodology: an ethnographic field study

4 What do system administrators do?
– rehearsal and planning
– maintaining situation awareness
– managing multitasking, interruptions and diversions

5 Tools
Command-line consoles
– command-line interfaces (CLIs)
– multitasking, history, scripting
– fast and reliable probing of disparate parts of the system
– easy to customize
Standalone graphical applications
– graphical user interfaces (GUIs)
– good for unfamiliar tasks and novice users
– depending on graphics support, may offer insufficient support for multitasking
Web-based management tools
– don't depend on graphics support
– can be integrated into an organized suite

6 Analysis and Guidelines for AC
Phases
– rehearsal and planning
– maintaining situation awareness
– managing multitasking, interruptions and diversions

7 Rehearsing and Planning
– necessary for critical systems because of both the chance of human error and the danger of unforeseen consequences
– AC may increase both of these dangers:
  – as the scale and degree of coupling within complex systems increase, new patterns of failure may develop through a series of several smaller failures
  – as autonomic managers automatically reconfigure subsystems, the effects on the overall system may be difficult to predict
– Guidelines:
  – it should be easy to build test systems
  – systems should be designed so that changes can be undone quickly
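The "quickly undo changes" guideline can be illustrated with a minimal sketch, assuming a simple in-memory key/value configuration. The class `JournaledConfig` and its methods are hypothetical and not taken from the paper: every change records the value it replaces, so any sequence of changes can be rolled back to a checkpoint.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical sketch: sentinel meaning "this key did not exist before the change".
_MISSING = object()


@dataclass
class JournaledConfig:
    values: Dict[str, Any] = field(default_factory=dict)
    journal: List[Tuple[str, Any]] = field(default_factory=list)

    def set(self, key: str, value: Any) -> None:
        # Record the previous value (or the sentinel) before overwriting it.
        self.journal.append((key, self.values.get(key, _MISSING)))
        self.values[key] = value

    def checkpoint(self) -> int:
        # A checkpoint is simply the current length of the journal.
        return len(self.journal)

    def undo_to(self, checkpoint: int) -> None:
        # Walk the journal backwards, restoring each replaced value.
        while len(self.journal) > checkpoint:
            key, previous = self.journal.pop()
            if previous is _MISSING:
                del self.values[key]
            else:
                self.values[key] = previous


if __name__ == "__main__":
    cfg = JournaledConfig({"max_connections": 100})
    cp = cfg.checkpoint()
    cfg.set("max_connections", 500)
    cfg.set("cache_mb", 64)
    cfg.undo_to(cp)
    print(cfg.values)
```

Keeping the undo information next to each change, rather than taking full snapshots, keeps rollback cheap; a real autonomic manager would also need to undo changes it made on its own, which this sketch does not model.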

8 Situation Awareness
Administrators deal with dynamic and complex processes at many different levels of abstraction.
They need to stay aware of systems that are not only complex but also change frequently.
Each system had its own management interface, so gaining overall situation awareness was very difficult.
– Guidelines:
  – automation has made operators more passive
  – automated systems typically hide details from operators
– Consequently, operator workload decreases during normal operating conditions but increases during critical conditions
AC systems must provide facilities for rapidly gaining deeper situation awareness when problems arise.

9 Multitasking, Interruptions, Diversions
– Conventional systems: administrators work with many components, but each component works relatively independently
– Guidelines: since each level affects a component's operation, it will be difficult to design a general workflow for debugging
Therefore AC interfaces should allow multiple simultaneous views of system components and aggregates, to support interaction at multiple levels.

10 Brown and J. Hellerstein, Reducing the Cost of IT Operations - Is Automation Always the Answer?, HotOS 2005

11 Is Automation Always the Answer? No! Why?

12 Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang, Automatic Misconfiguration Troubleshooting with PeerPressure, OSDI ’04

13 Misconfiguration Diagnosis
Technical support contributes 17% of total cost of ownership (TCO) [Tolly2000]
Much application malfunctioning comes from misconfigurations
Why?
– shared configuration data (e.g., the Registry) and uncoordinated access and updates from different applications
How about maintaining the golden config state?
– very hard [Larsson2001]:
  – complex software components and compositions
  – third-party applications
  – …

14 Outline
– Motivation
– Goals
– Design
– Prototype
– Evaluation results
– Future work
– Concluding remarks

15 Goals
Effectiveness
– a small set of sick configuration candidates that contains the root-cause entries
Automation
– no second-party involvement
– no need to remember or identify what is healthy

16 Intuition behind PeerPressure
Assumption
– applications function correctly on most machines, so malfunctioning is an anomaly
Succumb to the peer pressure

17 An Example
Suspects (Mine, P1's, P2's, P3's, P4's):
– e1: 0, 1, 1, 1, 1
– e2: on, off, …
– e3: 574010034 …
Is e1 sick? Most likely.
Is e2 sick? Probably not.
Is e3 sick? Maybe not: e3 looks like an operational state.
We use Bayesian statistics to estimate the probability that a suspect is sick; this probability is our ranking metric.
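The Bayesian ranking metric can be sketched as follows. This is a minimal Python sketch under stated assumptions, following our reading of the PeerPressure estimator rather than the authors' code: each of the t suspects is a priori equally likely to be sick (prior 1/t); a sick entry takes any of c values uniformly; a healthy entry's value follows the peers' Laplace-smoothed frequency (m + 1)/(N + c), where N is the number of peers and m the number that share my value. Bayes' rule then gives P(sick | value) = (N + c) / (N + c·t + c·m·(t − 1)). Estimating c as the number of distinct values seen, and the names `sick_probability` and `rank_suspects`, are illustrative assumptions.

```python
from collections.abc import Hashable
from fractions import Fraction
from typing import Dict, List, Tuple


def sick_probability(mine: Hashable, peer_values: List[Hashable],
                     num_suspects: int) -> Fraction:
    """P(sick | my value) for one suspect entry, under the assumptions above."""
    n = len(peer_values)                          # N: peers that have this entry
    m = sum(1 for v in peer_values if v == mine)  # peers sharing my value
    c = len(set(peer_values) | {mine})            # cardinality estimate (assumption)
    t = num_suspects                              # t: number of suspect entries
    return Fraction(n + c, n + c * t + c * m * (t - 1))


def rank_suspects(mine: Dict[str, Hashable],
                  peers: List[Dict[str, Hashable]]) -> List[Tuple[str, Fraction]]:
    """Rank suspect entries by sick probability, most suspicious first."""
    t = len(mine)
    scores = []
    for entry, value in mine.items():
        # Only peers that actually contain the entry contribute samples.
        peer_values = [p[entry] for p in peers if entry in p]
        scores.append((entry, sick_probability(value, peer_values, t)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # e1 as on the slide (mine = 0, all four peers = 1); the e2 values are hypothetical.
    mine = {"e1": 0, "e2": "on"}
    peers = [{"e1": 1, "e2": "on"}, {"e1": 1, "e2": "on"},
             {"e1": 1, "e2": "off"}, {"e1": 1, "e2": "on"}]
    for entry, p in rank_suspects(mine, peers):
        print(entry, float(p))
```

With the slide's three suspects, e1 (mine 0, four peers at 1) scores (4 + 2)/(4 + 6) = 0.6, well above the 1/3 prior, while an entry on which every peer agrees with me stays at the prior, since the peers then carry no evidence either way.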

18 System Overview
Run the faulty app → App Tracer → Registry entry suspects:
  Data | Entry
  0 | HKLM\System\Setup\...
  On | HKLM\Software\Msft\...
  null | HKCU\%\Software\...
PeerPressure: Canonicalizer → Search & Fetch (Database, Peer-to-Peer Troubleshooting Community) → Statistical Analyzer
Troubleshooting result:
  Prob. | Entry
  0.2 | HKLM\System\Setup\...
  0.6 | HKLM\Software\Msft\...
  0.003 | HKCU\%\Software\...
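The data flow in the overview can be sketched as a small pipeline. In this minimal sketch the "database" is an in-memory list of peer snapshots (dicts from key path to value), and canonicalization is assumed to mean replacing a user-specific path segment with `%` (as in the `HKCU\%\...` entries above) and folding case, since Windows registry key names are case-insensitive. The SID pattern, all function names, and the naive `disagreement_rate` analyzer are hypothetical stand-ins; the Bayesian estimate from slide 17 would plug in as the `analyze` argument.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical assumption: user-specific segments look like Windows SIDs (S-1-5-21-...).
_SID = re.compile(r"S-1-5-21(?:-\d+)+", re.IGNORECASE)

Snapshot = Dict[str, Any]  # registry key path -> value, for one machine
Analyzer = Callable[[Any, List[Any], int], float]


def canonicalize(key: str) -> str:
    """Map a user-specific key path to a form comparable across machines."""
    # Replace the per-user segment with a placeholder, then fold case.
    return _SID.sub("%", key).lower()


def search_and_fetch(suspects: Snapshot, database: List[Snapshot]) -> Dict[str, List[Any]]:
    """Collect, for every canonical suspect key, the values found on peer snapshots."""
    canon_db = [{canonicalize(k): v for k, v in snap.items()} for snap in database]
    samples: Dict[str, List[Any]] = {canonicalize(k): [] for k in suspects}
    for ckey, values in samples.items():
        for snap in canon_db:
            if ckey in snap:
                values.append(snap[ckey])
    return samples


def disagreement_rate(mine: Any, peer_values: List[Any], num_suspects: int) -> float:
    """Naive stand-in analyzer: fraction of peers whose value differs from mine."""
    if not peer_values:
        return 0.0
    return sum(1 for v in peer_values if v != mine) / len(peer_values)


def troubleshoot(suspects: Snapshot, database: List[Snapshot],
                 analyze: Analyzer = disagreement_rate) -> List[Tuple[float, str]]:
    """Return (score, canonical entry) pairs, most suspicious first."""
    mine = {canonicalize(k): v for k, v in suspects.items()}
    samples = search_and_fetch(suspects, database)
    scored = [(analyze(value, samples[key], len(mine)), key) for key, value in mine.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical data: the faulty machine's suspects and three peer snapshots.
    suspects = {r"HKLM\Software\ExampleApp\Enabled": 0,
                r"HKEY_USERS\S-1-5-21-11-22-33-1001\Software\ExampleApp\Mode": "on"}
    database = [
        {r"HKLM\SOFTWARE\ExampleApp\Enabled": 1,
         r"HKEY_USERS\S-1-5-21-44-55-66-1002\Software\ExampleApp\Mode": "on"},
        {r"HKLM\Software\ExampleApp\Enabled": 1},
        {r"HKLM\Software\ExampleApp\Enabled": 1,
         r"HKEY_USERS\S-1-5-21-77-88-99-1003\Software\ExampleApp\Mode": "off"},
    ]
    for score, entry in troubleshoot(suspects, database):
        print(f"{score:.2f}  {entry}")
```

Canonicalizing both the suspects and the stored snapshots is what lets per-user entries from different machines be compared at all; without it, every user-specific key would look unique.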

19 Evaluation Data Set
87 live Windows XP registry snapshots (in the database)
– half of these snapshots are from three diverse organizations within Microsoft: the Operations and Technology Group (OTG) Helpdesk in Colorado, MSR-Asia, and MSR-Redmond
– the other half are from machines across Microsoft that were reported to have potential Registry problems
20 real-world troubleshooting cases with known root causes

20 Response Time
Number of suspects: 8 to 26,308, with a median of 1,171
45 seconds on average, with SQL Server hosted on a 2.4 GHz workstation with 1 GB of RAM
Sequential database queries dominate the response time

21 Troubleshooting Effectiveness
Metric: rank of the root cause
Results:
– rank 1 for 12 cases
– rank 2 for 3 cases
– ranks 3, 9, 12, and 16 for 4 cases, respectively
– one case could not be solved
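The per-case ranks can be summarized as top-k counts, i.e. how many root causes appear among the first k ranked suspects. This small Python sketch only redoes that arithmetic on the numbers above; the top-k framing and the name `solved_within` are our illustration, not a metric reported on the slide.

```python
from typing import List, Optional

# Root-cause ranks from the slide; None marks the one case that was not solved.
RANKS: List[Optional[int]] = [1] * 12 + [2] * 3 + [3, 9, 12, 16] + [None]


def solved_within(ranks: List[Optional[int]], k: int) -> int:
    """Number of cases whose root cause appears among the top k suspects."""
    return sum(1 for r in ranks if r is not None and r <= k)


if __name__ == "__main__":
    total = len(RANKS)  # 20 cases in all
    for k in (1, 2, 3, 16):
        hits = solved_within(RANKS, k)
        print(f"top-{k}: {hits}/{total} = {hits / total:.0%}")
```

The counts work out to 12/20 (60%) at rank 1, 15/20 (75%) within the top two, 16/20 (80%) within the top three, and 19/20 with a rank of at most 16.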

22 Concluding Remarks
Automatic misconfiguration diagnosis is possible
– use statistics from the mass of peer machines to automate the manual identification of what is healthy
– initial results are promising

