R. Barret, P. Maglio, E. Kandogan, J. Bailey, Usable Autonomic Computing Systems: the Administrators' Perspective, ICAC 2004.


R. Barret, P. Maglio, E. Kandogan, J. Bailey, Usable Autonomic Computing Systems: the Administrators' Perspective, ICAC 2004.
Brown and J. Hellerstein, Reducing the Cost of IT Operations - Is Automation Always the Answer?, HOTOS 2005.
Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang, Automatic Misconfiguration Troubleshooting with PeerPressure, OSDI ’04.

R. Barret, P. Maglio, E. Kandogan, J. Bailey, Usable Autonomic Computing Systems: the Administrators' Perspective, ICAC 2004

Motivation
–the problem of administering highly complex systems
–managing complexity through automation, from low-level configuration settings to high-level business-oriented policies
–the risk of making management harder
    systems change more rapidly
    administrator controls affect more systems
So, administrator controls will be both more powerful and more dangerous
Goal: inform the design of autonomic computing (AC)
Methodology: ethnographic field study

What do system administrators do?
–rehearsal and planning
–maintaining situation awareness
–managing multitasking, interruptions and diversions

Tools
command-line based console
–command-line interfaces (CLIs)
–multitasking, history, scripting
–fast and reliable probing of disparate parts of the system
–easy to customize
standalone graphical applications
–graphical user interfaces (GUIs)
–good for unfamiliar tasks and novice users
–depend on graphics support, with insufficient support for multitasking
web-based management tools
–don't depend on graphics support
–can be integrated to provide an organized suite

Phases
–rehearsal and planning
–maintaining situation awareness
–managing multitasking, interruptions and diversions
Analysis and Guidelines for AC

Rehearsing and Planning
–necessary for critical systems because of both the chance of human error and the danger of unforeseen consequences
–AC may increase both of these dangers
    as the scale and degree of coupling within complex systems increase, new patterns of failure may develop through a series of several smaller failures
    as autonomic managers automatically reconfigure subsystems, the effects on the overall system may be difficult to predict
–Guidelines
    it should be easy to build test systems
    systems should be designed so that changes can be undone quickly
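The "quickly undo changes" guideline can be illustrated with a minimal sketch. It assumes the configuration is an in-memory dictionary; the helper name `reversible_change` and the `max_connections` setting are hypothetical and do not come from the paper.

```python
import copy
from contextlib import contextmanager


@contextmanager
def reversible_change(config):
    """Let the caller edit `config`; restore the snapshot if the edit fails.

    Hypothetical helper for illustration only.
    """
    snapshot = copy.deepcopy(config)  # state to return to on failure
    try:
        yield config
    except Exception:
        # Roll back in place so every holder of `config` sees the old state.
        config.clear()
        config.update(snapshot)
        raise


# Rehearse a risky change: the failed validation triggers the rollback.
config = {"max_connections": 100}
try:
    with reversible_change(config) as cfg:
        cfg["max_connections"] = 5000
        raise RuntimeError("validation failed on the test system")
except RuntimeError:
    pass
```

After the failed block, `config` again holds its original value; a block that completes without an exception keeps its changes.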

Situation Awareness
Administrators deal with dynamic and complex processes at many different levels of abstraction
They need to be aware of systems that are not only complex, but that also change frequently
Each system had its own management interface, so gaining overall situation awareness was very difficult
–Guidelines
    Automation has made operators more passive
    Automated systems typically hide details from operators
    –Consequently, operator workload decreases during normal operating conditions, but increases during critical conditions
    AC systems must provide facilities for rapidly gaining deeper situation awareness when problems arise

Multitasking, Interruptions, Diversions
–conventional systems
    Working with many components, but each component works relatively independently
–Guidelines
    Because each level affects a component's operation, it will be difficult to design a general workflow for debugging
    Therefore, AC interfaces should allow multiple simultaneous views of system components and aggregates to support interaction at multiple levels

Brown and J. Hellerstein, Reducing the Cost of IT Operations - Is Automation Always the Answer?, HOTOS 2005

Is Automation Always the Answer? No! Why?

Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang, Automatic Misconfiguration Troubleshooting with PeerPressure, OSDI ’04

Misconfiguration Diagnosis
Technical support accounts for 17% of TCO [Tolly2000]
Many application malfunctions come from misconfigurations
Why?
–Shared configuration data (e.g., Registry) with uncoordinated access and updates from different applications
How about maintaining the golden config state?
–Very hard [Larsson2001]
    Complex software components and compositions
    Third party applications
    …

Outline
–Motivation
–Goals
–Design
–Prototype
–Evaluation results
–Future work
–Concluding remarks

Goals
Effectiveness
–Small set of sick configuration candidates that contain the root-cause entries
Automation
–No second party involvement
–No need to remember or identify what is healthy

Intuition behind PeerPressure
Assumption
–Applications function correctly on most machines; malfunctioning is an anomaly
Succumb to the peer pressure

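An Example
[Table: three suspect entries (rows) against their values on my machine and on four peers (columns: Suspects, Mine, P1's, P2's, P3's, P4's); of the cell values only "on" and "off" in the second row survive]
Is R1 sick? Most likely
Is R2 sick? Probably not
Is R3 sick? Maybe not. R3 looks like an operational state
We use Bayesian statistics to estimate the sick probability of a suspect, which is our ranking metric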
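The ranking idea in this example can be sketched with a simple Bayesian model. The sketch rests on assumptions that are not stated on the slide and may differ from the estimator in the paper: each of the t suspects is equally likely a priori to be the sick one, a sick entry is equally likely to hold any of its c observed values, and the likelihood of a value on a healthy machine is estimated from peers with add-one smoothing. The entry names and values are hypothetical.

```python
def sick_probability(my_value, peer_values, num_suspects):
    """Posterior probability that an entry is sick, under the assumed model."""
    n = len(peer_values)                            # number of peer samples
    c = len(set(peer_values) | {my_value})          # observed cardinality
    m = sum(1 for v in peer_values if v == my_value)  # peers matching my value

    prior_sick = 1.0 / num_suspects
    like_sick = 1.0 / c               # sick: any value equally likely
    like_healthy = (m + 1) / (n + c)  # healthy: smoothed peer frequency

    num = like_sick * prior_sick
    return num / (num + like_healthy * (1.0 - prior_sick))


def rank_suspects(mine, peers):
    """Return (entry, probability) pairs, most suspicious first."""
    t = len(mine)
    scores = {
        entry: sick_probability(value, [p[entry] for p in peers], t)
        for entry, value in mine.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: e1 differs from every peer, e2 matches most peers,
# e3 takes a different value on every machine (an operational state).
mine = {"e1": 0, "e2": "on", "e3": 57}
peers = [
    {"e1": 1, "e2": "on", "e3": 12},
    {"e1": 1, "e2": "on", "e3": 30},
    {"e1": 1, "e2": "off", "e3": 44},
    {"e1": 1, "e2": "on", "e3": 61},
]
ranking = rank_suspects(mine, peers)
# e1 (0.60) > e3 (0.47) > e2 (0.27)
```

Under this model an entry whose value varies across all machines scores lower than one where every peer agrees and only the local machine differs, which matches the slide's reading of the first and third suspects.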

System Overview
[Figure: running the faulty app under the App Tracer yields the registry entry suspects (Data, Entry): 0, HKLM\System\Setup\...; On, HKLM\Software\Msft\...; null, HKCU\%\Software\... PeerPressure, consisting of the Canonicalizer, Search & Fetch, and the Statistical Analyzer, consults the Peer-to-Peer Troubleshooting Community or the Database and produces the troubleshooting result (Prob., Entry): 0.2, HKLM\System\Setup\...; HKLM\Software\Msft\...; HKCU\%\Software\...]

Evaluation Data Set
87 live Windows XP registry snapshots (in the database)
–Half of these snapshots are from three diverse organizations within Microsoft: Operations and Technology Group (OTG) Helpdesk in Colorado, MSR-Asia, and MSR-Redmond
–The other half are from machines across Microsoft that were reported to have potential Registry problems
20 real-world troubleshooting cases with known root causes

Response Time
# of suspects: 8 to 26,308, with a median: [...]
[...] seconds on average for SQL server hosted on a 2.4GHz CPU workstation with 1 GB RAM
Sequential database queries dominate

Troubleshooting Effectiveness
Metric: root cause ranking
Results:
–Rank = 1 for 12 cases
–Rank = 2 for 3 cases
–Rank = 3, 9, 12, 16 for 4 cases, respectively
–cannot solve one case
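The per-rank counts above add up to the 20 troubleshooting cases and can be summarized as top-k hit rates. The short sketch below only does that arithmetic; the helper name `top_k_fraction` is illustrative.

```python
# Root-cause ranks reported on the slide; the unsolved case has no rank.
ranks = [1] * 12 + [2] * 3 + [3, 9, 12, 16]
unsolved = 1
total_cases = len(ranks) + unsolved  # 20


def top_k_fraction(k):
    """Fraction of all cases whose root cause is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / total_cases


top1 = top_k_fraction(1)  # 12 of 20 cases
top3 = top_k_fraction(3)  # 16 of 20 cases
```

So the root cause is ranked first in 60% of the cases and within the top three in 80% of them.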

Concluding Remarks
Automatic misconfiguration diagnosis is possible
–Use statistics from the mass to automate manual identification of the healthy
–Initial results promising