CISC 879 - Machine Learning for Solving Systems Problems Presented by: Suman Chander B Dept of Computer & Information Sciences University of Delaware Automatic.

Presentation transcript:

CISC 879 - Machine Learning for Solving Systems Problems
Presented by: Suman Chander B, Dept of Computer & Information Sciences, University of Delaware
Automatic Software Fault Diagnosis by Exploiting Application Signatures

Motivation
Diagnosing application problems in complex enterprise environments is hard: there is a large number of possible causes, and most failures are due to runtime interactions with the system environment.
Troubleshooting these problems requires extensive experience and time.

Overview
Present a black-box approach to diagnosing several application faults.
Application signatures.
Approach to detecting application faults.
Provide detail on tool design and implementation.
Evaluate the effectiveness of the tool in correcting faulty application behaviour.
Case studies to support the approach.

Application Behaviour
Factors that help capture application behaviour: system calls, signals, environment variables, resource limits, and access control.
Collecting and keeping history information helps find the root cause of a problem quickly.

Building Signatures...
Choice of attributes: made with a goodness-of-fit test, the KS (Kolmogorov-Smirnov) test.
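The slide does not say how the KS test is applied to attributes. As a minimal sketch, assuming it is used to compare the distribution of one attribute's values across two sets of runs, the two-sample KS statistic (the largest gap between the two empirical CDFs) can be computed as below. The function name and the sample data are hypothetical, not taken from the presentation.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical attribute: byte counts passed to write() in different runs.
normal_sizes = [512, 512, 1024, 1024, 2048, 4096]
similar_sizes = [512, 1024, 1024, 2048, 4096, 4096]
shifted_sizes = [65536, 65536, 131072, 131072, 262144, 262144]

print(ks_statistic(normal_sizes, similar_sizes))  # small gap: similar distributions
print(ks_statistic(normal_sizes, shifted_sizes))  # large gap: distributions differ
```

A statistic near 0 suggests the attribute behaves the same way in both sets of runs; a statistic near 1 suggests the distributions do not overlap. Turning the statistic into an accept/reject decision needs a threshold, which the slides do not give.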

Signatures for system calls...

Handling multiple processes...
Data is collected for each process separately.
Relations between system calls are reflected correctly once interleaved system calls have been separated.
Some attributes (e.g. signals, UID) are specific to a process.
For multithreaded applications, data is collected and signatures are built separately for each thread.
The current approach does not handle user-level threads.
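The separation step can be sketched as follows. This is an illustrative sketch only: the `(pid, tid, syscall)` record layout and the function name are assumptions, not the tool's actual trace format.

```python
from collections import defaultdict


def split_by_task(events):
    """Group an interleaved event stream into one ordered stream per
    (process id, thread id) pair."""
    streams = defaultdict(list)
    for pid, tid, syscall in events:
        streams[(pid, tid)].append(syscall)
    return dict(streams)


# Hypothetical interleaved trace from two processes, one with two threads.
interleaved = [
    (100, 100, "open"),
    (200, 200, "socket"),
    (100, 100, "read"),
    (200, 200, "connect"),
    (100, 101, "write"),
    (100, 100, "close"),
]

streams = split_by_task(interleaved)
for task, calls in streams.items():
    print(task, calls)
```

Once separated, the open/read/close sequence of thread (100, 100) is contiguous again, so relations between consecutive calls are no longer obscured by calls from other tasks.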

Tool Design: System Architecture

One...Application Tracer...
The tracer tool itself executes the target application, e.g. ‘tracer application_program’.
Low overhead is crucial.
Uses the ptrace interface to build signatures for system calls.
Some runtime behaviours (environment variables, resource limits, user ID, etc.) are not related to system calls.

Two...Signature Bank...

Three...Fault Diagnosis...
The classifier tool provides the root cause of a deviation from normal behaviour:
Access the signature bank for normal traces.
Compare them with the faulty trace obtained.
Determine the root cause of the fault.
Report the diagnosis to the user.
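The comparison step above can be sketched as follows, assuming a much simplified signature bank that maps each system call name to the set of error numbers seen in normal runs (0 meaning success). The data layout and the `diagnose` function are hypothetical; the real tool's signatures cover more attributes than this.

```python
import errno
import os


def diagnose(signature_bank, faulty_trace):
    """Report every call in the faulty trace whose error number was never
    observed for that call in normal traces."""
    findings = []
    for syscall, err in faulty_trace:
        normal_errors = signature_bank.get(syscall)
        if normal_errors is None:
            findings.append(f"{syscall}: never seen in normal traces")
        elif err not in normal_errors:
            name = errno.errorcode.get(err, str(err))
            findings.append(f"{syscall}: unexpected {name} ({os.strerror(err)})")
    return findings


# Hypothetical bank built from normal runs: every call succeeded.
signature_bank = {"open": {0}, "write": {0}, "close": {0}}

# Hypothetical faulty run: write fails because the file is too large.
faulty_trace = [("open", 0), ("write", errno.EFBIG), ("close", 0)]

findings = diagnose(signature_bank, faulty_trace)
for line in findings:
    print(line)
```

The only finding is the write call, flagged with the symbolic error name, which is the kind of pointer a user needs to start from when looking for the root cause.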

Case Studies

Testing with Apache...
WebStone 2.5, a free benchmarking tool for web servers, is used to test the tool with Apache.
The signature bank was built by performing each operation ten times to generate the corresponding traces.
Example: faulty execution of the write system call; Apache is unable to write to its log file.
Root cause: error number EFBIG, indicating that the file is too large.

Testing with Apache...

Observation 1
Comparison showing the change in trace size over a 45-minute period.
6.3 MB of space holds a recording of nearly 11 million system call invocations.

Observation 2
Comparison of the change in size of the trace file and the signature bank as the number of traces grows.
The signature bank grows slowly because redundant data are merged.
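Why merging keeps the bank small can be illustrated with a toy model, assuming the bank stores each distinct (system call, attribute value) pair only once. The `merge_trace` function and the trace contents are hypothetical and far simpler than the tool's real signatures.

```python
def merge_trace(bank, trace):
    """Merge one trace into the bank and return how many new entries it added."""
    added = 0
    for syscall, value in trace:
        values = bank.setdefault(syscall, set())
        if value not in values:
            values.add(value)
            added += 1
    return added


bank = {}
# Hypothetical traces: the second repeats the first, the third is partly new.
traces = [
    [("open", "/var/log/app.log"), ("write", 0), ("close", 0)],
    [("open", "/var/log/app.log"), ("write", 0), ("write", 0), ("close", 0)],
    [("open", "/etc/app.conf"), ("read", 0), ("close", 0)],
]

growth = [merge_trace(bank, trace) for trace in traces]
total_records = sum(len(trace) for trace in traces)
bank_entries = sum(len(values) for values in bank.values())
print(growth, total_records, bank_entries)
```

In this toy run the raw traces hold 10 records while the bank holds 5 entries, and a trace that repeats earlier behaviour adds nothing at all.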

From other tests...
CVS
Average slowdown: 29.6%.
Collected 26 traces ranging from 0.1 MB to 1.6 MB.
The recorded signature bank is 6.5 MB, consisting of about 1.8 million system calls.
PostgreSQL
Average slowdown: 15.7%.
Collected traces ranging from 0.6 MB to 2.1 MB.
The recorded signature bank is 3.2 MB.

Limiting false positives
The first cause of false positives is related to the KS test.
The second cause is that the signature bank cannot cover all normal variations of the attributes.
Aggregating more traces would make the bank more complete and gradually reduce false positives.

Performance measure
Most of the overhead is due to information collection and trace file updating whenever a system call happens.
Overheads that occur:
Switching from the kernel to the tracer and back, at both system call entry and exit.
Retrieving the system call number, return value and related attributes.
Looking up the user stack to get its content.
Improvement is obtained by modifying the ptrace code, adding the primitives PTRACE_SETBATCHSIZE and PTRACE_READBUFFER.
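The slides do not define the semantics of the two added primitives. As a rough cost model only, assuming that the kernel buffers system call records and the tracer is woken once per full batch, the number of kernel-to-tracer switches can be estimated as below. The function name and the batch size of 64 are hypothetical.

```python
import math


def tracer_stops(num_syscalls, batch_size=1):
    """Estimate how often the tracer is woken: each system call produces two
    events (entry and exit), and the tracer is woken once per batch of events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    events = 2 * num_syscalls
    return math.ceil(events / batch_size)


print(tracer_stops(1000))      # unbatched: one stop per event
print(tracer_stops(1000, 64))  # batched: one stop per 64 buffered events
```

Under this assumption the switching cost falls roughly in proportion to the batch size, while the per-call cost of retrieving attributes and reading the user stack is unchanged.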

Improvement...

Limitations
Labelling an application execution trace as faulty requires manual indication.
The approach is conservative in the amount of information it captures for a trace; more analysis is required to identify the minimum set of data that provides higher accuracy in detecting problems.
Results are limited, since only a few case studies were explored.

THANK YOU