Process Mining in Software Systems. Maikel Leemans, Wil M.P. van der Aalst.

Presentation transcript:

Maikel Leemans Wil M.P. van der Aalst

Process Mining in Software Systems 2 System under Study (SUS); Functional perspective; Focus: User requests

Process Mining in Software Systems 3 System under Study (SUS); Instrument Aspects; Instrumented SUS

Process Mining in Software Systems 4 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data

Process Mining in Software Systems 5 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Business Transactions; Traces: User requests

Process Mining in Software Systems 6 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining

Process Mining in Software Systems 7 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining

Process Mining in Software Systems 8 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining; Instrumentation; Collect Data; Discover Business Transactions; Related Work and Assumptions; Evaluation

Process Mining in Software Systems 9 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining; Instrumentation; Collect Data; Discover Business Transactions; Evaluation; Related Work and Assumptions

Related work – Current trends (Majority of papers) 10 Target Language and Assumptions; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity. Literature Survey: comparison of various reverse engineering techniques; investigating the current trend, plus advantages and disadvantages

Related work – Current trends (Majority of papers) 11 Target Language and Assumptions: Java [1,4,5,6,8,10]; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity

Related work – Current trends (Majority of papers) 12 Target Language and Assumptions: Java [1,4,5,6,8,10]; Information Retrieval: Instrumentation (e.g., AspectJ) [1,2,3,6,8]; Distributed and Event Correlation; Output Model and Granularity

Related work – Current trends (Majority of papers) 13 Target Language and Assumptions: Java [1,4,5,6,8,10]; Information Retrieval: Instrumentation (e.g., AspectJ) [1,2,3,6,8]; Distributed and Event Correlation: No support [1,2,3,4,5,8,9]; Output Model and Granularity

Related work – Current trends (Majority of papers) 14 Target Language and Assumptions: Java [1,4,5,6,8,10]; Information Retrieval: Instrumentation (e.g., AspectJ) [1,2,3,6,8]; Distributed and Event Correlation: No support [1,2,3,4,5,8,9]; Output Model and Granularity: UML Sequence Diagram [1,2,3,4,5,6,7,10]

Related work – Dynamic Analysis Techniques 15 Target Language and Assumptions; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity: UML Sequence Diagrams. [6] Control-flow Sequence Diagram

Related work – Dynamic Analysis Techniques 16 Target Language and Assumptions; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity: UML Sequence Diagrams. [7] Network Sequence Diagram

Related work – Dynamic Analysis Techniques 17 Target Language and Assumptions; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity: UML Sequence Diagrams (Performance? Right amount of detail?). [7] Network Sequence Diagram

Related work – Dynamic Analysis Techniques 18 Target Language and Assumptions; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity: Performance statistics. [9] Performance (based on # calls)

Related work – Dynamic Analysis Techniques 19 Target Language and Assumptions; Information Retrieval; Distributed and Event Correlation; Output Model and Granularity: Performance statistics (Context? Bottlenecks?). [9] Performance (based on # calls)

Assumptions – compared to related work 20 Target Language and Assumptions: Java [1,4,5,6,8,10]; Information Retrieval: Instrumentation (e.g., AspectJ) [1,2,3,6,8]; Distributed and Event Correlation: No support [1,2,3,4,5,8,9]; Output Model and Granularity: UML Sequence Diagram [1,2,3,4,5,6,7,10]

Assumptions – compared to related work 21 Target Language and Assumptions: Any instrumentable language; Information Retrieval: Instrumentation (e.g., AspectJ) [1,2,3,6,8]; Distributed and Event Correlation: No support [1,2,3,4,5,8,9]; Output Model and Granularity: UML Sequence Diagram [1,2,3,4,5,6,7,10]

Assumptions – compared to related work 22 Target Language and Assumptions: Any instrumentable language; Information Retrieval: Instrumentation (Joinpoint-Pointcuts); Distributed and Event Correlation: No support [1,2,3,4,5,8,9]; Output Model and Granularity: UML Sequence Diagram [1,2,3,4,5,6,7,10]

Assumptions – compared to related work 23 Target Language and Assumptions: Any instrumentable language; Information Retrieval: Instrumentation (Joinpoint-Pointcuts); Distributed and Event Correlation: Supported (Communication-based); Output Model and Granularity: UML Sequence Diagram [1,2,3,4,5,6,7,10]

Assumptions – compared to related work 24 Target Language and Assumptions: Any instrumentable language; Information Retrieval: Instrumentation (Joinpoint-Pointcuts); Distributed and Event Correlation: Supported (Communication-based); Output Model and Granularity: Event Logs (and process models)

Assumptions – other considerations 25 Any instrumentable language; Instrumentation (Joinpoint-Pointcuts); Supported (Communication-based); Event Logs (and process models). Other considerations: Assume global clock (e.g., NTP); Communication: across processes, across threads; Focus on User requests

Assumptions – other considerations 26 Event Log; Process Models. Other considerations: Assume global clock (e.g., NTP); Communication: across processes, across threads; Focus on User requests

Assumptions – other considerations 27 Event Log; Process Models; Context; Performance. Other considerations: Assume global clock (e.g., NTP); Communication: across processes, across threads; Focus on User requests

Process Mining in Software Systems 28 Stream of Event Data; Event Log; Structure of SUS; Process Mining; Collect Data; Discover Business Transactions; Related Work and Assumptions; Evaluation; System under Study (SUS); Instrument Aspects; Instrumented SUS; Instrumentation

Instrumenting the System under Study 29 System under Study (SUS); Distributed System; A; B; Communication channel

Instrumenting the System under Study 30 System under Study (SUS); Distributed System; A; B; Communication channel; Tracing; Instrumented SUS

Instrumenting the System under Study 31 System under Study (SUS); Distributed System; A; B; Communication channel; Instrumented SUS; Tracing; Instrument Aspects

Aspects: The Joinpoint-Pointcut Model 32
function H() {... }
function K() {... }
function G(int x) {... }
function F(int x) { G(x); H(); K(); }

Aspects: The Joinpoint-Pointcut Model 33
function H() {... }
function K() {... }
function G(int x) {... }
function F(int x) { G(x); H(); K(); }
Joinpoint

Aspects: The Joinpoint-Pointcut Model 34
function H() {... }
function K() {... }
function G(int x) {... }
function F(int x) { G(x); H(); K(); }
Pointcut: function *(int);
Pointcut

Aspects: The Joinpoint-Pointcut Model 35
function H() {... }
function K() {... }
function G(int x) {... }
function F(int x) { G(x); H(); K(); }
Pointcut: function *(int);
Insert Before: logEvent("start");
Insert After: logEvent("complete");
Joinpoint; Pointcut; Aspect

Aspects: The Joinpoint-Pointcut Model 36
function H() {... }
function K() {... }
function G(int x) {... }
function F(int x) { G(x); H(); K(); }
Pointcut: function *(int);
Insert Before: logEvent("start");
Insert After: logEvent("complete");
Joinpoint; Pointcut; Aspect

Aspects: The Joinpoint-Pointcut Model 37
Common pointcut patterns: specific interfaces, methods; (low-level) communication
function H() {... }
function K() {... }
function G(int x) {... }
function F(int x) { G(x); H(); K(); }
Pointcut: function *(int);
Insert Before: logEvent("start");
Insert After: logEvent("complete");
Joinpoint; Pointcut; Aspect
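
The joinpoint-pointcut idea on these slides can be imitated in a few lines of Python. This is only a sketch under assumptions, not the instrumentation used in the talk: the names `takes_single_int`, `aspect`, `instrument` and `log_event` are hypothetical, and a decorator has to be applied explicitly, whereas a real aspect weaver selects joinpoints without touching the source.

```python
import functools
import inspect

events = []  # recorded (function name, lifecycle) pairs


def log_event(name, lifecycle):
    events.append((name, lifecycle))


def takes_single_int(func):
    """Pointcut 'function *(int)': any function with exactly one int parameter."""
    params = list(inspect.signature(func).parameters.values())
    return len(params) == 1 and params[0].annotation is int


def aspect(pointcut):
    """Build a decorator that adds before/after logging to selected functions."""
    def decorate(func):
        if not pointcut(func):
            return func  # not selected by the pointcut: left untouched

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log_event(func.__name__, "start")         # Insert Before
            try:
                return func(*args, **kwargs)
            finally:
                log_event(func.__name__, "complete")  # Insert After
        return wrapper
    return decorate


instrument = aspect(takes_single_int)


@instrument
def H():
    pass


@instrument
def K():
    pass


@instrument
def G(x: int):
    pass


@instrument
def F(x: int):
    G(x)
    H()
    K()


F(1)
print(events)
```

Running it records start and complete events for F and G only, because H and K take no int argument and are therefore not selected by the pointcut.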

Process Mining in Software Systems 38 Instrument Aspects; Instrumented SUS; Event Log; Structure of SUS; Process Mining; Instrumentation; Discover Business Transactions; Related Work and Assumptions; Evaluation; System under Study (SUS); Stream of Event Data; Collect Data

Collecting Data from Software Systems 39 Distributed System; A; B; Communication channel; Tracing

Collecting Data from Software Systems 40 Distributed System; A; B; Communication channel; Tracing; Event stream; Logging server produces Event Log

Scenario Triggering: Real-Life Behavior 41 User; User request; Distributed System; A; B; Communication channel; Tracing; Event stream; Logging server produces Event Log

Generated Stream of Event Data 42 Node A; Node B; func. F; func. G; func. M; func. N

Generated Stream of Event Data 43 Node A; Node B; func. F; func. G; func. M; func. N
Event Data:
Timestamp: ms (global clock)
Joinpoint: name of point in code
Lifecycle: start, complete (call, return)
Node: id (≈ location)
Node Instance: id (≈ execution thread)
Resource: communication data
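
As an illustration of the event data listed on this slide, one event could be represented as a small record. The class name `Event`, the field names and the sample values are assumptions made for this sketch; the slides only name the six attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    timestamp_ms: int               # timestamp in ms, taken from the global clock
    joinpoint: str                  # name of the point in code, e.g. "F"
    lifecycle: str                  # "start" (call) or "complete" (return)
    node: str                       # node id, approximates the location
    node_instance: str              # node instance id, approximates the execution thread
    resource: Optional[str] = None  # communication data, if the joinpoint communicates


# Hypothetical sample: function F runs on node A for 42 ms.
start = Event(1000, "F", "start", "A", "A-1")
done = Event(1042, "F", "complete", "A", "A-1")
duration_ms = done.timestamp_ms - start.timestamp_ms
print(duration_ms)
```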

Process Mining in Software Systems 44 Instrument Aspects; Instrumented SUS; Stream of Event Data; Instrumentation; Collect Data; Related Work and Assumptions; Evaluation; System under Study (SUS); Event Log; Structure of SUS; Process Mining; Discover Business Transactions

Collection of events from multiple streams 45 Node A; Node B; func. F; func. G; func. M; func. N. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource
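
One plausible way to collect events from several per-node streams is a timestamp-ordered merge, which relies on the global-clock assumption stated earlier. The tuple layout and sample values below are hypothetical; the slides do not say how the streams are combined.

```python
import heapq

# Each stream is already ordered by timestamp: (timestamp_ms, node, joinpoint, lifecycle)
stream_a = [(1000, "A", "F", "start"), (1040, "A", "F", "complete")]
stream_b = [(1010, "B", "M", "start"), (1030, "B", "M", "complete")]

# heapq.merge lazily interleaves sorted inputs into one sorted sequence.
merged = list(heapq.merge(stream_a, stream_b, key=lambda event: event[0]))

for event in merged:
    print(event)
```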

Create event intervals (start, end) 46 Node A; Node B; func. F; func. G; func. M; func. N. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource
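
Turning start and complete events into intervals can be sketched with one call stack per node instance. This is an assumed pairing strategy, not necessarily the one used in the talk; the function name `to_intervals` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def to_intervals(events):
    """Pair start/complete events into (node, instance, joinpoint, start, end).

    events: (timestamp_ms, node, instance, joinpoint, lifecycle), sorted by time.
    """
    open_calls = defaultdict(list)  # (node, instance) -> stack of (joinpoint, start_ts)
    intervals = []
    for ts, node, instance, joinpoint, lifecycle in events:
        stack = open_calls[(node, instance)]
        if lifecycle == "start":
            stack.append((joinpoint, ts))
        elif lifecycle == "complete" and stack and stack[-1][0] == joinpoint:
            _, start_ts = stack.pop()
            intervals.append((node, instance, joinpoint, start_ts, ts))
        # A "complete" without a matching open "start" is ignored in this sketch.
    return intervals


events = [
    (1000, "A", "A-1", "F", "start"),
    (1005, "A", "A-1", "G", "start"),
    (1020, "A", "A-1", "G", "complete"),
    (1040, "A", "A-1", "F", "complete"),
]
intervals = to_intervals(events)
print(intervals)
```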

Cluster events (single node) 47 Node A; Node B; func. F; func. G; func. M; func. N. Same node: Interval containment. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource
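
Interval containment on a single node can be checked directly on the interval bounds. The sketch below assumes intervals shaped as (node, instance, joinpoint, start, end) and additionally restricts containment to the same node instance, which is an assumption; `contains` and `parent_of` are hypothetical helper names.

```python
def contains(outer, inner):
    """True if inner lies within outer on the same node and node instance."""
    return (outer is not inner
            and outer[0] == inner[0] and outer[1] == inner[1]
            and outer[3] <= inner[3] and inner[4] <= outer[4])


def parent_of(interval, intervals):
    """Return the shortest enclosing interval, or None if nothing encloses it."""
    enclosing = [other for other in intervals if contains(other, interval)]
    return min(enclosing, key=lambda other: other[4] - other[3], default=None)


f = ("A", "A-1", "F", 1000, 1040)
g = ("A", "A-1", "G", 1005, 1020)
m = ("B", "B-1", "M", 1010, 1030)
all_intervals = [f, g, m]

print(parent_of(g, all_intervals))  # G is nested inside F
print(parent_of(m, all_intervals))  # M runs on another node, so no parent here
```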

Cluster events (across nodes) 48 Node A; Node B; func. F; func. G; func. M; func. N; res. R; res. R’. Related resources indicate communication channel. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource

Cluster events (across nodes) 49 Node A; Node B; func. F; func. G; func. M; func. N; res. R; res. R’. Related resources acquired at the same time (intersection). Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource
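
The cross-node rule, related resources acquired at the same time, can be sketched as a resource match plus an interval intersection test. How R and R’ are related is not spelled out on the slides; the sketch assumes a resource is a (local, remote) endpoint pair and that R’ is R with the two endpoints swapped. All names and values are hypothetical.

```python
def related(res_a, res_b):
    """Assumed relation: one resource is the mirror image of the other."""
    return res_a == (res_b[1], res_b[0])


def intersects(a, b):
    """True if the two time intervals overlap."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def communicates(a, b):
    """Link two intervals on different nodes that share a channel at the same time."""
    return (a["node"] != b["node"]
            and related(a["resource"], b["resource"])
            and intersects(a, b))


send = {"node": "A", "joinpoint": "G", "start": 1006, "end": 1012,
        "resource": ("A:5000", "B:80")}
recv = {"node": "B", "joinpoint": "M", "start": 1008, "end": 1030,
        "resource": ("B:80", "A:5000")}
late = {"node": "B", "joinpoint": "M", "start": 2000, "end": 2010,
        "resource": ("B:80", "A:5000")}

print(communicates(send, recv))
print(communicates(send, late))
```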

Event traces 50 Node A; Node B; func. F; func. G; func. M; func. N; res. R; res. R’. A single trace. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource

Business Transactions 51 Node A; Node B; func. F; func. G; func. M; func. N; res. R; res. R’. A single trace; Maximal trace; User request. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource
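
If containment links and communication links are both known, one way to obtain maximal traces is to take connected components over those links, so that each component stands for one user request. This grouping strategy is an assumption made for illustration; `maximal_traces` and the sample labels are hypothetical.

```python
def maximal_traces(items, links):
    """Group items into connected components using a small union-find."""
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# F contains G (node A), G communicates with M, M contains N (node B);
# F2 belongs to a second, unrelated request.
items = ["F", "G", "M", "N", "F2"]
links = [("F", "G"), ("G", "M"), ("M", "N")]
traces = maximal_traces(items, links)
print(traces)
```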

Concurrency – Multiple node instances 52 Node A; Node B; func. F; func. G; func. M; func. N; res. R; res. R’. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource

Resulting Event Log 53 Node A; Node B; func. F; func. G; func. M; func. N; res. R; res. R’. Event Log; Structure of SUS; Process Mining. Event Data: Timestamp; Joinpoint; Lifecycle; Node; Node Instance; Resource

Process Mining in Software Systems 54 Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining; Instrumentation; Collect Data; Discover Business Transactions; Related Work and Assumptions; System under Study (SUS); Evaluation

Case study – Pet catalog 55 Pet Catalog; Webserver (Glassfish); Database (MySQL); User; Browser; Webpage request; TCP/IP

Case study – Pet catalog 56 Analysis questions: 1) High-level end-to-end process? 2) Main bottlenecks?

Case study – Approach 57 Analysis questions: 1) High-level end-to-end process? 2) Main bottlenecks? Instrumentation decisions; Process Mining

Case study – Specifying input pointcuts 58 1) High-level end-to-end process? 2) Main bottlenecks? Instrumentation decisions; Process Mining. Defined pointcuts targeting: network communication; database interface; webserver interface (i.e., servlets)

Case study – Specifying input pointcuts 59 1) High-level end-to-end process? 2) Main bottlenecks? Instrumentation decisions; Process Mining. Defined pointcuts targeting: network communication; database interface; webserver interface (i.e., servlets). Pointcut predicates: HasInterface javax.persistence.EntityManager; Communication java.net.Socket; javax.servlet.*; javax.faces.*
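
To give a feel for how such pointcut predicates select code, the sketch below matches fully qualified class names against the patterns from the slide. It uses simple glob matching and is not AspectJ pointcut semantics; the function `selected`, the constant names and the example class `com.example.PetStore` are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the slide; grouping them into two lists is an assumption.
NAME_PATTERNS = ["java.net.Socket", "javax.servlet.*", "javax.faces.*"]
INTERFACE_TARGETS = ["javax.persistence.EntityManager"]


def selected(class_name, interfaces=()):
    """True if a class matches a name pattern or implements a targeted interface."""
    if any(fnmatchcase(class_name, pattern) for pattern in NAME_PATTERNS):
        return True
    return any(interface in INTERFACE_TARGETS for interface in interfaces)


print(selected("java.net.Socket"))
print(selected("javax.servlet.http.HttpServlet"))
print(selected("com.example.PetStore", ("javax.persistence.EntityManager",)))
print(selected("java.lang.String"))
```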

Case study – Process Discovery 60 1) High-level end-to-end process? 2) Main bottlenecks? Instrumentation decisions; Event Log; Process Mining. Main question: What sequence of operations is needed to complete a user request?

Case study – Inductive visual miner (process tree) 61

Case study – Inductive visual miner (process tree) 62

Case study – Inductive visual miner (process tree) 63

Case study – Conversion between formal models 64 From process tree to Petri net

Case study – Pet catalog 65 1) High-level end-to-end process? 2) Main bottlenecks? Instrumentation decisions; Process Mining

Case study – Performance analysis in model context 66 Align Event Log and Petri Net; Analyze throughput and sync. time

Case study – Performance analysis in model context 67 Align Event Log and Petri Net; Analyze throughput and sync. time

Case study – Performance analysis in model context 68 Align Event Log and Petri Net; Analyze throughput and sync. time. “Top-level” function, represents total time

Case study – Performance analysis in model context 69 Align Event Log and Petri Net; Analyze throughput and sync. time. “Top-level” function, represents total time. Bottleneck in comm. with database (read)

Case study – Conclusion 70 1) High-level end-to-end process? High-level process. 2) Main bottlenecks? Main bottleneck: comm. with database (read). Instrumentation decisions; Process Mining

Case study – Hadoop MapReduce 71 Hadoop YARN; Resource Manager; Node Manager; Container; User; Client; RPC. Image source: ibm.com

Case study – Hadoop MapReduce 72 Align Event Log and Petri Net; Analyze throughput and sync. time. The “Map” in “MapReduce”; The “Reduce” in “MapReduce”

Process Mining in Software Systems 73 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining

Process Mining in Software Systems 74 System under Study (SUS); Instrument Aspects; Instrumented SUS; Stream of Event Data; Event Log; Structure of SUS; Process Mining. 1) Instrumentation 2) Collect Data 3) Discover Business Transactions

Future work 75 Evaluation: investigate more complex, distributed systems (Hadoop); investigate instrumentation impact. Process Discovery: leverage “nested” lifecycle information; make location data more explicit in models. Event Log; Process Mining

Maikel Leemans Wil M.P. van der Aalst 76