Combining Process Mining and Distributed Tracing to Improve Root Cause Analysis Jochen Graeff (B.Sc.), 27.02.2017, Munich Advisor: Martin Kleehaus.


Agenda
- Motivation
- Introduction: Distributed Tracing and Process Mining
- Problem Statement and Research Questions
- Combining Process Mining and Distributed Tracing
- Architectural Sketch
- Working Approach
- Timeline
4. Viewing Root Cause Analysis from an Enterprise Architecture Perspective © sebis

Motivation
The following scenario:
A customer calls the service hotline of a car-sharing provider because the car he has just booked won't open. The support agent records the incident, including an approximate time and the customer ID, and forwards it to an engineer to find the root cause. In the meantime, the support agent force-opens the car via the platform.
For the root cause analysis, the engineer checks the system's log files for errors in the specified time frame, for the services he or she suspects might be affected. Since multiple services may be involved, investigating the issue and forwarding an exhaustive analysis report to the responsible colleague will probably take a lot of time.
- The success of the investigation depends on the logging infrastructure that is provided.
- The company may have a tool installed that helps to filter the logs.
Now imagine a multi-service architecture instead of a monolithic system: there are multiple call stacks, and it is a complex, large-scale distributed system. Equipping every service with logging capabilities in code would be very costly and not feasible. In practice, logging is often poorly maintained.
This is where distributed tracing comes into play. RabbitMQ is used to deliver traces to Zipkin.

Distributed Tracing
"Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. … Understanding system behavior in this context requires observing related activities across many different programs and machines."
Sigelman et al. (Google), 2010: Dapper, A Large Scale Distributed Systems Tracing Infrastructure
- Hard to find out which services are affected
- Low overhead
- Scalable
- Little code change
- Application-level transparency: programmers should not need to be aware of the tracing system

Distributed Tracing
Figure: The path taken through a simple serving system on behalf of user request X. The letter-labeled nodes represent processes in a distributed system.
Figure: The causal and temporal relationships between five spans in a trace. All spans of a specific trace also share a common trace id.
Sigelman et al. (2010). Dapper, A Large Scale Distributed Systems Tracing Infrastructure. Google Research
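The span and trace-id idea described above can be illustrated with a minimal sketch. It assumes that each span records its own id, the id of its parent span, and the trace id shared by all spans of one request. The `Span` class, the field names and the services A to E are illustrative assumptions, not the data model of Dapper or Zipkin.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one user request
    span_id: str              # unique within the trace
    parent_id: Optional[str]  # None marks the root span
    name: str


def render_tree(spans):
    """Return the causal call tree of one trace as indented lines."""
    children = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical trace: A calls B and C, and C calls D and E.
spans = [
    Span("x", "a", None, "A"),
    Span("x", "b", "a", "B"),
    Span("x", "c", "a", "C"),
    Span("x", "d", "c", "D"),
    Span("x", "e", "c", "E"),
]
tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```

The parent ids carry the causal relationship, while the common trace id is what allows spans collected from different machines to be grouped back into one request.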

Distributed Tracing © sebis

Is there a gap? Or put another way: what is already there?
[Figure: monitoring across three layers]
- User Layer (Visualization, Monitoring Instance): Login, Show Car Details, Reserve Car, Show FAQ, Report Issue
- Business Layer (Process Analyst, Process Discovery): Customer Journey over Login, Show Details, Show FAQ, Reserve Car, Report Issue
- Application Layer (Systems Analyst, Distributed Tracing): distributed (micro)services S1 to S9, with spans such as S1 (Span ID 1), S2 (Span ID 2), S3 (Span ID 3)

What is Process Mining?
"The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's (information) systems." IEEE CIS Task Force on Process Mining
[Figure: process model with the activities Create PO, Start Production, Ship Delivery, Notify Customer, Delete PO]
Event Log
TIMESTAMP   ACTIVITY           CASE ID
:38:        Create PO          #1234
:46:        Start Production   #5678
:47:        Notify Customer    #1234
:53:        Ship Delivery      #9012
There are three classes of process mining:
- Process Discovery
- Conformance Checking
- Extension
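As a rough illustration of process discovery, the sketch below counts how often one activity directly follows another within the same case. This directly-follows relation is a common starting point for discovery algorithms. The event log values are invented for the example (the timestamps on the slide are truncated), and `directly_follows` is a hypothetical helper, not the API of a process mining tool.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count activity pairs (a, b) where b directly follows a in a case.

    event_log is an iterable of (timestamp, activity, case_id) tuples.
    ISO timestamps are assumed, so sorting the tuples is chronological.
    """
    traces = defaultdict(list)
    for _timestamp, activity, case_id in sorted(event_log):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        for current, following in zip(activities, activities[1:]):
            counts[(current, following)] += 1
    return counts


# Invented example data, not taken from the slide.
event_log = [
    ("2017-02-27T10:38:00", "Create PO", "#1234"),
    ("2017-02-27T10:46:00", "Start Production", "#1234"),
    ("2017-02-27T10:47:00", "Ship Delivery", "#1234"),
    ("2017-02-27T10:50:00", "Create PO", "#5678"),
    ("2017-02-27T10:53:00", "Start Production", "#5678"),
    ("2017-02-27T10:59:00", "Delete PO", "#5678"),
]
dfg = directly_follows(event_log)
for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

The resulting pairs and counts are the edges of a simple process graph. Three columns (timestamp, activity, case id) are the minimum input needed for this step.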

Problem Statement
- Monitoring techniques already exist for the business layer and the application layer, but they are siloed.
- There is no connection between process failures in the business layer and system failures in the application layer.
- It is therefore difficult to find correlations across the layers.
- Root cause analysis is a very time-consuming task.
Example: is a high cancellation rate perhaps caused by a slow or even non-functioning service?
Kleehaus, M., Uludag, Ö., Matthes, F. (2017). Towards a multi-layer IT Infrastructure monitoring approach based on Enterprise Architecture Information. CSE 17, Hannover, Germany.

Research Questions
RQ1: How can a relationship between business activities and a distributed application architecture be established?
RQ2: What data has to be extracted, and how does it have to be mapped, to enable and store the relationship knowledge?
RQ3: What is the state of the art of monitoring both the application and business layer?
RQ4: How can a root cause analysis across the two different layers be partially automated?

Process Mining + Distributed Tracing
Using log data from Distributed Tracing to generate business activities

Data required for Business Process Mining
Columns: CASE ID, ACTIVITY, TIMESTAMP (one row per line; empty cells are not marked in the transcript)
12382 List available Cars 07/06/17 12:45:32
Reserve Car 07/06/17 12:46:12
19816 Book Car 07/06/17 17:45:32
39273 07/06/17 18:22:12
83947 07/06/17 18:24:14

Data available from Distributed Tracing
Columns: TRACE ID, SESSION ID (through annotations), SPAN NAME, REQUEST, TIMESTAMP (one row per line; empty cells are not marked in the transcript)
12382 43212 Service 1 /car/list 07/06/17 12:45:32
65433 Service 4 07/06/17 12:46:12
19816 Service 3 /car/book 07/06/17 17:45:32
39273 74689 Service 2 07/06/17 18:22:12
83947 34686 07/06/17 18:24:14

How can we close the gap?
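One possible way to close this gap is sketched below. It assumes that the session id attached to a span can serve as the case id, and that a hand-maintained lookup table maps request paths to business activities. The two mapping entries follow the examples on the slide. The class names, the extra `/internal/health` request and all ids and timestamps are hypothetical.

```python
from typing import NamedTuple


class SpanRecord(NamedTuple):
    trace_id: str
    session_id: str
    span_name: str
    request: str
    timestamp: str  # ISO format, so string order is chronological


class Event(NamedTuple):
    case_id: str
    activity: str
    timestamp: str


# Assumed mapping from REST requests to business activities.
ACTIVITY_BY_REQUEST = {
    "/car/list": "List available Cars",
    "/car/book": "Book Car",
}


def to_event_log(spans, activity_by_request=ACTIVITY_BY_REQUEST):
    """Turn tracing records into event log rows (case id, activity, time)."""
    events = []
    for span in spans:
        activity = activity_by_request.get(span.request)
        if activity is None:
            continue  # purely technical span without a business meaning
        events.append(Event(span.session_id, activity, span.timestamp))
    return sorted(events, key=lambda e: (e.case_id, e.timestamp))


# Hypothetical tracing records.
span_records = [
    SpanRecord("t-1", "s-1", "Service 1", "/car/list", "2017-06-07T12:45:32"),
    SpanRecord("t-2", "s-1", "Service 3", "/car/book", "2017-06-07T12:46:12"),
    SpanRecord("t-3", "s-1", "Service 2", "/internal/health", "2017-06-07T12:46:13"),
    SpanRecord("t-4", "s-2", "Service 1", "/car/list", "2017-06-07T17:45:32"),
]
business_events = to_event_log(span_records)
for event in business_events:
    print(event.case_id, event.activity, event.timestamp)
```

Keeping the trace id alongside each event (omitted here for brevity) would preserve the link back to the application layer, which is what a cross-layer root cause analysis needs.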

Architectural Sketch
[Figure: architecture of the approach]
- Sample microservice architecture with n services (S1 to S9), receiving requests such as POST /reserveCar, GET /carDetails, GET /userInfo
- Zipkin Server (Distributed Tracing): Zipkin, open-sourced by Twitter
- Creation of the Event Log
- Process Mining: Celonis
Event Log (Timestamp, Activity, ID)
:38:   Create purchase order   #1234
:46:   Start production        #5678
:47:   Receive payment         #1234
:53:   Send                    #9012

Working Approach
1 Define Activities & Process Boundaries
2 Extract Data from Zipkin
3 Create Event Log
4 Visualize Process in Celonis
5 Connect Celonis with Zipkin
5 Enhance Visualization with further data
Notes on the steps:
- Which REST calls define an activity?
- Which process steps define a process?
- Data manipulation: create the previously defined activities
- Persist data in an RDBMS
- Link the SessionID to some sort of cases table (e.g. purchases, depending on the process context) to display relevant information
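The extraction and persistence steps could look roughly like the sketch below. It assumes a Zipkin server that exposes the v2 HTTP API (`GET /api/v2/traces`, which returns a list of traces, each a list of span objects) and that the session id was recorded as a span tag. The tag key `session.id`, the table layout, the service names and the sample data are assumptions made for this example. SQLite stands in for the RDBMS.

```python
import json
import sqlite3
from urllib.parse import urlencode
from urllib.request import urlopen


def traces_url(base_url, service_name, limit=100):
    """Build the query URL for the Zipkin v2 traces endpoint."""
    query = urlencode({"serviceName": service_name, "limit": limit})
    return f"{base_url}/api/v2/traces?{query}"


def fetch_traces(base_url, service_name, limit=100):
    """Fetch traces from a running Zipkin server (not called in this demo)."""
    with urlopen(traces_url(base_url, service_name, limit)) as response:
        return json.load(response)


def flatten(traces, session_tag="session.id"):
    """Flatten nested trace JSON into one row per span."""
    rows = []
    for trace in traces:
        for span in trace:
            rows.append((
                span["traceId"],
                span["id"],
                span.get("parentId"),
                span.get("localEndpoint", {}).get("serviceName"),
                span.get("name"),
                span.get("tags", {}).get(session_tag),  # assumed tag key
                span.get("timestamp"),  # epoch microseconds in Zipkin v2
            ))
    return rows


def persist(connection, rows):
    """Store the flattened spans in a relational table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS span (trace_id TEXT, span_id TEXT, "
        "parent_id TEXT, service TEXT, name TEXT, session_id TEXT, "
        "timestamp_us INTEGER)"
    )
    connection.executemany("INSERT INTO span VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    connection.commit()


# Hypothetical response body with one trace of two spans.
sample_traces = [[
    {"traceId": "5af7183fb1d4cf5f", "id": "5af7183fb1d4cf5f",
     "name": "get /car/list", "timestamp": 1496839532000000,
     "localEndpoint": {"serviceName": "service-1"},
     "tags": {"session.id": "43212"}},
    {"traceId": "5af7183fb1d4cf5f", "id": "6b221d5bc9e6496c",
     "parentId": "5af7183fb1d4cf5f", "name": "select cars",
     "timestamp": 1496839532000300,
     "localEndpoint": {"serviceName": "service-4"}},
]]
rows = flatten(sample_traces)
connection = sqlite3.connect(":memory:")
persist(connection, rows)
span_count = connection.execute("SELECT COUNT(*) FROM span").fetchone()[0]
print(span_count, "spans stored")
```

Against a live server, `fetch_traces("http://localhost:9411", "service-1")` would replace the sample data, with the host and service name adjusted to the actual setup. The stored table can then be joined with a cases table via the session id.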

Timeline
Months: Feb, March, April, May, June, July, August
- Setup test microservice infrastructure
- Setup real* microservice infrastructure
- Event log generation (two bars in the chart)
- Build connector
- Build analysis in process mining tool
- Enhance analysis with further (case) data
- Literature research
- Thesis writing
- 15.08: Submission
* high dependency on possible industry partners
