Process Mining: Discovering processes from event logs All truths are easy to understand once they are discovered; the point is to discover them. Galileo.


1 Process Mining: Discovering processes from event logs
"All truths are easy to understand once they are discovered; the point is to discover them." Galileo Galilei (1564 - 1642)
Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology, Department of Information and Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
w.m.p.v.d.aalst@tm.tue.nl

2 Outline
Process Mining
–overview
–alpha algorithm
–genetic mining
ProM
–Architecture
–Converters (e-mail, Staffware, InConcert, SAP, etc.)
–Process mining plug-ins
  Alpha algorithm
  Multi-phase mining
  Genetic mining
–Analysis plug-ins
–Conformance testing plug-in
–LTL checker plug-in
–Social network plug-in
Conclusion

3 Process Mining

4 Motivation: Reversing the process
Process mining can be used for:
–Process discovery (What is the process?)
–Delta analysis (Are we doing what was specified?)
–Performance analysis (How can we improve?)

5 Overview
1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics (If … then …)
6) auditing/security
www.processmining.org

6 Let us focus on mining process models …
1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics (If … then …)
6) auditing/security
... and a very simple approach: The alpha algorithm

7 Alpha algorithm α

8 Process log
Minimal information in log: case ids and task ids.
Additional information: event type, time, resources, and data.
In this log there are three possible sequences:
–ABCD
–ACBD
–EF
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
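The three sequences can be recovered mechanically by grouping the events per case. The following is a minimal Python sketch of that step; the names `EVENTS` and `traces_by_case` are illustrative and do not come from the slides or from ProM.

```python
from collections import defaultdict

# Events in the order they appear in the log on this slide: (case id, task).
EVENTS = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def traces_by_case(events):
    """Group tasks per case, preserving the order of occurrence in the log."""
    traces = defaultdict(list)
    for case_id, task in events:
        traces[case_id].append(task)
    return {case_id: "".join(tasks) for case_id, tasks in traces.items()}


TRACES = traces_by_case(EVENTS)
DISTINCT = sorted(set(TRACES.values()))
print(TRACES)
print(DISTINCT)
```

Only the case id and the task id are used here, which is the minimal information the slide asks for.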

9 >, →, ||, # relations
Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.
For the log of slide 8:
A>B A>C B>C B>D C>B C>D E>F
A→B A→C B→D C→D E→F
B||C C||B
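The four relations follow directly from the definitions above. As a small Python sketch (the function name `ordering_relations` is an illustrative choice, not part of any tool mentioned in the slides), they can be derived from the three distinct sequences of the log:

```python
from itertools import product

# The three distinct sequences found in the log.
TRACES = ["ABCD", "ACBD", "EF"]


def ordering_relations(traces):
    """Return the >, →, || and # relations as sets of (x, y) pairs."""
    tasks = sorted({task for trace in traces for task in trace})
    # Direct succession: x is directly followed by y in some trace.
    succ = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    causal, parallel, choice = set(), set(), set()
    for x, y in product(tasks, repeat=2):
        forward, backward = (x, y) in succ, (y, x) in succ
        if forward and backward:
            parallel.add((x, y))      # x||y
        elif forward:
            causal.add((x, y))        # x→y
        elif not backward:
            choice.add((x, y))        # x#y
    return succ, causal, parallel, choice


SUCC, CAUSAL, PARALLEL, CHOICE = ordering_relations(TRACES)
print("direct succession:", sorted(x + ">" + y for x, y in SUCC))
print("causality:", sorted(x + "→" + y for x, y in CAUSAL))
print("parallel:", sorted(x + "||" + y for x, y in PARALLEL))
```

Pairs with only y>x fall into none of the three derived sets; they are the inverse of a causality and are already recorded as y→x.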

10 Basic idea (1) x→y

11 Basic idea (2) x→y, x→z, and y||z

12 Basic idea (3) x→y, x→z, and y#z

13 Basic idea (4) x→z, y→z, and x||y

14 Basic idea (5) x→z, y→z, and x#y

15 It is not that simple: Basic alpha algorithm
Let W be a workflow log over T. α(W) is defined as follows.
1. T_W = { t ∈ T | ∃σ∈W t ∈ σ },
2. T_I = { t ∈ T | ∃σ∈W t = first(σ) },
3. T_O = { t ∈ T | ∃σ∈W t = last(σ) },
4. X_W = { (A,B) | A ⊆ T_W ∧ B ⊆ T_W ∧ ∀a∈A ∀b∈B a →_W b ∧ ∀a1,a2∈A a1 #_W a2 ∧ ∀b1,b2∈B b1 #_W b2 },
5. Y_W = { (A,B) ∈ X_W | ∀(A′,B′)∈X_W A ⊆ A′ ∧ B ⊆ B′ ⇒ (A,B) = (A′,B′) },
6. P_W = { p_(A,B) | (A,B) ∈ Y_W } ∪ { i_W, o_W },
7. F_W = { (a, p_(A,B)) | (A,B) ∈ Y_W ∧ a ∈ A } ∪ { (p_(A,B), b) | (A,B) ∈ Y_W ∧ b ∈ B } ∪ { (i_W, t) | t ∈ T_I } ∪ { (t, o_W) | t ∈ T_O }, and
α(W) = (P_W, T_W, F_W).
The alpha algorithm has been proven to be correct for a large class of free-choice nets.
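The seven steps translate almost literally into code. The sketch below is a naive Python rendering, not the ProM plug-in: it enumerates all subsets of tasks, so it is only suitable for small examples. It also assumes that A and B must be non-empty, which the definition on the slide does not state explicitly. Places are represented as tuples `("p", A, B)`, an encoding chosen here for illustration.

```python
from itertools import combinations


def alpha(traces):
    """Return (places, transitions, flow) for a list of traces (strings)."""
    succ = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    causal = {(x, y) for (x, y) in succ if (y, x) not in succ}

    t_w = {task for t in traces for task in t}   # step 1: all tasks
    t_i = {t[0] for t in traces}                 # step 2: first tasks
    t_o = {t[-1] for t in traces}                # step 3: last tasks

    def unrelated(group):
        # True when every pair in the group (including x with itself) is in #.
        return all((a, b) not in succ for a in group for b in group)

    # Assumption: only non-empty candidate sets are considered.
    tasks = sorted(t_w)
    subsets = [frozenset(c) for r in range(1, len(tasks) + 1)
               for c in combinations(tasks, r) if unrelated(c)]

    # Step 4: pairs (A, B) where every a in A causally precedes every b in B.
    x_w = [(a, b) for a in subsets for b in subsets
           if all((x, y) in causal for x in a for y in b)]

    # Step 5: keep only the maximal pairs.
    y_w = [(a, b) for (a, b) in x_w
           if not any((a, b) != (a2, b2) and a <= a2 and b <= b2
                      for (a2, b2) in x_w)]

    # Step 6: one place per maximal pair, plus source and sink place.
    places = [("p", a, b) for (a, b) in y_w] + ["i_W", "o_W"]

    # Step 7: arcs into and out of every place.
    flow = set()
    for a, b in y_w:
        place = ("p", a, b)
        flow |= {(x, place) for x in a} | {(place, y) for y in b}
    flow |= {("i_W", t) for t in t_i} | {(t, "o_W") for t in t_o}
    return places, t_w, flow


PLACES, TRANSITIONS, FLOW = alpha(["ABCD", "ACBD", "EF"])
print(len(PLACES), "places,", len(TRANSITIONS), "transitions,", len(FLOW), "arcs")
```

Steps 4 and 5 carry the real work: the # condition inside A and inside B is what lets one place model a choice, and the maximality filter removes places that are subsumed by larger ones.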

16 Example
W: the log of slide 8
α(W): the process model mined from W

17 DEMO Alpha algorithm: 48 cases, 16 performers

18 Challenges
Refining existing algorithms for the control-flow (process) perspective
–Hidden tasks
–Duplicate tasks
–Non-free-choice constructs
–Loops
–Detecting concurrency (implicit or explicit)
–Mining and exploiting time
–Dealing with noise
–Dealing with incompleteness
Mining other perspectives (data, resources, roles, …)
Gathering data from heterogeneous sources
Visualization of results
Delta analysis

19 Genetic mining

20 Approach

21 Genetic mining: The two main questions
–How to represent an individual? (Petri net?)
–How to define the genetic operators? (e.g., crossover)

22 How to represent an individual?
Problems with Petri nets:
–Places do not exist in the log
–Difficulties defining mutation and crossover
–Problems describing subtle rules without adding transitions

23 Representation of the goal process
INPUT   true  A  A  A  D  D  E^F  BvCvG
→       A     B  C  D  E  F  G    H      OUTPUT
A       0     1  1  1  0  0  0    0      BvCvD
B       0     0  0  0  0  0  0    1      H
C       0     0  0  0  0  0  0    1      H
D       0     0  0  0  1  1  0    0      E^F
E       0     0  0  0  0  0  1    0      G
F       0     0  0  0  0  0  1    0      G
G       0     0  0  0  0  0  0    1      H
H       0     0  0  0  0  0  0    0      true

24 A more compact representation
ACTIVITY  INPUT        OUTPUT
A         {}           {{B,C,D}}
B         {{A}}        {{H}}
C         {{A}}        {{H}}
D         {{A}}        {{E},{F}}
E         {{D}}        {{G}}
F         {{D}}        {{G}}
G         {{E},{F}}    {{H}}
H         {{B,C,G}}    {}
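The compact representation maps naturally onto a dictionary. The following Python sketch encodes the goal process and derives the 0/1 matrix and the boolean expressions of the previous slide from it. It assumes the reading suggested by comparing the two slides: the sets in INPUT or OUTPUT are AND-ed, and the activities inside one set are OR-ed. The names `GOAL`, `to_matrix`, `as_expression` and `is_consistent` are illustrative.

```python
# activity -> (INPUT, OUTPUT); each side is a list of sets of activities.
GOAL = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "D": ([{"A"}], [{"E"}, {"F"}]),
    "E": ([{"D"}], [{"G"}]),
    "F": ([{"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
    "H": ([{"B", "C", "G"}], []),
}


def to_matrix(cm):
    """Entry [a][b] is 1 when b occurs somewhere in OUTPUT(a)."""
    acts = sorted(cm)
    return [[int(any(b in s for s in cm[a][1])) for b in acts] for a in acts]


def as_expression(sets):
    """Render a list of sets in the slide's notation, e.g. E^F or BvCvG."""
    if not sets:
        return "true"
    return "^".join("v".join(sorted(s)) for s in sets)


def is_consistent(cm):
    """Check that b is in OUTPUT(a) exactly when a is in INPUT(b)."""
    outs = {(a, b) for a, (_, out) in cm.items() for s in out for b in s}
    ins = {(a, b) for b, (inp, _) in cm.items() for s in inp for a in s}
    return outs == ins


for row in to_matrix(GOAL):
    print("".join(str(bit) for bit in row))
print(as_expression(GOAL["G"][0]), as_expression(GOAL["H"][0]))
```

The rendering in `as_expression` adds no parentheses, which is enough for the goal process because no activity there combines several sets with multi-element sets.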

25 Any Petri net can be mapped onto a causal matrix:
ACTIVITY  INPUT          OUTPUT
A         {...}          {{C,D},...}
B         {...}          {{C,D},...}
C         {{A,B},...}    {...}
D         {{A,B},...}    {...}
but...

26 Mapping a causal matrix onto a Petri net?
ACTIVITY  INPUT                              OUTPUT
A         {{i11,i12,i13},{i21,i22,i23}}      {{o11,o12,o13},{o21,o22,o23}}

27 Wiring based on input and output sets
Using place fusion or silent transitions.

28 Example
ACTIVITY  INPUT      OUTPUT
A         {}         {{C,D}}
B         {}         {{D}}
C         {{A}}      {}
D         {{A,B}}    {}

29 Wiring using silent transitions
Always a solution?

30 Problem (the need to be lazy...)
ACTIVITY  INPUT        OUTPUT
A         {}           {{B},{C,D}}
B         {{A}}        {{E,F}}
C         {{A}}        {{E}}
D         {{A}}        {{F}}
E         {{B},{C}}    {{G}}
F         {{B},{D}}    {{G}}
G         {{E},{F}}    {}

31 However,...
(same causal matrix as on slide 30)

32 Two approaches: CM2PN
1. A naive mapping using silent transitions.
–Always works
–Larger net
–Requires "lazy transition" semantics
–Implemented in ProM
2. A more sophisticated mapping based on "smart place fusion"
–Only for a subset of CMs
–Subclass can be characterized
–Not (yet) implemented

33 Example: Event log
case id  activity id  originator  timestamp
case 1   activity A   John        9-3-2004:15.01
case 2   activity A   John        9-3-2004:15.12
case 3   activity A   Sue         9-3-2004:16.03
case 3   activity D   Carol       9-3-2004:16.07
case 1   activity B   Mike        9-3-2004:18.25
case 1   activity H   John        10-3-2004:9.23
case 2   activity C   Mike        10-3-2004:10.34
case 4   activity A   Sue         10-3-2004:10.35
case 2   activity H   John        10-3-2004:12.34
case 3   activity E   Pete        10-3-2004:12.50
case 3   activity F   Carol       11-3-2004:10.12
case 4   activity D   Pete        11-3-2004:10.14
case 3   activity G   Sue         11-3-2004:10.44
case 3   activity H   Pete        11-3-2004:11.03
case 4   activity F   Sue         11-3-2004:11.18
case 4   activity E   Clare       11-3-2004:12.22
case 4   activity G   Mike        11-3-2004:14.34
case 4   activity H   Clare       11-3-2004:14.38
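A log of this shape can be turned into per-case traces by parsing the timestamps and sorting on them. The Python sketch below assumes the timestamps are written as day-month-year:hour.minute; the slide does not state the format, so that reading is a guess. The whitespace-separated layout of `LOG` is a simplification introduced here.

```python
from collections import defaultdict
from datetime import datetime

# Columns: case id, activity id, originator, timestamp (as on the slide).
LOG = """\
1 A John 9-3-2004:15.01
2 A John 9-3-2004:15.12
3 A Sue 9-3-2004:16.03
3 D Carol 9-3-2004:16.07
1 B Mike 9-3-2004:18.25
1 H John 10-3-2004:9.23
2 C Mike 10-3-2004:10.34
4 A Sue 10-3-2004:10.35
2 H John 10-3-2004:12.34
3 E Pete 10-3-2004:12.50
3 F Carol 11-3-2004:10.12
4 D Pete 11-3-2004:10.14
3 G Sue 11-3-2004:10.44
3 H Pete 11-3-2004:11.03
4 F Sue 11-3-2004:11.18
4 E Clare 11-3-2004:12.22
4 G Mike 11-3-2004:14.34
4 H Clare 11-3-2004:14.38
"""


def parse(text):
    """Parse each line into (case, activity, originator, datetime)."""
    events = []
    for line in text.splitlines():
        case, activity, originator, stamp = line.split()
        # Assumed format: day-month-year:hour.minute
        when = datetime.strptime(stamp, "%d-%m-%Y:%H.%M")
        events.append((int(case), activity, originator, when))
    return events


def traces(events):
    """Order events by time and collect the activities of each case."""
    per_case = defaultdict(list)
    for case, activity, _, _ in sorted(events, key=lambda e: e[3]):
        per_case[case].append(activity)
    return {case: "".join(acts) for case, acts in per_case.items()}


TRACES = traces(parse(LOG))
print(TRACES)
```

Cases 3 and 4 execute E and F in opposite orders, which is the evidence in this log that the two activities can run in parallel.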

34 Goal

35 Example: Starting point
case id  activity id
case 1   activity A
case 2   activity A
case 3   activity A
case 3   activity D
case 1   activity B
case 1   activity H
case 2   activity C
case 4   activity A
case 2   activity H
case 3   activity E
case 3   activity F
case 4   activity D
case 3   activity G
case 3   activity H
case 4   activity F
case 4   activity E
case 4   activity G
case 4   activity H
+ 500 randomly generated initial individuals

36 Two individuals
First individual:
ACTIVITY  INPUT        OUTPUT
A         {}           {{B,C,D}}
B         {{A}}        {{H}}
C         {{A}}        {{H}}
D         {{A}}        {{E}}
E         {{D}}        {{G}}
F         {}           {{G}}
G         {{E},{F}}    {{H}}
H         {{C,B,G}}    {}
Second individual:
ACTIVITY  INPUT            OUTPUT
A         {}               {{B,C,D}}
B         {{A}}            {{H}}
C         {{A}}            {{H}}
D         {{A}}            {{E,F}}
E         {{D}}            {{G}}
F         {{D}}            {{G}}
G         {{E},{F}}        {{H}}
H         {{C},{B},{G}}    {}

37 Crossover
First individual:
ACTIVITY  INPUT        OUTPUT
A         {}           {{B,C,D}}
B         {{A}}        {{H}}
C         {{A}}        {{H}}
D         {{A}}        {{E,F}}
E         {{D}}        {{G}}
F         {{D}}        {{G}}
G         {{E},{F}}    {{H}}
H         {{C,B,G}}    {}
Second individual:
ACTIVITY  INPUT            OUTPUT
A         {}               {{B,C,D}}
B         {{A}}            {{H}}
C         {{A}}            {{H}}
D         {{A}}            {{E}}
E         {{D}}            {{G}}
F         {}               {{G}}
G         {{E},{F}}        {{H}}
H         {{C},{B},{G}}    {}

38 Resulting CM with fitness 1.0
INPUT   true  A  A  A  D  D  E^F  BvCvG
→       A     B  C  D  E  F  G    H      OUTPUT
A       0     1  1  1  0  0  0    0      BvCvD
B       0     0  0  0  0  0  0    1      H
C       0     0  0  0  0  0  0    1      H
D       0     0  0  0  1  1  0    0      E^F
E       0     0  0  0  0  0  1    0      G
F       0     0  0  0  0  0  1    0      G
G       0     0  0  0  0  0  0    1      H
H       0     0  0  0  0  0  0    0      true
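The slides show the two individuals before and after crossover but do not spell out the operator. One simple reading, sketched below in Python, is that the rows of selected activities are exchanged between the parents. This is an assumption made for illustration only; the operator in the genetic miner may work at a finer grain, for example on individual subsets within an INPUT or OUTPUT. The function name `crossover` and the dictionary encoding are likewise illustrative.

```python
import copy

# activity -> (INPUT, OUTPUT), each a list of sets, as on slide 36.
FIRST = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "D": ([{"A"}], [{"E"}]),
    "E": ([{"D"}], [{"G"}]),
    "F": ([], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
    "H": ([{"C", "B", "G"}], []),
}
SECOND = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "D": ([{"A"}], [{"E", "F"}]),
    "E": ([{"D"}], [{"G"}]),
    "F": ([{"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
    "H": ([{"C"}, {"B"}, {"G"}], []),
}


def crossover(parent1, parent2, activities):
    """Assumed operator: swap the rows of the given activities."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    for act in activities:
        child1[act] = copy.deepcopy(parent2[act])
        child2[act] = copy.deepcopy(parent1[act])
    return child1, child2


# Swapping the rows of D and F reproduces the two tables on this slide.
CHILD1, CHILD2 = crossover(FIRST, SECOND, ["D", "F"])
print(CHILD1["D"], CHILD1["F"])
print(CHILD2["D"], CHILD2["F"])
```

The parents are deep-copied so that the original population stays untouched, which matters when the same parent takes part in several crossovers.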

39 Mapping

40 ProM framework

41 ProM

42 Converter plug-in: EMailAnalyzer

43 XML format

44 ProM architecture

45 Mining plug-in: Alpha algorithm

46 Mining plug-in: Genetic Miner

47 Mining plug-in: Multi-phase mining

48 Step 1: Get instances

49 Step 2: Project

50 Step 3: Aggregate

51 Step 4: Map onto EPC

52 Step 5: Map onto Petri net (or other language)

53 Mining plug-in: Social network miner

54

55 Cliques

56 SN based on hand-over of work metric density of network is 0.225
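To make the two notions on this slide concrete, the Python sketch below builds a hand-over-of-work network from the small event log of slide 33 and computes its density. The density of 0.225 quoted above belongs to the demo log, which is not reproduced in this transcript, so the number printed here is a different one. Two assumptions are made: a hand-over is counted whenever consecutive activities of a case have different originators, and density is the usual directed-graph ratio of existing arcs to possible arcs. The plug-in may define either differently.

```python
from collections import Counter

# Originators per case in timestamp order, read off the log on slide 33.
PERFORMERS = {
    1: ["John", "Mike", "John"],
    2: ["John", "Mike", "John"],
    3: ["Sue", "Carol", "Pete", "Carol", "Sue", "Pete"],
    4: ["Sue", "Pete", "Sue", "Clare", "Mike", "Clare"],
}


def handover_of_work(cases):
    """Count how often work passes directly from one performer to another."""
    counts = Counter()
    for performers in cases.values():
        for giver, receiver in zip(performers, performers[1:]):
            if giver != receiver:  # assumption: ignore self hand-overs
                counts[(giver, receiver)] += 1
    return counts


def density(counts):
    """Existing arcs divided by possible arcs, n * (n - 1), in a directed graph."""
    nodes = {person for arc in counts for person in arc}
    possible = len(nodes) * (len(nodes) - 1)
    return len(counts) / possible if possible else 0.0


HANDOVERS = handover_of_work(PERFORMERS)
DENSITY = density(HANDOVERS)
print(HANDOVERS.most_common(3))
print(round(DENSITY, 3))
```

The arc weights in `HANDOVERS` could be used to draw thicker arrows for frequent hand-overs, while `density` only looks at whether an arc exists.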

57 SN based on working together (and ego network)

58 Analysis plug-in: LTL checker

59

60 Analysis plug-in: Conformance checker Do they agree?

61

62 Fitness is not enough

63 Screenshot (Also runs on Mac.)

64 Other analysis plug-ins

65 More demos?

66 Conclusion
Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers.
It involves multiple perspectives (process, data, resources, etc.).
Get ProM-ed! You can contribute by applying ProM and developing plug-ins.

67 More information
http://www.workflowcourse.com
http://www.workflowpatterns.com
http://www.processmining.org

