Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Process Mining Seminars in Software and Services for the Information Society Rome, 2014, May the 5th Francesco Leotta Dipartimento di Ingegneria Informatica,

Similar presentations


Presentation on theme: "1 Process Mining Seminars in Software and Services for the Information Society Rome, 2014, May the 5th Francesco Leotta Dipartimento di Ingegneria Informatica,"— Presentation transcript:

1 1 Process Mining Seminars in Software and Services for the Information Society Rome, 2014, May the 5th Francesco Leotta Dipartimento di Ingegneria Informatica, Automatica e Gestionale – DIAG “Sapienza” Università di Roma Thanks for part of the presented material to Ana Karla Alves de Medeiros Eindhoven University of Technology Department of Information Systems

2 Process Mining 1.Register at hospital 2.Talk to doctor 3.Take blood test 4.Take X-ray 5.Get results 6.Talk to doctor 7.Go home 1.Register at hospital 2.Talk to doctor 3.Undergo surgery 4.Take medicines 5.Take blood sample 6.Talk to doctor 7.Go home 1.Register at hospital 2.Talk to doctor 3.Undergo surgery 4.Take medicines 5.Take blood sample 6.Talk to doctor 7.Go home 1.Register at hospital 2.Talk to doctor 3.Undergo surgery 4.Take medicines 5.Take blood sample 6.Talk to doctor 7.Go home 2 EventLogs 1.Register at hospital 2.Talk to doctor 3.Undergo surgery 4.Take medicines 5.Take blood sample 6.Talk to doctor 7.Go home 1.Register at hospital 2.Talk to doctor 3.Undergo surgery 4.Take medicines 5.Take blood sample 6.Talk to doctor 7.Go home

3 Process Mining 3 Performance Analysis Process Model Organizational Model Social Network EventLog MiningTechniques Auditing/Security MinedModels

4 Event Logs are Everywhere! 4 aaaaaa aaaaaa Machines, Municipalities, Airports, Internet, Hospitals, etc.

5 Tools ProM 6 ProMimport Free tools! 5

6 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Miner Fuzzy Miner 6

7 Types of Algorithms 7

8 8 Process Model Organizational Model Social Network

9 Types of Algorithms 9 Auditing/Security Compliance Process Model

10 Types of Algorithms 10 Bottlenecks/ Business Rules Process Model Performance Analysis

11 Process Mining 11 EventLog Discovery Techniques: Control-Flow Mining MinedModel 1.Start 2.Get Ready 3.Travel by Train 4.Beta Event Starts 5.Visit Brewery 6.Have Dinner 7.Go Home 8.Travel by Train 1.Start 2.Get Ready 3.Travel by Train 4.Beta Event Starts 5.Give a Talk 6.Visit Brewery 7.Have Dinner 8.Go Home 9.Travel by Train 1.Start 2.Get Ready 3.Travel by Car 4.Beta Event Starts 5.Give a Talk 6.Visit Brewery 7.Have Dinner 8.Go Home 9.Pay Parking 10.Travel by Car 1.Start 2.Get Ready 3.Travel by Car 4.Conference Starts 5.Join Reception 6.Have Dinner 7.Go Home 8.Pay Parking 9.Travel by Car 10.End Start Get Ready Travel by Train Train Travel by Car Car Conference Starts Give a Talk Join Reception Have Dinner Go Home Travel by Train Train Travel by Car Car PayParking End

12 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Miner Fuzzy Miner 12

13 Common Constructs Sequence Splits Joins Loops Non-Free Choice Invisible Tasks Duplicate Tasks 13 PayParking GetReady Travel by Train Train Travel by Car Car Defense Starts Give a Talk Ask Question Defense Ends Go Home Travel by Train Train Travel by Car Car Have Drinks

14 Common Constructs Sequence Splits Joins Loops Non-Free Choice Invisible Tasks Duplicate Tasks 14 PayParking GetReady Travel by Train Train Travel by Car Car Defense Starts Give a Talk Ask Question Defense Ends Go Home Travel by Train Train Travel by Car Car Have Drinks + noise in logs!

15 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Miner Fuzzy Miner 15

16 Event Log: Mining XML (MXML) 16 The notion of which tasks belong to a same instance is crucial for applying process mining techniques!

17 Event Log: Mining XML (MXML) 17

18 Event Log: Mining XML (MXML) 18 Compulsory fields!

19 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Miner Fuzzy Miner 19

20 α-algorithm 1.Read a log 2.Get the set of tasks 3.Infer the ordering relations 4.Build the net based on inferred relations 5.Output the net 20 Core Step!

21 Direct succession: x>y iff for some case x is directly followed by y. Causality: x  y iff x>y and not y>x. Parallel: x||y iff x>y and y>x Unrelated: x#y iff not x>y and not y>x. 21 case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B... A>BA>CB>CB>DC>BC>DE>F ABABACACBDBDCDCDEFEFABABACACBDBDCDCDEFEF B||CC||BABCDACBDEF α-algorithm - Ordering Relations >, ,||,#

22 α-algorithm - Formalization 22 Let L be a workflow log over T.  (L) is defined as follows. 1.T L = { t  T     L t   }, 2.T I = { t  T     L t = first(  ) }, 3.T O = { t  T     L t = last(  ) }, 4.X L = { (A,B)  A  T L  B  T L   a  A  b  B a  L b   a1,a2  A a 1 # L a 2   b1,b2  B b 1 # L b 2 }, 5.Y L = { (A,B)  X   (A,B)  X A  A  B  B  (A,B) = (A,B) }, 6.P L = { p (A,B)  (A,B)  Y L }  {i L,o L }, 7.F L = { (a,p (A,B) )  (A,B)  Y L  a  A }  { (p (A,B),b)  (A,B)  Y L  b  B }  { (i L,t)  t  T I }  { (t,o L )  t  T O }, and  (L) = (P L,T L,F L ).

23 Example 23

24 α-algorithm - Example 24

25 α-algorithm - Insight 25ABCDACBDEF ABABACACBDBDCDCDEFEFABABACACBDBDCDCDEFEF B||CC||B

26 α-algorithm – Log properties + target nets If log is complete with respect to relation >, it can be used to mine Structured Workflow Nets (SWF-net) without short loops (see next slide) SWF-nets have no implicit places. An implicit place is a place whose removal does not change the behavior of the net. The following two constructs cannot be used: 26 Non-free choice (choice and synchronization together) Synchronization of OR-join places

27 α-algorithm – Why No short loops? 27 One-length Two-length B>B and not B>B implies BB (impossible!) A>B and B>A implies A||B and B||A instead of AB and BA

28 α-algorithm – Common Constructs No invisible tasks, non-free-choice or duplicate tasks No noisy logs 28 Why no duplicate tasks? Why not invible tasks? Why noise- free logs?

29 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Algorithm Fuzzy Miner 29

30 Heuristics Miner 1.Read a log 2.Get the set of tasks 3.Infer the ordering relations based on their frequencies 4.Build the net based on inferred relations 5.Output the net 30

31 Heuristics Miner Insight: The more frequently a task A directly follows another task B, and the less frequently the opposite occurs, the higher the probability that A causally follows B! 31

32 Heuristic miner – Common Constructs No non-free-choice or duplicate tasks Robust to (sometimes) invisible tasks and noisy logs 32 Why no non-free- choice? Why no guarantee whole net will be correct?

33 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Miner Fuzzy Miner 33

34 Genetic Process Mining (GPM) Genetic Algorithms + Process Mining Genetic Algorithms Search technique that mimics the process of evolution in biological systems Advantages Tackle all common structural constructs Robust to noise Disadvantages Computational Time 34

35 Genetic Process Mining (GPM) 35 Algorithm: Internal Representation Fitness Measure Genetic Operators

36 GPM – Fitness Measure Guides the search! 36 Start Get Ready Travel by Train Train Travel by Car Car Conference Starts Give a Talk Visit Brewery Have Dinner Go Home Travel by Train Train Travel by Car Car PayParking End

37 GPM – Fitness Measure 37 Start Get Ready Travel by Car Car Conference Starts Give a Talk Visit Brewery Have Dinner Go Home PayParking End Travel by Train Train Punish for the amount of enabled tasks during the parsing! Overgeneral solution

38 GPM – Fitness Measure 38 Start Get Ready Travel by Car Conference Starts Give a Talk Visit Brewery Have Dinner Go Home Travel by Train Travel by Car Pay Parking End Travel by Train Punish for the amount of duplicate tasks with common input/output tasks! Overspecific solution

39 Process Mining Short Recap Types of Process Mining Algorithms Common Constructs Input Format α-algorithm Heuristics Miner Genetic Miner Fuzzy Miner 39

40 Fuzzy Miner - Motivation Mine less structured processes! 40

41 Fuzzy Miner - Motivation 41

42 Fuzzy Miner 42

43 Fuzzy Miner 43 No “Ask Question” or “Give Talk”! All details! Abstracting even more from details!

44 44 Artful Process Mining Seminars in Software and Services for the Information Society Rome, 2014, May the 5th Francesco Leotta Dipartimento di Ingegneria Informatica, Automatica e Gestionale – DIAG “Sapienza” Università di Roma Thanks for part of the presented material to Claudio di Ciccio WU Vienna Institute for Information Business

45 A different context (1) Artful processes [HillEtAl06]HillEtAl06 informal processes typically carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.) “knowledge workers” [ACTIVE09]ACTIVE09 Knowledge workers create artful processes “on the fly” Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared. Artful processes and knowledge workers P. 45

46 A different context (2) In collaborative contexts, knowledge workers share their information and outcomes with other knowledge workers E.g., a software development mgr. Typically, by means of several conversations conversations are actual traces of running processes that knowledge workers adhere to conversations P. 46

47 A different context (3) From the collection of messages, you can extract the processes that lay behind Related conversations are traces of their runs Valuable advantages for users Automated discovery of formal representations with no effort for knowledge workers Tidy organization for naïve best practices kept only in mind Opportunity to share and compare the knowledge on methodologies Automated discovery of bottlenecks, delays, structural defects from the analysis of previous runs conversations are a kind of semi-structured text this approach is not tailored to the electronic mail it can be extended to the analysis of other semi-structured texts Processes from conversations P. 47

48 A different context (4) Personal information management (PIM) how to organize one’s own activities, contacts, etc. through the usage of software Information warfare in supporting anti-crime intelligence agencies Enterprise engineering for knowledge-heavy industries, where preserving documents making up product data is not enough eHealth for the automatic discovery of medical treatment procedures on top of patient health records Some areas of applicability P. 48

49 MailOfMine MailOfMine is the approach and the implementation of a collection of techniques, the aim of which is to is to automatically build, on top of a collection of messages, a set of workflow models that represent the artful processes laying behind the knowledge workers’ activities. [DiCiccioEtAl11]DiCiccioEtAl11 [DiCiccioMecella12]DiCiccioMecella12 [DiCiccioMecella/TR12]DiCiccioMecella/TR12 P. 49 What is MailOfMine?

50 On the visualization of processes P. 50 The imperative model Represents the whole process at once The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)

51 On the visualization of processes Rather than using a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints the idea is that every task can be performed, except the ones which do not respect such constraints this technique fits with processes that are highly flexible and subject to changes, such as artful processes P. 51 The declarative model If A is performed, B must be perfomed, no matter before or afterwards (responded existence) Whenever B is performed, C must be performed afterwards and B can not be repeated until C is done (alternate response) The notation here is based on [AalstEtAl06, MaggiEtAl11] (DecSerFlow, Declare)AalstEtAl06 MaggiEtAl11

52 On the visualization of processes P. 52 Imperative vS declarative Imperative Declarative Declarative models work better in presence of a partial specification of the process scheme

53 Declare constraint templates P. 53 Existence templates Existence(n, A) Activity A occurs at least n times in the process instance BCAAC ✓ BCAAAC ✓ BCAC ✗ (for n = 2 ) Absence(A) Activity A does not occur in the process instance BCC ✓ BCAC ✗ Absence(n+1, A) Activity A occurs at most n times in the process instance BCAAC ✗ BCAC ✓ BCC ✓ (for n = 1 ) Exactly(n, A) Activity A occurs exactly n times in the process instance BCAAC ✓ BCAAAC ✗ BCAC ✗ (for n = 2 ) Init(A) Activity A is the first to occur in each process instance BCAAC ✗ ACAAAC ✓ BCC ✗

54 Declare constraint templates P. 54 Relation templates RespondedExistence(A, B) If A occurs in the process instance, then B occurs as well CAC ✗ CAACB ✓ BCAC ✓ BCC ✓ Response(A, B) If A occurs in the process instance, then B occurs after A BCAAC ✗ CAACB ✓ CAC ✗ BCC ✓ AlternateResponse(A, B) Each time A occurs in the process instance, then B occurs afterwards, before A recurs BCAAC ✗ CAACB ✗ CACB ✓ CABCA ✗ BCC ✓ CACBBAB ✓ ChainResponse(A, B) Each time A occurs in the process instance, then B occurs immediately afterwards BCAAC ✗ BCAABC ✗ BCABABC ✓

55 Declare constraint templates P. 55 Relation templates RespondedExistence(B, A) If B occurs in the process instance, then A occurs as well CAC ✓ CAACB ✓ BCAC ✓ BCC ✗ Precedence(A, B) B occurs in the process instance only if preceded by A BCAAC ✗ CAACB ✓ CAC ✓ BCC ✓ AlternatePrecedence(A, B) Each time B occurs in the process instance, it is preceded by A and no other B can recur in between BCAAC ✗ CAACB ✓ CACB ✓ CABCA ✓ BCC ✗ CACBAB ✓ ChainPrecedence(A, B) Each time B occurs in the process instance, then B occurs immediately beforehand BCAAC ✗ BCAABC ✗ CABABCA ✓

56 Declare constraint templates P. 56 Relation templates CoExistence(A, B) If B occurs in the process instance, then A occurs, and viceversa CAC ✗ CAACB ✓ BCAC ✓ BCC ✗ Succession(A, B) A occurs if and only if it is followed by B in the process instance BCAAC ✗ CAACB ✓ CAC ✗ BCC ✗ AlternateSuccession(A, B) A and B occur in the process instance if and only if the latter follows the former, and they alternate each other in the trace BCAAC ✗ CAACB ✗ CACB ✓ CABCA ✗ BCC ✗ CACBAB ✓ ChainSuccession(A, B) A and B occur in the process instance if and only if the latter immediately follows the former BCAAC ✗ BCAABC ✗ CABABC ✓

57 Declare constraint templates P. 57 Negative relation templates NotCoExistence(A, B) A and B never occur together in the process instance CAC ✓ CAACB ✗ BCAC ✗ BCC ✓ NotSuccession(A, B) A can never occur before B in the process instance BCAAC ✓ CAACB ✗ CAC ✓ BCC ✓ NotChainSuccession(A, B) A and B occur in the process instance if and only if the latter does not immediately follows the former BCAAC ✓ BCAABC ✗ CBACBA ✓

58 Relation constraint templates subsumption P. 58 Constraint templates are not independent of each other

59 Relation constraint templates subsumption E.g., A trace like ABABCABCC satisfies (w.r.t. A and B ): RespondedExistence(A, B), RespondedExistence(B, A), CoExistence(A, B), CoExistence(B, A), Response(A, B), AlternateResponse(A, B), ChainResponse(A, B), Precedence(A, B), AlternatePrecedence(A, B), ChainPrecedence(A, B), Succession(A, B), AlternateSuccession(A, B), ChainSuccession(A, B) The mining algorithm would show the most strict constraint only ( ChainSuccession(A, B) ) MINERful, the mining algorithm of MailOfMine, faces this unresolved issue P. 59 Constraint templates are not independent of each other

60 MINERful Key idea: building a knowledge base with local and global statistics on the mutual order of appearance of events for further fast querying Performances: the algorithm is proven to be fast (over 12m events processed in less than 170 secs.) Asymptotically: linear in the number of the traces quadratic in the number of events per trace i.e., polynomial in the input size linear in the number of constraint templates See [DiCiccioMecella/TR12] for further readingDiCiccioMecella/TR12 P. 60 The declarative workflow mining algorithm of MailOfMine

61 LTL semantics ConDec, DecSerFlow and Declare adopt Linear-time Temporal Logic (LTL) for expressing the semantics of the constraint templates. See [AalstEtAl06, MaggiEtAl11] for further readingAalstEtAl06 MaggiEtAl11 Van der Aalst, W. M. P. : “Auditing 2.0 Using Process Mining to Support Tomorrow's Auditor” Available at cess-mining-and-auditing-siks-course-2010-wvda.pdf cess-mining-and-auditing-siks-course-2010-wvda.pdf See slides P. 61

62 On the representation of artful process schemata In MailOfMine, each constraint in the set which can be used to define an artful mined process is expressible through regular grammars, where: – activities are terminal characters, building blocks of constraints on tasks; – constraints are regular expressions, equivalent to regular grammars; – the process scheme is the intersection of constraints defined on top of activities. The process scheme defines a Process Describing Grammar ( PDG ) Regular grammars expressing declarative workflows P. 62

63 On the usage of regular grammars The rationale: why not LTL for declarative workflows? Temporal logic is a formalism for describing sequences of transitions between states in a reactive system Linear Temporal Logic (LTL, [Pnueli77]) describes events along a single computation pathPnueli77 LTL formulæ are verified over semi-infinite runs – defined over Kripke structures They are good for automatically checking the correct work of circuits or server programs – Not for human processes which have both a starting point and an end Regular grammars are verified by Finite State Automata – working with less complex algorithms, in terms of computational effort A PDG describes the language spoken by collaborative organisms in terms of activities P. 63

64 On the usage of regular grammars Constraint templates as regular expressions P. 64

65 Example: Mining Human Habits P. 65 Examples of desirable models

66 Example: Mining Human Habits P. 66 Log granularity issue The granularity of the log produced by a smart house is too much fine to be fed as input of a process mining algorithm

67 Example: Mining Human Habits P. 67 Log granularity issue: a solution

68 References [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011). [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Güther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: Prom: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings., CEUR-WS.org (2009) [AalstEtAl2004] van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9) (2004) 1128–1142. [WenEtAl2007] Wen, L., van der Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2) (2007) 145–180. [GüntherEtAl2007] Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. BPM 2007: [WeijtersEtAl2001] Weijters, A., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering 10 (2001) [MedeirosEtAl2007] Medeiros, A.K., Weijters, A.J., Aalst, W.M.: Genetic process mining: an experi- mental evaluation. Data Min. Knowl. Discov. 14(2) (2007) 245–304. [AalstEtAl2010] van der Aalst, W., Rubin, V., Verbeek, H., van Dongen, B., Kindler, E., Gnther, C.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling 9 (2010) 87– /s z. Cited articles and resources, in order of appearance P. 68

69 References [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006) [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the active integrated approach. BT Technology Journal 26(2), 165–176 (2009) [DiCiccioEtAl11] Di Ciccio, C., Mecella, M., Catarci, T.: Representing and Visualizing Mined Artful Processes in MailOfMine. USAB 2011:83-94 [DiCiccioMecella12] Di Ciccio, C., Mecella,M.: Mining constraints for artful processes. In W. Abramowicz, D. Kriksciuniene, V.S., ed.: 15th International Conference on Business Information Systems. Volume 117 of Lecture Notes in Business Information Processing., Springer (2012) (to appear). [DiCiccioMecella/TR12] Di Ciccio, C., Mecella, M.: MINERful, a mining algorithm for declarative process constraints in MailOfMine. Technical report, Dipartimento di Ingegneria Infor- matica, Automatica e Gestionale “Antonio Ruberti” – SAPIENZA, Universita` di Roma (2012). [AalstEtAl06] van der Aalst, W.M.P., Pesic, M.: Decserflow: Towards a truly declarative service flow language. Proc. WS-FM 2006 [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declar- ative process models. In: CIDM, IEEE (2011) 192–199 [Pnueli77] Pnueli, A.: The Temporal Logic of Programs. Proc. 18th Annual Symposium on Foundations of Software Technology and Theoretical Computer Science, 1977 Cited articles and resources, in order of appearance P. 69


Download ppt "1 Process Mining Seminars in Software and Services for the Information Society Rome, 2014, May the 5th Francesco Leotta Dipartimento di Ingegneria Informatica,"

Similar presentations


Ads by Google