1 Process Mining: An index to the state of the art and an outline of open research challenges at DIAG
Claudio Di Ciccio, Massimo Mecella
Seminars in Software and Services for the Information Society

2 Process Mining: Definition
Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that extract process descriptions from a set of recorded real executions (logs).
ProM [AalstEtAl2009] is one of the most widely used plug-in based software environments for implementing workflow mining (and other) techniques.
The new version 6.0 is available for download.
Process Mining Claudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)

3 Process Mining: Definition
Process Mining involves:
– Process discovery (control flow mining, organizational mining, decision mining)
– Conformance checking
– Operational support
We will focus on control flow mining.
Many control flow mining algorithms have been proposed:
– α [AalstEtAl2004], α++ [WenEtAl2007] and α# [WenEtAl2010]
– Fuzzy [GüntherEtAl2007]
– Heuristic [WeijtersEtAl2001]
– Genetic [MedeirosEtAl2007]
– Two-step [AalstEtAl2010]
– …

4 Process Mining: Further reading
The rest of the lesson is based on the following material:
– Van der Aalst, W. M. P.: Process Discovery: An Introduction. Available at mining_chapter_05_process_discovery.pdf. From the teaching material for [Aalst2011.book].
– De Medeiros, A. K. A.: Process Mining: Control-Flow Mining Algorithms. Available at owminingalgorithms.ppt

5 A different context for process mining: Artful processes and knowledge workers
Artful processes [HillEtAl06]
– informal processes typically carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.): the knowledge workers [ACTIVE09]
Knowledge workers create artful processes on the fly.
Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared. They are:
– Loosely structured
– Highly flexible

6 A different context (2): Conversations
In collaborative contexts, knowledge workers share their information and outcomes with other knowledge workers
– E.g., a software development manager
Typically, this happens by means of several conversations
– Conversations are actual traces of the running processes that knowledge workers adhere to

7 A different context (4): Some areas of applicability
Personal information management (PIM)
– how to organize one's own activities, contacts, etc. through the usage of software
Information warfare
– in supporting anti-crime intelligence agencies
Enterprise engineering
– for knowledge-heavy industries, where preserving documents making up product data is not enough
eHealth
– for the automatic discovery of medical treatment procedures on top of patient health records

8 MailOfMine: What is MailOfMine?
MailOfMine is the approach, and the implementation of a collection of techniques, whose aim is to automatically build, on top of a collection of messages, a set of workflow models that represent the artful processes lying behind the knowledge workers' activities. [DiCiccioEtAl11] [DiCiccioMecella12] [DiCiccioMecella/TR12] [DiCiccioMecella13]

9 On the visualization of processes: The imperative model
Represents the whole process at once
The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)

10 On the visualization of processes: The declarative model
Rather than using a procedural language for expressing the allowed sequences of activities, it describes workflows through constraints
– the idea is that every task can be performed, except the ones which do not respect such constraints
– this technique fits processes that are highly flexible and subject to changes, such as artful processes
Examples:
– If A is performed, B must be performed, no matter whether before or afterwards (responded existence)
– Whenever B is performed, C must be performed afterwards, and B cannot be repeated until C is done (alternate response)
The notation here is based on [AalstEtAl06, MaggiEtAl11] (DecSerFlow, Declare)

11 On the visualization of processes: Imperative vs declarative
(Figure panels: Imperative, Declarative)
Declarative models work better in the presence of a partial specification of the process scheme

12 A real discovered process model
Spaghetti process [Aalst2011.book]

13 Declare constraint templates: Existence templates
Existence(n, A): Activity A occurs at least n times in the process instance. For n = 2, satisfied by BCAAC and BCAAAC; violated by BCAC.
Absence(A): Activity A does not occur in the process instance. Satisfied by BCC; violated by BCAC.
Absence(n+1, A): Activity A occurs at most n times in the process instance. For n = 2, satisfied by BCAAC, BCAC and BCC.
Exactly(n, A): Activity A occurs exactly n times in the process instance. For n = 2, satisfied by BCAAC; violated by BCAAAC and BCAC.
Init(A): Activity A is the first to occur in each process instance. Satisfied by ACAAAC; violated by BCAAC and BCC.
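
The existence templates reduce to counting occurrences, so they can be sketched as one-line predicates. The Python sketch below is illustrative only: the function names are assumptions, traces are plain strings as on the slide, and `at_most` assumes the usual Declare reading in which Absence(n+1, A) allows at most n occurrences of A.

```python
def existence(trace: str, a: str, n: int = 1) -> bool:
    """Existence(n, A): A occurs at least n times in the trace."""
    return trace.count(a) >= n


def absence(trace: str, a: str) -> bool:
    """Absence(A): A does not occur in the trace."""
    return a not in trace


def at_most(trace: str, a: str, n: int) -> bool:
    """Absence(n+1, A), read as: A occurs at most n times (assumed reading)."""
    return trace.count(a) <= n


def exactly(trace: str, a: str, n: int) -> bool:
    """Exactly(n, A): A occurs exactly n times in the trace."""
    return trace.count(a) == n


def init(trace: str, a: str) -> bool:
    """Init(A): A is the first event of the trace."""
    return trace.startswith(a)


# Evaluate the slide's example traces for n = 2.
for t in ("BCAAC", "BCAAAC", "BCAC"):
    print(t, existence(t, "A", 2), exactly(t, "A", 2))
```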

14 Declare constraint templates: Relation templates
RespondedExistence(A, B): If A occurs in the process instance, then B occurs as well. Satisfied by CAACB, BCAC and BCC; violated by CAC.
Response(A, B): If A occurs in the process instance, then B occurs after A. Satisfied by CAACB and BCC; violated by BCAAC and CAC.
AlternateResponse(A, B): Each time A occurs in the process instance, B occurs afterwards, before A recurs. Satisfied by CACB, BCC and CACBBAB; violated by BCAAC, CAACB and CABCA.
ChainResponse(A, B): Each time A occurs in the process instance, B occurs immediately afterwards. Satisfied by BCABABC; violated by BCAAC and BCAABC.
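
The response family tightens step by step: some B later, a B before the next A, a B right away. A minimal Python sketch of the four checks follows. The function names are assumptions, traces are strings, and A and B are assumed to be distinct single characters.

```python
def responded_existence(trace: str, a: str, b: str) -> bool:
    """If a occurs anywhere, b occurs somewhere too (before or after)."""
    return a not in trace or b in trace


def response(trace: str, a: str, b: str) -> bool:
    """Every a is eventually followed by some b."""
    return all(b in trace[i + 1:] for i, ch in enumerate(trace) if ch == a)


def alternate_response(trace: str, a: str, b: str) -> bool:
    """Every a is followed by a b before the next a shows up."""
    pending = False  # True while an a is still waiting for its b
    for ch in trace:
        if ch == a:
            if pending:
                return False  # a recurred before b answered
            pending = True
        elif ch == b:
            pending = False
    return not pending


def chain_response(trace: str, a: str, b: str) -> bool:
    """Every a is immediately followed by b."""
    return all(trace[i + 1:i + 2] == b for i, ch in enumerate(trace) if ch == a)
```

The single `pending` flag in `alternate_response` is enough because the template only cares whether the most recent A has been answered yet.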

15 Declare constraint templates: Relation templates
RespondedExistence(B, A): If B occurs in the process instance, then A occurs as well. Satisfied by CAC, CAACB and BCAC; violated by BCC.
Precedence(A, B): B occurs in the process instance only if preceded by A. Satisfied by CAACB and CAC; violated by BCAAC and BCC.
AlternatePrecedence(A, B): Each time B occurs in the process instance, it is preceded by A and no other B can recur in between. Satisfied by CAACB, CACB, CABCA and CACBAB; violated by BCAAC and BCC.
ChainPrecedence(A, B): Each time B occurs in the process instance, A occurs immediately beforehand. Satisfied by CABABCA; violated by BCAAC and BCAABC.
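
The precedence family mirrors the response family, looking backwards from B instead of forwards from A. The following Python sketch shows one way to check the three templates. The names are assumptions, and A and B are assumed to be distinct single characters.

```python
def precedence(trace: str, a: str, b: str) -> bool:
    """b may occur only if some a occurred earlier."""
    first_b = trace.find(b)
    return first_b == -1 or a in trace[:first_b]


def alternate_precedence(trace: str, a: str, b: str) -> bool:
    """Every b is preceded by an a, with no other b in between."""
    armed = False  # True when an a has been seen since the last b
    for ch in trace:
        if ch == a:
            armed = True
        elif ch == b:
            if not armed:
                return False
            armed = False
    return True


def chain_precedence(trace: str, a: str, b: str) -> bool:
    """Every b is immediately preceded by a."""
    return all(i > 0 and trace[i - 1] == a
               for i, ch in enumerate(trace) if ch == b)
```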

16 Declare constraint templates: Relation templates
CoExistence(A, B): If B occurs in the process instance, then A occurs, and vice versa. Satisfied by CAACB and BCAC; violated by CAC and BCC.
Succession(A, B): A occurs if and only if it is followed by B in the process instance. Satisfied by CAACB; violated by BCAAC, CAC and BCC.
AlternateSuccession(A, B): A and B occur in the process instance if and only if the latter follows the former, and they alternate with each other in the trace. Satisfied by CACB and CACBAB; violated by BCAAC, CAACB, CABCA and BCC.
ChainSuccession(A, B): A and B occur in the process instance if and only if the latter immediately follows the former. Satisfied by CABABC; violated by BCAAC and BCAABC.
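
Each template on this slide can be read as the conjunction of a forward and a backward template. As a hedged illustration, the Python sketch below builds CoExistence and Succession that way. It assumes the usual Declare reading in which Succession(A, B) holds exactly when both Response(A, B) and Precedence(A, B) hold. The function names are assumptions.

```python
def response(trace: str, a: str, b: str) -> bool:
    """Every a has a later b: true iff the last a comes before the last b."""
    last_a, last_b = trace.rfind(a), trace.rfind(b)
    return last_a == -1 or last_b > last_a


def precedence(trace: str, a: str, b: str) -> bool:
    """Every b has an earlier a: true iff the first a comes before the first b."""
    first_a, first_b = trace.find(a), trace.find(b)
    return first_b == -1 or (first_a != -1 and first_a < first_b)


def co_existence(trace: str, a: str, b: str) -> bool:
    """a and b either both occur or both are absent."""
    return (a in trace) == (b in trace)


def succession(trace: str, a: str, b: str) -> bool:
    """Response(a, b) and Precedence(a, b) together (assumed reading)."""
    return response(trace, a, b) and precedence(trace, a, b)
```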

17 Declare constraint templates: Negative relation templates
NotCoExistence(A, B): A and B never occur together in the process instance. Satisfied by CAC and BCC; violated by CAACB and BCAC.
NotSuccession(A, B): A can never occur before B in the process instance. Satisfied by BCAAC, CAC and BCC; violated by CAACB.
NotChainSuccession(A, B): A and B occur in the process instance if and only if the latter does not immediately follow the former. Satisfied by BCAAC and CBACBA; violated by BCAABC.
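
The negative templates forbid a pattern instead of requiring one, which makes them short to check. A minimal Python sketch, with assumed function names and A, B taken as distinct single characters:

```python
def not_co_existence(trace: str, a: str, b: str) -> bool:
    """a and b never appear in the same trace."""
    return not (a in trace and b in trace)


def not_succession(trace: str, a: str, b: str) -> bool:
    """No b occurs anywhere after an a."""
    first_a = trace.find(a)
    return first_a == -1 or b not in trace[first_a + 1:]


def not_chain_succession(trace: str, a: str, b: str) -> bool:
    """b never occurs immediately after a."""
    return (a + b) not in trace
```

Checking only the first A in `not_succession` is sufficient: any B that follows a later A also follows the first one.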

18 Relation constraint templates subsumption
Constraint templates are not independent of each other

19 MINERful++: Our mining algorithm
MINERful++ is the workflow discovery algorithm of MailOfMine
Its input is a collection of strings T and an alphabet Σ_T
– Each string t is a trace
– Each character is an event (enacted task)
– The collection represents the log
Its output is a declarative process model
– What is a declarative process model?

20 The declarative specification of an artful process (e.g.): Scenario
A project meeting is scheduled.
We suppose that a final agenda will be committed (confirmAgenda) after requests for a new proposal (requestAgenda), proposals themselves (proposeAgenda) and comments (commentAgenda) have been circulated.
Shortcuts for tasks (process alphabet):
– p (proposeAgenda)
– r (requestAgenda)
– c (commentAgenda)
– n (confirmAgenda)

21 The declarative specification of an artful process (e.g.): Constraints on activities
Existence constraints. The agenda:
1. Participation(n): must be confirmed,
2. Uniqueness(n): only once;
3. End(n): it is the last thing to do.
Relation constraints. During the compilation:
4. Response(r, p): the proposal follows a request;
5. RespondedExistence(c, p): if a comment circulates, there has been / will be a proposal;
6. Succession(p, n): after the proposal, there will be a confirmation, and there can be no confirmation without a preceding proposal.
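
The six constraints can be checked mechanically against a trace written in the process alphabet (p, r, c, n). The Python sketch below is a hedged illustration, not the authors' implementation: `check_agenda` and its helpers are hypothetical names, and Succession is assumed to be the conjunction of Response and Precedence.

```python
def response(trace: str, a: str, b: str) -> bool:
    """Every a has a later b (the last a precedes the last b)."""
    last_a, last_b = trace.rfind(a), trace.rfind(b)
    return last_a == -1 or last_b > last_a


def precedence(trace: str, a: str, b: str) -> bool:
    """Every b has an earlier a (the first a precedes the first b)."""
    first_a, first_b = trace.find(a), trace.find(b)
    return first_b == -1 or (first_a != -1 and first_a < first_b)


def check_agenda(trace: str) -> dict:
    """Evaluate the six agenda constraints on one trace."""
    return {
        "Participation(n)": trace.count("n") >= 1,
        "Uniqueness(n)": trace.count("n") <= 1,
        "End(n)": trace.endswith("n"),
        "Response(r, p)": response(trace, "r", "p"),
        "RespondedExistence(c, p)": "c" not in trace or "p" in trace,
        "Succession(p, n)": response(trace, "p", "n") and precedence(trace, "p", "n"),
    }


compliant = check_agenda("rrpcrpcrcpcn")   # request, request, propose, comment, ...
violating = check_agenda("rcn")            # confirmed without any proposal
print(all(compliant.values()))
print(sorted(name for name, ok in violating.items() if not ok))
```

The second trace confirms an agenda that was requested and commented on but never proposed, so the three constraints that mention p fail while the existence constraints on n still hold.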

22 MINERful++: Workflow discovery by constraints inference
MINERful++ is a two-step algorithm:
1. Construction of a Knowledge Base (KB)
2. Constraints inference by means of queries evaluated on the KB
This allows the discovery of constraints through a faster procedure, on data that are smaller in size than the whole input.
Returned constraints are weighted with their support
– the normalized fraction of cases in which the constraint is verified over the set of input traces

23 MINERful++ by example: Workflow discovery by constraints inference
To see how it works, we follow a run of MINERful++ over a string compliant with the previous example:
r r p c r p c r c p c n
We start with the construction of the algorithm's Knowledge Base.

24 MINERful++ by example: Building the ownplay of p and n
Trace: r r p c r p c r c p c n
p
– p occurred 3 times in 1 string: γ_p(3) = 1; for each m ≠ 3, γ_p(m) = 0
– p did not occur as the first nor as the last character: g_i(p) = 0, g_l(p) = 0
n
– n occurred once in 1 string: γ_n(1) = 1; for each m ≠ 1, γ_n(m) = 0
– n occurred as the last character in 1 string: g_i(n) = 0, g_l(n) = 1
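
The ownplay figures for a character can be gathered in a single pass over the log. The Python sketch below is one possible way to do it, not the MINERful++ code itself: `ownplay` is a hypothetical name, γ is stored as a `Counter` mapping an occurrence count m to the number of strings with exactly m occurrences, and g_i, g_l are returned as plain integers.

```python
from collections import Counter


def ownplay(log: list[str], ch: str) -> tuple[Counter, int, int]:
    """Return (gamma, g_i, g_l) for character ch over all traces in log."""
    # gamma[m] = number of traces in which ch occurs exactly m times
    gamma = Counter(trace.count(ch) for trace in log)
    # g_i / g_l = number of traces that start / end with ch
    g_i = sum(1 for trace in log if trace.startswith(ch))
    g_l = sum(1 for trace in log if trace.endswith(ch))
    return gamma, g_i, g_l


log = ["rrpcrpcrcpcn"]
gamma_p, gi_p, gl_p = ownplay(log, "p")
gamma_n, gi_n, gl_n = ownplay(log, "n")
print(gamma_p[3], gi_p, gl_p)
print(gamma_n[1], gi_n, gl_n)
```

A `Counter` returns 0 for any m that was never observed, which matches the "for each other m, γ(m) = 0" statements.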

25 MINERful++ by example: Building the interplay of p and n
Trace: r r p c r p c r c p c n
With respect to the occurrences of p, n occurred…
i. Never before: 3 times, δ_p,n(−∞) = 3
ii. 2 chars after: 1 time, δ_p,n(2) = 1
iii. 6 chars after: 1 time, δ_p,n(6) = 1
iv. 9 chars after: 1 time, δ_p,n(9) = 1
v. Repetitions in-between:
   i. onwards: 2 times, b_p,n (onwards) = 2
   ii. backwards: never, b_p,n (backwards) = 0

26 MINERful++ by example: Building the interplay of r and p
Trace: r r p c r p c r c p c n
δ_r,p
b_r,p (onwards) = 1
b_r,p (backwards) = 0
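
The interplay figures can be reproduced with a short scan of the trace. The Python sketch below is an interpretation that agrees with the numbers on the two interplay slides, not the authors' code. The name `interplay` is hypothetical, δ records the distance from each occurrence of the first character to every occurrence of the second (an assumption), `math.inf` and `-math.inf` stand for "never after" and "never before", and a repetition is counted when the first character recurs before the nearest occurrence of the second one in the given direction.

```python
import math
from collections import Counter


def interplay(trace: str, rho: str, sigma: str) -> tuple[Counter, int, int]:
    """Return (delta, repetitions onwards, repetitions backwards) of sigma w.r.t. rho."""
    sigma_pos = [i for i, ch in enumerate(trace) if ch == sigma]
    delta = Counter()
    onwards = backwards = 0
    for i, ch in enumerate(trace):
        if ch != rho:
            continue
        after = [j for j in sigma_pos if j > i]
        before = [j for j in sigma_pos if j < i]
        if not after:
            delta[math.inf] += 1       # sigma never read after this rho
        if not before:
            delta[-math.inf] += 1      # sigma never read before this rho
        for j in sigma_pos:
            delta[j - i] += 1          # signed distance in characters
        # rho repeated before reaching the nearest sigma, in each direction
        if after and rho in trace[i + 1:after[0]]:
            onwards += 1
        if before and rho in trace[before[-1] + 1:i]:
            backwards += 1
    return delta, onwards, backwards


delta_pn, on_pn, back_pn = interplay("rrpcrpcrcpcn", "p", "n")
print(delta_pn[-math.inf], delta_pn[2], delta_pn[6], delta_pn[9], on_pn, back_pn)
```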

27 MINERful++: Computing the support of constraints
Given the number of times in which r and p respectively appear in the log, we have, e.g.:
– support for Response(r, p). Hint: how many times was p not read in the traces after r occurred? In those cases, Response(r, p) does not hold
– support for Succession(r, p). How many times was p not read in the traces after r occurred, or r not read before p occurred? In those cases, Succession(r, p) does not hold
– …
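
Following the hints above, support can be estimated as one minus the fraction of occurrences for which the constraint fails. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, the denominator for Succession (all occurrences of both characters) is an assumed normalisation, and returning 1.0 when the characters never occur is also an assumption.

```python
def not_followed(trace: str, a: str, b: str) -> int:
    """Number of occurrences of a with no b anywhere after them."""
    return sum(1 for i, ch in enumerate(trace) if ch == a and b not in trace[i + 1:])


def not_preceded(trace: str, a: str, b: str) -> int:
    """Number of occurrences of b with no a anywhere before them."""
    return sum(1 for i, ch in enumerate(trace) if ch == b and a not in trace[:i])


def response_support(log: list[str], a: str, b: str) -> float:
    total = sum(t.count(a) for t in log)
    bad = sum(not_followed(t, a, b) for t in log)
    return (total - bad) / total if total else 1.0  # vacuous case: assumed 1.0


def succession_support(log: list[str], a: str, b: str) -> float:
    total = sum(t.count(a) + t.count(b) for t in log)
    bad = sum(not_followed(t, a, b) + not_preceded(t, a, b) for t in log)
    return (total - bad) / total if total else 1.0  # vacuous case: assumed 1.0


log = ["rrpcrpcrcpcn", "cprpn"]
print(response_support(log, "r", "p"))
print(succession_support(log, "r", "p"))
```

In the second trace the first p has no r before it, so Succession(r, p) loses support while Response(r, p) does not.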

28 MINERful++: Computing the support of constraints

29 MINERful++ by example: Computing the support of constraints
Log:
1. r r p c r p c r c p c n
2. r c c r p p c p c r c p p n
3. c c c c c r r p c r r p r r p r p p p n
4. c p r p n
5. r p p n
6. r r r c r c p n
7. r p c n
8. c r p r r c c p p n
9. p c c n
10. c r r c c c r r p c p c c p n
Support for…
– Response(r, p): 1.0
– Succession(r, p)
– Precedence(c, n): 0.9
The support can be used to prune out those constraints falling under a given threshold
– E.g., 0.95
A threshold equal to 1.0 selects those constraints which are always valid on the log
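
The two support values shown on this slide can be recomputed from the ten traces, and the threshold applied as a simple filter. The Python sketch below does this under stated assumptions: support is taken as the fraction of occurrences for which the constraint holds, and `response_support`, `precedence_support` and `prune` are hypothetical names, not part of MINERful++.

```python
RAW_LOG = """
r r p c r p c r c p c n
r c c r p p c p c r c p p n
c c c c c r r p c r r p r r p r p p p n
c p r p n
r p p n
r r r c r c p n
r p c n
c r p r r c c p p n
p c c n
c r r c c c r r p c p c c p n
"""
# One string per trace, one character per event.
LOG = ["".join(line.split()) for line in RAW_LOG.strip().splitlines()]


def response_support(log, a, b):
    """Fraction of occurrences of a that have some b after them."""
    hits = [b in t[i + 1:] for t in log for i, ch in enumerate(t) if ch == a]
    return sum(hits) / len(hits) if hits else 1.0


def precedence_support(log, a, b):
    """Fraction of occurrences of b that have some a before them."""
    hits = [a in t[:i] for t in log for i, ch in enumerate(t) if ch == b]
    return sum(hits) / len(hits) if hits else 1.0


def prune(supports, threshold):
    """Keep only the constraints whose support reaches the threshold."""
    return {name: s for name, s in supports.items() if s >= threshold}


supports = {
    "Response(r, p)": response_support(LOG, "r", "p"),
    "Precedence(c, n)": precedence_support(LOG, "c", "n"),
}
print(supports)
print(sorted(prune(supports, 0.95)))
```

With the example threshold of 0.95, Precedence(c, n) is dropped because trace 5 confirms the agenda without any comment.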

30 Relation constraint templates subsumption
Constraint templates are not independent of each other
E.g.,
– A trace like a b a b c a b c c satisfies (w.r.t. b and a): RespondedExistence(a, b), RespondedExistence(b, a), CoExistence(a, b), CoExistence(b, a), Response(a, b), AlternateResponse(a, b), ChainResponse(a, b), Precedence(a, b), AlternatePrecedence(a, b), ChainPrecedence(a, b), Succession(a, b), AlternateSuccession(a, b), ChainSuccession(a, b)
– The mining algorithm should show only the strictest constraint (ChainSuccession(a, b))
MINERful++ addresses this issue by pruning the returned constraints on the basis of the subsumption hierarchy of constraints
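
Pruning by subsumption amounts to dropping every discovered template that is implied by a stricter one also discovered. The Python sketch below illustrates the idea on template names only. The `IMPLIES` table is an assumed encoding of the hierarchy (the original diagram is not reproduced here), the parameter order of RespondedExistence and CoExistence is ignored for brevity, and `prune` is a hypothetical helper rather than the MINERful++ procedure, which also takes support into account.

```python
# Assumed subsumption hierarchy: template -> templates it directly implies.
IMPLIES = {
    "ChainSuccession": ["AlternateSuccession", "ChainResponse", "ChainPrecedence"],
    "AlternateSuccession": ["Succession", "AlternateResponse", "AlternatePrecedence"],
    "Succession": ["CoExistence", "Response", "Precedence"],
    "ChainResponse": ["AlternateResponse"],
    "AlternateResponse": ["Response"],
    "Response": ["RespondedExistence"],
    "ChainPrecedence": ["AlternatePrecedence"],
    "AlternatePrecedence": ["Precedence"],
    "Precedence": ["RespondedExistence"],
    "CoExistence": ["RespondedExistence"],
}


def implied(template: str) -> set[str]:
    """All templates reachable from `template` through IMPLIES."""
    seen, stack = set(), list(IMPLIES.get(template, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IMPLIES.get(t, []))
    return seen


def prune(found: set[str]) -> set[str]:
    """Keep only the templates not implied by another discovered template."""
    redundant = set().union(*(implied(t) for t in found))
    return found - redundant


found = set(IMPLIES) | {"RespondedExistence"}   # everything holds, as in the example trace
print(prune(found))
```

When every template in the table holds for a pair of activities, only ChainSuccession survives the pruning, matching the example on the slide.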

31 Relation constraint templates subsumption
Constraint templates are not independent of each other

32 Relation constraint templates subsumption: A hint on the pruning procedure

33 Relation constraint templates subsumption: A hint on the pruning procedure

34 Evaluation: Characteristics of MINERful++
MINERful++ is
– Independent of the formalism used for expressing constraints
– Modular (two-phase)
– Capable of eliminating redundancy in the process model
– Fast
Hardware: Sony VAIO VGN-FE11H, Intel Core Duo T GHz, 2 MB L2 cache, 2 GB of DDR2 667 MHz

35 Evaluation: On the complexity of MINERful++
Linear w.r.t. the number of traces in the log, |T|
Quadratic w.r.t. the size of traces in the log, |t_max|
Quadratic w.r.t. the size of the alphabet, |Σ_T|
Hence, polynomial in the size of the input: O(|T| · |t_max|^2 · |Σ_T|^2)

36 Preliminary results: MINERful++ on real data
Logs extracted from 2 mailboxes [DiCiccioMecella12]
– traces: 34.75 events each on average
– 139 events read in total
Evaluation of inferred constraints conducted with a domain expert
– Precision

37 Future work: Research in progress
Integrate MINERful++ with ProM
– MINERful++ is already capable of reading/writing XES logs
Study the effects of errors in logs on the inferred workflow
– Error-injected synthetic logs can help us conduct an automated analysis of the quality of results
Refine the estimation calculi for support
Auto-tune the support threshold
– Depending on the constraint template, which can be more or less robust
Enlarge the set of discovered constraint templates to the branching Declare constraints

38 References: Cited articles and resources, in order of appearance
[Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011)
[AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings, CEUR-WS.org (2009)
[AalstEtAl2004] van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9) (2004) 1128–1142
[WenEtAl2007] Wen, L., van der Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2) (2007) 145–180
[WenEtAl2010] Wen, L., Wang, J., van der Aalst, W.M.P., Huang, B., Sun, J.: Mining process models with prime invisible tasks. Data Knowl. Eng. 69 (2010)
[GüntherEtAl2007] Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. Proc. BPM
[WeijtersEtAl2001] Weijters, A., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering 10 (2001)
[MedeirosEtAl2007] Medeiros, A.K., Weijters, A.J., Aalst, W.M.: Genetic process mining: an experimental evaluation. Data Min. Knowl. Discov. 14(2) (2007) 245–304
[AalstEtAl2010] van der Aalst, W., Rubin, V., Verbeek, H., van Dongen, B., Kindler, E., Günther, C.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling 9 (2010) 87–

39 References: Cited articles and resources, in order of appearance
[HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)
[ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the ACTIVE integrated approach. BT Technology Journal 26(2), 165–176 (2009)
[DiCiccioEtAl11] Di Ciccio, C., Mecella, M., Catarci, T.: Representing and visualizing mined artful processes in MailOfMine. Proc. USAB 2011: 83–94
[DiCiccioMecella12] Di Ciccio, C., Mecella, M.: Mining constraints for artful processes. Proc. BIS 2012
[DiCiccioMecella/TR12] Di Ciccio, C., Mecella, M.: MINERful, a mining algorithm for declarative process constraints in MailOfMine. Technical report, Dipartimento di Ingegneria Informatica, Automatica e Gestionale Antonio Ruberti – SAPIENZA, Università di Roma (2012)
[DiCiccioMecella13] Di Ciccio, C., Mecella, M.: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows. Proc. CIDM 2013
[AalstEtAl06] van der Aalst, W.M.P., Pesic, M.: DecSerFlow: Towards a truly declarative service flow language. Proc. WS-FM 2006
[MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declarative process models. Proc. CIDM 2011

