Presentation is loading. Please wait.

Presentation is loading. Please wait.

Han-na Yang Trace Clustering in Process Mining M. Song, C.W. Gunther, and W.M.P. van der Aalst.

Similar presentations

Presentation on theme: "Han-na Yang Trace Clustering in Process Mining M. Song, C.W. Gunther, and W.M.P. van der Aalst."— Presentation transcript:

1 Han-na Yang Trace Clustering in Process Mining M. Song, C.W. Gunther, and W.M.P. van der Aalst

2 Introduction □ The major application of process mining  Discovery ⇒ extraction of abstract process knowledge from event logs □ Real-life business processes are flexible  Spaghetti model  Single cases differ significantly from one another = ‘Diversity’  Discovering actual process which is being executed is valuable. □ Solution for diversity of cases  Measure the similarity of cases and use the information to divide the set of cases into more homogeneous subsets.  Trace clustering

3 Running Example □ Repair process of products within an electronic company that makes navigation and mobile phones  Case: a specific row  Trace: the sequence of events within a case  Events: represented by the case identifier, activity identifier, and originator Case identifier Activity identifier Originator

4 Running Example Navigation system Mobile Phone Repair is Canceled □ Trace clustering can support the identification of process variants corresponding to homogenous subsets of cases

5 Trace profiles □ In the trace clustering approach, each case is characterized by a defined set of items, i.e., specific features which can be extracted from the corresponding trace. □ Items for comparing traces are organized in trace profiles, each addressing a specific perspective of the log

6 Trace profiles □ Information in Event log  WorkflowLog  group any number of process elements  ProcessInstance  a case  AuditTrailEntry  events  WorkflowModelElement  name of event  mandatory event attribute  EventType  identify lifecycle transitions  mandatory event attribute  Timestamp, Originator  optional data fields

7 Trace profiles □ Profile  A set of related items which describe the trace from a specific perspective □ Every item is a metric ⇒ we can consider a profile with n items to be a function, which assigns to a trace a vector ( i 1, i 2, … i n ) □ Profiling a log can be described as measuring a set of traces with a number of profiles, resulting in an aggregate vector  Resulting vectors can subsequently be used to calculate the distance between any two traces, using a distance metric

8 Trace profiles

9 Clustering Methods - Distance Measures □ Distance Measures  To calculate the similarity between cases □ Three distance measures  n : the number of items extracted from the process log  case c j : corresponds to the vector ( i j1, i j2, … i jn )  i jk : the number of appearance of item k in the case j

10 Clustering Methods – C lustering Algorithm □ K-means clustering  A method of cluster analysis  aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. □ QT (quality threshold) clustering  A method of partitioning data  invented for gene clustering  requires more computing power than k-means  but does not require specifying the number of clusters a priori  predictable - always returns the same result when run several times.  guided by a quality threshold(determines the maximum diameter of clusters)

11 Clustering Methods – C lustering Algorithm □ Agglomerative hierarchical clustering  Gradually generate clusters by merging nearest traces  Smaller clusters are merged into large ones  Example: we have six elements {a} {b} {c} {d} {e} and {f}. The first step is to determine which elements to merge in a cluster. Usually, we want to take the two closest elements, according to the chosen distance.

12 Clustering Methods – C lustering Algorithm □ The Self-Organizing Map (SOM)  Used to map high dimensional data onto low dimensional spaces  grouping similar cases close together in certain areas of the value range  can be used to portray complex correlations in statistical data.  Example: World Bank statistics of countries in  39 indicators describing various quality-of-life factors were used  Countries that had similar values of the indicators place near each other on the map  different clusters were automatically encoded with different bright colors  each country was assigned a color describing its poverty type in relation to other countries  The poverty structures of the world: each country on the geographic map has been colored according to its poverty type.

13 Case study □ ProM  Support various process mining algorithm  Implemented the trace clustering plug-in in ProM □ Process log from AMC hospital in Amsterdam, Netherlands  619 gynecological oncology patients (treated in 2005, 2006) = 619 cases  52 diagnostic activities  3,574 events, 34 departments are involved

14 Case study □ Process model for all cases obtained using the Heuristic Miner

15 Case study □ The result obtained by applying the trace clustering plug-in in ProM □ The cases in the same cell = belong to the same cluster cluster (1,2) 352 cluster (3,1) 113

16 Case study □ Result for cluster (1,2)  352 cases (more than half of the entire cases)  Only 11 activities ⇒ Simple  Patients who are diagnosed by another hospital and are referred to the AMC hospital for treatment

17 Case study □ Result for cluster (3,1)  113 cases  Complex as the original process model  Patients who are not diagnosed and need more complex and detailed diagnostic activities

18 Conclusion □ Process mining techniques can deliver valuable, factual insights into how processes are being executed in real life  Important for analyzing flexible environments □ Trace clustering operates on the event log level  Improve the results of any process mining algorithm □ Both the approach and implementation are straightforward to extend  Ex: By adding domain-specific profiles or further clustering algorithm

Download ppt "Han-na Yang Trace Clustering in Process Mining M. Song, C.W. Gunther, and W.M.P. van der Aalst."

Similar presentations

Ads by Google