Distributed Process Discovery From Large Event Logs, Sergio Hernández de Mesa


1 Distributed Process Discovery From Large Event Logs. Sergio Hernández de Mesa {shernandez@unizar.es, s.hernandez.de.Mesa@tue.nl}. Eindhoven, The Netherlands, 12th March, 2015

2 Outline
- Distributed Process Discovery
- A Framework for Distributed Computing
- Summary and Future Work

3 Outline
- Distributed Process Discovery
- A Framework for Distributed Computing
- Summary and Future Work

4 Distributed Process Discovery: Big Data

5 Distributed Process Discovery: The 3 V's of Big Data

6 Distributed Process Discovery: Big Data and process discovery
[Figure labels: XES logs, CSV files, Data streams; Offline analysis, Real-time analysis; MB, GB, TB]

7 Distributed Process Discovery: Actual problem

8 Distributed Process Discovery: Performance improvement opportunities
Distribute/parallelize process discovery techniques
- Inductive Miner
- Alpha Miner
- Heuristics Miner
- …
Take advantage of HPC infrastructures and parallel programming models
- Clusters, grids and clouds
- MapReduce

9 Distributed Process Discovery: Execution scenarios
- No computing resources -> "Classical" ProM
- No computing resources but money -> Amazon Elastic MapReduce
- Hadoop Cluster -> MapReduce model
- HPC infrastructure -> Distributed approach

10 Distributed Process Discovery: MapReduce and Hadoop
MapReduce
- Programming model for data-oriented applications
- Proposed by Google
- Map: (k1, v1) -> list(k2, v2)
- Reduce: (k2, list(v2)) -> list(v3)
Hadoop
- Software for reliable, scalable and distributed computing
- Developed by Apache
- Core components: Hadoop Distributed File System (HDFS), Hadoop MapReduce, Hadoop YARN
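
The Map and Reduce signatures above can be illustrated with a small in-memory sketch. This is not Hadoop code: `run_mapreduce`, `count_activities_map` and `count_activities_reduce` are hypothetical names chosen for illustration, and the example simply counts how often each activity occurs in two made-up traces.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to every (k1, v1), group values by k2, then apply reducer."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


def count_activities_map(case_id, activities):
    # Map: (case id, list of activity names) -> list of (activity, 1)
    return [(activity, 1) for activity in activities]


def count_activities_reduce(activity, ones):
    # Reduce: (activity, list of 1s) -> list with one (activity, total) pair
    return [(activity, sum(ones))]


# Made-up input: two traces keyed by a hypothetical case id.
traces = [("case-1", ["A", "B", "C"]), ("case-2", ["A", "C"])]
totals = run_mapreduce(traces, count_activities_map, count_activities_reduce)
```

In a real Hadoop job the grouping step is performed by the framework's shuffle phase between the map and reduce tasks; here it is a plain dictionary.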

11 Distributed Process Discovery: Motivational example: Inductive Miner
- Step 1: XES log -> Directly-Follows Graph (DFG)
- Step 2: DFG -> Process Model
Optimized version of Step 1
- Reading data as a stream (SAXParser)
- HashMaps to efficiently count frequencies
Example XES log: 100 million traces (40 activities)
- Size: 218 GB
- Step 1 (XES to DFG): about 2-3 hours
- Step 2 (DFG to Process Model): a few seconds
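
A minimal sketch of Step 1, assuming the usual XES layout in which each `event` carries its activity name in a `string` attribute with key `concept:name`. The slide mentions a Java SAXParser and HashMaps; this sketch swaps in Python's `xml.etree.ElementTree.iterparse` and a `Counter` as rough equivalents. The function name `xes_to_dfg` and the sample log are made up, and start/end activities are not recorded.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}event'."""
    return tag.rsplit("}", 1)[-1]


def xes_to_dfg(source):
    """Stream an XES log and count directly-follows pairs of activities."""
    dfg = Counter()
    previous = None  # last activity seen in the current trace
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "event":
            activity = None
            for attr in elem:
                if local_name(attr.tag) == "string" and attr.get("key") == "concept:name":
                    activity = attr.get("value")
            if activity is not None:
                if previous is not None:
                    dfg[(previous, activity)] += 1
                previous = activity
            elem.clear()  # drop the event's children once it has been counted
        elif name == "trace":
            previous = None  # the next event starts a new trace
            elem.clear()
    return dfg


# Made-up two-trace log: A B C and A C.
SAMPLE_XES = b"""<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="B"/></event>
    <event><string key="concept:name" value="C"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="C"/></event>
  </trace>
</log>"""

dfg = xes_to_dfg(io.BytesIO(SAMPLE_XES))
```

The trace-level `concept:name` attribute (the case id) is ignored because only `string` children of `event` elements are inspected.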

12 Distributed Process Discovery: Computing DFG: Hadoop/MapReduce approach
[Figure: XES logs stored in HDFS (Hadoop Distributed File System) -> split phase into Block 1, Block 2, …, Block N -> MAP 1, MAP 2, …, MAP N -> DFG 1, DFG 2, …, DFG N -> REDUCE -> FINAL DFG]

13 Distributed Process Discovery: Computing DFG: Distributed/HPC approach
[Figure: XES logs -> XES sublog 1, XES sublog 2, …, XES sublog N -> XES2DFG (MAP) on each sublog -> DFG 1, DFG 2, …, DFG N -> REDUCE_DFGS -> FINAL DFG]
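
The map/reduce pipeline in the two figures above can be sketched as follows, under two assumptions that the slides do not state: each sublog contains only whole traces, and traces are already parsed into lists of activity names. A thread pool stands in for the cluster nodes; `traces_to_dfg`, `merge_dfgs` and `distributed_dfg` are hypothetical names mirroring XES2DFG and REDUCE_DFGS.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def traces_to_dfg(sublog):
    """Map step: count directly-follows pairs in one sublog (a list of traces)."""
    dfg = Counter()
    for trace in sublog:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def merge_dfgs(left, right):
    """Reduce step: add the edge counts of two partial DFGs."""
    return left + right


def distributed_dfg(sublogs, workers=4):
    """Compute one partial DFG per sublog in parallel, then merge them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(traces_to_dfg, sublogs))
    return reduce(merge_dfgs, partial, Counter())


# Made-up input: two sublogs holding three traces in total.
sublogs = [
    [["A", "B", "C"], ["A", "C"]],
    [["A", "B", "C"]],
]
final_dfg = distributed_dfg(sublogs)
```

Because adding edge counts is associative and commutative, the partial DFGs can be merged in any order, which is what makes the split safe as long as no trace is cut across two sublogs.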

14 Outline
- Distributed Process Discovery
- A Framework for Distributed Computing
- Summary and Future Work

15 A Framework for Distributed Computing: Scientific computing

16 A Framework for Distributed Computing: Heterogeneous Execution Environments

17 A Framework for Distributed Computing: Challenges of scientific computing in HPC
- Strong coupling between applications and execution environments
- Lifecycle management
- Using multiple computing infrastructures

18 A Framework for Distributed Computing: Framework architecture

19 A Framework for Distributed Computing: Framework operation
[Figure labels: User application, JSDL, Message, Message bus, HERMES Meta-scheduler, Fault Management, HERMES Mediator, Amazon EC2 Mediator]
- Selecting a computing infrastructure
- Job execution (✓ or ✘)
- Selecting fault handling policy
  - Resubmission
  - Alternative infrastructure
  - Aborting job execution
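
The three fault handling policies listed above can be sketched as a simple retry loop. This is an illustration only, not the framework's actual interface: `Policy`, `run_job`, the `choose_policy` callback and the two fake infrastructures are all hypothetical.

```python
from enum import Enum


class Policy(Enum):
    RESUBMIT = "resubmission"
    ALTERNATIVE = "alternative infrastructure"
    ABORT = "aborting job execution"


def run_job(job, infrastructures, choose_policy, max_attempts=5):
    """Run a job until it succeeds, the policy aborts, or attempts run out.

    infrastructures: list of (name, execute) pairs; execute(job) may raise.
    choose_policy: callable(name, error) returning a Policy.
    Returns (result or None, history of (name, outcome) pairs).
    """
    history = []
    index = 0  # currently selected computing infrastructure
    for _ in range(max_attempts):
        name, execute = infrastructures[index]
        try:
            result = execute(job)
        except Exception as error:
            history.append((name, "failed"))
            policy = choose_policy(name, error)
            if policy is Policy.ABORT:
                break
            if policy is Policy.ALTERNATIVE:
                index = (index + 1) % len(infrastructures)
            # Policy.RESUBMIT retries on the same infrastructure.
        else:
            history.append((name, "ok"))
            return result, history
    return None, history


def failing_cluster(job):
    # Hypothetical infrastructure that always fails.
    raise RuntimeError("node unavailable")


def working_cloud(job):
    # Hypothetical infrastructure that always succeeds.
    return f"{job} done"


result, history = run_job(
    "xes2dfg",
    [("cluster", failing_cluster), ("cloud", working_cloud)],
    choose_policy=lambda name, error: Policy.ALTERNATIVE,
)
```

Passing the policy as a callback keeps the choice between resubmission, an alternative infrastructure, and aborting separate from the execution loop itself.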

20 Outline
- Distributed Process Discovery
- A Framework for Distributed Computing
- Summary and Future Work

21 Summary and Future Work: Summary
[Figure labels: Inductive Miner, Alpha Miner, Heuristics Miner, …]

22 Summary and Future Work: Execution scenarios
- No computing resources -> "Classical" ProM
- No computing resources but money -> Amazon Elastic MapReduce
- Hadoop Server -> MapReduce model
- HPC infrastructure -> Distributed approach
[Figure label: New ProM plugin]

23 Summary and Future Work: Solution approach

24 Summary and Future Work: Conclusions
Process discovery from large event logs
- "Sequential" way: time-consuming
- Solution approach: MapReduce and distributed computing
Current state
- Code developed for the distributed computation of DFGs
- Setting up a Hadoop cluster
Future work
- Integration with the distributed computing framework
- Development of a ProM plugin

25 Distributed Process Discovery From Large Event Logs. Sergio Hernández de Mesa {shernandez@unizar.es, s.hernandez.de.Mesa@tue.nl}. Eindhoven, The Netherlands, 12th March, 2015

