Profiling: What is it? Notes and reflections on profiling and how it could be used in process mining.


1 Profiling: What is it? Notes and reflections on profiling and how it could be used in process mining.

2 Definitions – Data Profiling
- The use of analytical techniques about data for the purpose of developing a thorough knowledge of its content, structure and quality. (http://www.bitpipe.com/tlist/Data-Profiling.html)

3 Definition 2 – Data Profiling
- Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:
  - find out whether existing data can easily be used for other purposes
  - give metrics on data quality, including whether the data conforms to company standards
  - assess the risk involved in integrating data for new applications, including the challenges of joins
  - track data quality
  - assess whether metadata accurately describes the actual values in the source database
  - understand data challenges early in any data-intensive project, so that late project surprises are avoided; finding data problems late in the project can incur time delays and cost overruns
  - have an enterprise view of all data, for uses such as Master Data Management, where key data is needed, or data governance, for improving data quality
(http://en.wikipedia.org/wiki/Data_profiling)
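To make "collecting statistics and information about that data" concrete, here is a minimal sketch of a column profiler. The record layout, the column names and the choice of statistics (row count, missing values, distinct values, most common value) are assumptions made for illustration; they do not come from either definition above.

```python
from collections import Counter

def profile_column(values):
    """Return simple profiling statistics for one column of raw values."""
    # Treat None and the empty string as missing (an assumption of this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

def profile_table(rows):
    """Profile every column of a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}

# Hypothetical sample records.
sample = [
    {"case_id": "c1", "resource": "alice", "cost": "10"},
    {"case_id": "c2", "resource": "", "cost": "10"},
    {"case_id": "c3", "resource": "alice", "cost": None},
]
report = profile_table(sample)
```

A report like this already answers some of the questions listed above, for example whether a column is complete enough to serve as a join key.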

4 What could process profiling be?
- The practice of tracking information about processes by monitoring their execution. This can be done by analyzing the case, process and resource perspectives to assess their behavior, predict certain characteristics and configure optimal runtime parameters.

5 Possible Applications
- Analyzing rendering behavior. A user could be provided with a set of options for analyzing very specific rendering behavior in parts of a process.
- Profiling process outcomes. Techniques are applied to the outcomes of processes in order to determine what may be causing the observed behavior.
- Event tracing and prediction. Using a profile built from a historical event log, real-time events can be traced to troubleshoot, to determine where performance issues are occurring and to predict the likely execution pattern.

6 Common Approaches
- Data mining techniques commonly used:
  - Association rule mining
  - Clustering

7 Review of Literature Section 2

8 Association Rule Mining
- (R. Vaarandi 2003)
- Association rules can be used to create a system profile by treating the most frequently occurring behavior as normal. Association rule algorithms are used to detect relationships between event types.
- Association rules can be used to build a rule set that describes the behavior of the data within a given level of confidence. Such information can be obtained from log files. An association rule algorithm might, for example, produce the rule "if events of type A and B occur within 5 seconds, they will be followed by an event of type C within 60 seconds".
- Provides an algorithm implementation for profiling log data for forensic purposes.
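The example rule can be checked against a log by counting how often its antecedent occurs and how often the consequent then follows. The sketch below is only an illustration under stated assumptions: events are `(timestamp_in_seconds, event_type)` pairs sorted by time, A is taken to precede B, and the function name and parameters are hypothetical rather than taken from the cited work.

```python
def rule_confidence(events, first="A", second="B", consequent="C",
                    antecedent_window=5, consequent_window=60):
    """Estimate the confidence of 'first, then second within antecedent_window
    seconds, is followed by consequent within consequent_window seconds'.

    events: list of (timestamp_seconds, event_type), sorted by timestamp.
    Returns (antecedent matches, confirmed matches, confidence).
    """
    matches = 0
    confirmed = 0
    for i, (t_a, type_a) in enumerate(events):
        if type_a != first:
            continue
        # Look for a `second` event close enough after this `first` event.
        t_b = next((t for t, typ in events[i + 1:]
                    if typ == second and t - t_a <= antecedent_window), None)
        if t_b is None:
            continue
        matches += 1
        # Antecedent holds; check whether the consequent follows in time.
        if any(typ == consequent and t_b < t <= t_b + consequent_window
               for t, typ in events):
            confirmed += 1
    confidence = confirmed / matches if matches else 0.0
    return matches, confirmed, confidence

# Hypothetical log: the rule holds once and fails once.
log = [(0, "A"), (3, "B"), (40, "C"),
       (100, "A"), (102, "B"), (300, "C"),
       (400, "A"), (420, "B")]
result = rule_confidence(log)
```

Here the third A/B pair is 20 seconds apart, so it does not count as an antecedent match at all.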

9 Association Mining and Profiling
- (R. Vaarandi 2003)
- Proposes an association mining algorithm for data profiling based on concept hierarchies.
- In this approach, rules are generated from a set of parent-child relations in a data file, at some level of abstraction.
- Concept hierarchies based on one's knowledge of the data set can be used to create the rules.
- A preconceived set of beliefs about the data being investigated can also be used to create a separate data collection.
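One way to picture a concept hierarchy is as a child-to-parent map that lets events be counted at a higher level of abstraction before rules are mined. The hierarchy, the event names and the `generalize` helper below are all hypothetical; this is a sketch of the idea, not the cited algorithm.

```python
from collections import Counter

# Hypothetical concept hierarchy: each child concept maps to its parent.
HIERARCHY = {
    "login_failed": "authentication",
    "login_ok": "authentication",
    "file_read": "file_access",
    "file_write": "file_access",
    "authentication": "user_activity",
    "file_access": "user_activity",
}

def generalize(concept, levels=1):
    """Climb `levels` steps up the hierarchy, stopping at the root."""
    for _ in range(levels):
        if concept not in HIERARCHY:
            break
        concept = HIERARCHY[concept]
    return concept

events = ["login_failed", "login_failed", "login_ok", "file_read", "file_write"]
low_level = Counter(events)                        # counts per raw event type
high_level = Counter(generalize(e) for e in events)  # counts one level up
```

Rules mined over `high_level` concepts are fewer and more general; drilling down means returning to the `low_level` items behind a given parent.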

10 DFD for Profiling Process (R. Vaarandi 2003)
[Figure: data flow diagram. Recoverable labels: Log File, Formatted Log File, Concepts, Profile, Beliefs, Rule Mining, Intra Profile, Filtering, Data to Profile, Output, Profiling Data, Event Logs, Preprocessing]

11 Profile generation algorithm
- (R. Vaarandi 2003)
- Background knowledge is vital in applying this algorithm and influences the outcome. There are three possible scenarios for generating rules:
  - No concept hierarchies and no beliefs: produces a large rule set that requires extensive user analysis.
  - Concept hierarchies but no beliefs: produces high-level rules that generalize lower-level ones, allowing drill-down.
  - Concept hierarchies and beliefs: allows for the scenario above, plus filtering based on beliefs.

12 Profile generation algorithm
- (R. Vaarandi 2003)
- Devises an algorithm called matrix to item set concepts, which is in turn based on the classic Apriori association mining algorithm.
- Generated profiles are analyzed using the following functionalities:
  - Filtering: guided by a previously defined set of beliefs about expected behavior, the profile is reduced to subsets of higher interest.
  - Contrasting raw data to the profile: produces a list of data that deviates from the profile.
  - Intra-profile contrasts: aims to find rules in a profile that contradict other rules in the same profile. Such contradictions may indicate a shift in behavior.
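For reference, the classic Apriori idea that the slide mentions can be sketched in a few lines: frequent itemsets are grown level by level, and a candidate is kept only if all of its smaller subsets are already frequent. This is plain Apriori over hypothetical session data, not the matrix-based variant described above, and the final "deviation" check is a simplified stand-in for contrasting raw data to a profile.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical sessions, each a set of event types.
sessions = [
    {"login", "read", "logout"},
    {"login", "read"},
    {"login", "write", "logout"},
    {"login", "read", "logout"},
    {"write", "delete"},
]
profile = apriori(sessions, min_support=3)

# Simplified contrast: sessions that contain no frequent itemset of size >= 2.
deviating = [s for s in sessions
             if not any(len(items) > 1 and items <= s for items in profile)]
```

With these sessions only the last one deviates, since it shares no frequent pair with the profile.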

13 My Reflections
- (R. Vaarandi 2003) gives a good framework for applying association mining to build profiles from event log data. However, the investigation relies heavily on expert knowledge.
- More research is needed into how sequential and process mining techniques could be used with this tool to build profiles.

14 Clustering
- (R. Vaarandi 2003)
- Clustering groups similar objects into clusters based on shared patterns. These techniques can be used to detect anomalies by creating clusters of anomalies.
- Clustering can be used to create system profiles so that anomalies in a process can be detected.
- Clustering techniques divide a data set into groups whose members share similar characteristics. This can be used as a precursor to association rule mining to detect relationships between event types.
- In addition, a clearly identified line pattern can be included in the final profile of the system.

15 Clustering – What algorithm?
- (R. Vaarandi 2003)
- Many clustering algorithms exist; however, attention needs to be paid to those that can mine line patterns in an event log.
- Traditional clustering algorithms do not perform well when applied to high-dimensional data, such as log file data. There are often cases where "for every pair of points there exist dimensions where these points are far apart from each other, which makes the detection of any clusters almost impossible".
- Most clustering algorithms have been developed for generic market-place like data and are not suitable for event log data.

16
- Proposes an algorithm consisting of three steps: first a data summary is built, then cluster candidates are generated, and finally clusters are selected from the candidates.
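The three steps can be sketched roughly as follows. This is a simplified reading, assuming that the data summary is a count of (position, word) pairs, that a cluster candidate is a line with its infrequent words replaced by a wildcard, and that one support threshold is used throughout; the function name and the sample log lines are hypothetical, and the published algorithm includes optimizations not shown here.

```python
from collections import Counter

def cluster_lines(lines, min_support):
    """Return {line pattern: number of matching lines} for frequent patterns."""
    tokenized = [line.split() for line in lines]

    # Step 1: data summary. Count how often each word occurs at each position.
    word_counts = Counter((i, w) for words in tokenized
                          for i, w in enumerate(words))

    # Step 2: cluster candidates. Keep the frequent words of each line and
    # replace the rest with a wildcard; lines with no frequent word are skipped.
    candidates = Counter()
    for words in tokenized:
        pattern = tuple(w if word_counts[(i, w)] >= min_support else "*"
                        for i, w in enumerate(words))
        if any(token != "*" for token in pattern):
            candidates[pattern] += 1

    # Step 3: clusters. Keep the candidates supported by enough lines.
    return {" ".join(p): n for p, n in candidates.items() if n >= min_support}

# Hypothetical log lines.
log_lines = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
    "Disk sda1 is full",
    "User bob logged in",
    "User amy logged in",
]
clusters = cluster_lines(log_lines, min_support=2)
```

Lines that fall into no cluster, such as the disk message here, are the natural candidates for anomaly review.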

17 Reference List
1. R. Vaarandi, 2003. A Data Clustering Algorithm for Mining Patterns From Event Logs.
2. Tan, Steinbach, Kumar. Introduction to Data Mining (book).

