Profiling: What is it? Notes and reflections on profiling and how it could be used in process mining.

Definitions – Data Profiling
- The use of analytical techniques about data for the purpose of developing a thorough knowledge of its content, structure and quality.

Definition 2 – Data Profiling
- Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:
- find out whether existing data can easily be used for other purposes
- give metrics on data quality, including whether the data conforms to company standards
- assess the risk involved in integrating data for new applications, including the challenges of joins
- track data quality
- assess whether metadata accurately describes the actual values in the source database
- understand data challenges early in any data-intensive project, so that late surprises are avoided; finding data problems late in a project can cause delays and cost overruns
- have an enterprise view of all data, for uses such as Master Data Management, where key data is needed, or data governance for improving data quality
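To make "collecting statistics and information about that data" concrete, here is a minimal sketch of profiling a single column. The function name `profile_column`, the choice of statistics and the sample values are all assumptions made for illustration; they are not taken from the slides or from any particular profiling tool.

```python
from collections import Counter

def profile_column(values):
    """Collect basic profiling statistics for one column of raw values."""
    # Treat None and the empty string as missing (an assumption of this sketch).
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "types": sorted({type(v).__name__ for v in non_null}),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

# Hypothetical sample column, invented for illustration only.
country = ["NL", "NL", "DE", "", None, "nl"]
stats = profile_column(country)
print(stats)
```

Even on this tiny sample the profile surfaces typical quality questions: two missing values, and "NL" and "nl" counted as distinct values, which may or may not conform to a company standard.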

What could process profiling be?
- The practice of tracking information about processes by monitoring their execution. This can be done by analyzing the case, process and resource perspectives in order to assess their behavior, predict certain characteristics and configure optimum runtime parameters.

Possible Applications
- Analyzing rendering behavior. A user could be given a set of options for analyzing very specific rendering behavior in parts of a process.
- Profiling process outcomes. Techniques are applied to the outcomes of processes in order to determine what may be causing the observed behavior.
- Event tracing and prediction. Using a profile built from a historical event log, real-time event logs can be traced to troubleshoot, to determine where performance issues are occurring and to predict the likely execution pattern.

Common Approaches
- Data mining techniques commonly used:
- Association rule mining
- Clustering

Review of Literature Section 2

Association Rule Mining
- (R. Vaarandi 2003)
- Association rules can be used to create a system profile by treating the most frequently occurring behavior as normal. Association rule algorithms are used to detect relationships between event types.
- Association rules can be used to build a rule set that describes the behavior of data within a given level of confidence. Such information can be obtained from log files. An association rule algorithm may, for example, produce the rule "if events of type A and B occur within 5 seconds, they will be followed by an event of type C within 60 seconds".
- Provides an algorithm implementation for profiling log data for forensic purposes.
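The example rule above can be checked against a log by counting how often its antecedent occurs and how often the consequent follows. The sketch below is only an illustration of that idea: the function name, the log format (pairs of timestamp in seconds and event type) and the sample events are assumptions, and this is not the algorithm from the cited work.

```python
def rule_confidence(events, window_ab=5, window_c=60):
    """Confidence of: A and B within window_ab seconds => C within window_c seconds.

    events: iterable of (timestamp_seconds, event_type) pairs, in any order.
    Returns (confidence or None, number of antecedent matches).
    """
    events = sorted(events)
    a_times = [t for t, e in events if e == "A"]
    b_times = [t for t, e in events if e == "B"]
    c_times = [t for t, e in events if e == "C"]
    matches = 0  # times the antecedent (A and B close together) occurred
    hits = 0     # times the consequent (C soon afterwards) also occurred
    for ta in a_times:
        for tb in b_times:
            if abs(ta - tb) <= window_ab:
                matches += 1
                anchor = max(ta, tb)  # measure the C window from the later event
                if any(anchor < tc <= anchor + window_c for tc in c_times):
                    hits += 1
    return (hits / matches if matches else None), matches

# Hypothetical event log, invented for illustration only.
log = [(0, "A"), (3, "B"), (40, "C"),
       (100, "A"), (102, "B"), (300, "C"),
       (500, "A"), (520, "B")]
confidence, matches = rule_confidence(log)
print(confidence, matches)
```

In this sample the antecedent occurs twice and is followed by C in time only once, so the rule holds with a confidence of 0.5. A profile would keep only rules whose confidence exceeds a chosen threshold.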

Association Mining and Profiling
- (R. Vaarandi 2003)
- Proposes an association mining algorithm for data profiling based on concept hierarchies.
- Rules are generated from a set of parent-child relations in a data file, at some level of abstraction.
- Concept hierarchies based on one's knowledge of the data set can be used to create the rules.
- A preconceived set of beliefs about the data being investigated can also be used to create a separate data collection.

DFD for Profiling Process (R. Vaarandi 2003)
[Figure: data flow diagram. Labels: Event Logs, Preprocessing, Log File, Formatted Log File, Concepts, Beliefs, Rule Mining, Profile, Intra Profile Filtering, Data to Profile, Profiling Data, Output]

Profile generation algorithm
- (R. Vaarandi 2003)
- Background knowledge is vital in applying this algorithm and influences the outcome. There are three possible scenarios for generating rules:
- No concept hierarchies and no beliefs: produces a large rule set that requires extensive analysis by the user.
- Concept hierarchies but no beliefs: produces high-level rules that generalize lower-level ones, allowing drill-down.
- Concept hierarchies and beliefs: allows for the above scenario plus filtering based on beliefs.
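The second scenario depends on replacing low-level event types with their parents before rules are mined. A minimal sketch of that generalization step follows; the hierarchy, the event names and both helper functions are hypothetical and only show the parent-child idea, not the cited algorithm.

```python
# Hypothetical concept hierarchy (child -> parent), invented for illustration.
HIERARCHY = {
    "ssh_login": "login",
    "ftp_login": "login",
    "login": "access",
    "file_read": "access",
}

def generalize(item, levels=1):
    """Climb the hierarchy by up to `levels` steps; unknown items stay as they are."""
    for _ in range(levels):
        item = HIERARCHY.get(item, item)
    return item

def generalize_transactions(transactions, levels=1):
    """Rewrite every transaction at a higher level of abstraction."""
    return [frozenset(generalize(i, levels) for i in t) for t in transactions]

sessions = [{"ssh_login", "file_read"}, {"ftp_login", "file_read"}]
print(generalize_transactions(sessions))
```

After one level of generalization both sessions become the same item set, so a rule mined over them covers both login types, and the original items remain available for drill-down.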

Profile generation algorithm (continued)
- (R. Vaarandi 2003)
- Devises an algorithm called matrix to item set concepts, which is in turn based on the classic Apriori association mining algorithm.
- Generated profiles are analyzed using the following functions:
- Filtering: guided by a previously defined set of beliefs about expected behavior, the profile is reduced to subsets of higher interest.
- Contrasting raw data with the profile: produces a list of data that deviates from the profile.
- Intra-profile contrasts: aims to find rules in a profile that contradict other rules in the same profile. This may indicate a shift in behavior.
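For readers unfamiliar with the classic Apriori algorithm that the slide refers to, the following is a compact sketch of its frequent item set phase. It is the textbook algorithm only, not the matrix-based variant described in the cited work, and the sample sessions are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every item set contained in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # Count in how many transactions each candidate occurs.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical sessions taken from an event log, invented for illustration.
sessions = [
    {"login", "read", "logout"},
    {"login", "read"},
    {"login", "write", "logout"},
    {"read", "logout"},
]
frequent = apriori(sessions, min_support=2)
# Confidence of the rule login -> read = support(login, read) / support(login).
conf_login_read = frequent[frozenset({"login", "read"})] / frequent[frozenset({"login"})]
```

Rules whose confidence is high enough would form the profile; raw sessions that violate them are candidates for the "contrasting raw data with the profile" step.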

My Reflections
- (R. Vaarandi 2003) gives a good framework for applying association mining to build profiles from event log data. However, the investigation relies heavily on expert knowledge.
- More research is needed into how sequential and process mining techniques could be used with this tool to build profiles.

Clustering
- (R. Vaarandi 2003)
- Clustering groups similar objects together based on shared patterns. These techniques can be used to detect anomalies by creating clusters of anomalies.
- Clustering can be used to create system profiles so that anomalies in a process can be detected.
- Clustering techniques divide a data set into groups, each with similar characteristics. This can be used as a precursor to association rule mining to detect relationships between event types.
- In addition, a clearly identified line pattern can be included in the final profile of the system.

Clustering – What algorithm?
- (R. Vaarandi 2003)
- Many clustering algorithms exist; however, attention needs to be paid to those that can mine line patterns in an event log.
- Traditional clustering algorithms do not perform well when applied to high-dimensional data, such as log file data. There are often cases where "for every pair of points there exist dimensions where these points are far apart from each other, which makes the detection of any clusters almost impossible".
- Most clustering algorithms have been developed for generic market-place-like data and are not suitable for event log data.
- Proposes an algorithm consisting of three steps: first a data summary is built, then cluster candidates are generated, and finally clusters are selected from the candidates.
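The three steps can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes that the data summary counts (position, word) pairs, that a candidate is the set of frequent pairs found in a line, and that one support threshold is used throughout. The function name and the sample log lines are hypothetical.

```python
from collections import Counter

def cluster_lines(lines, support):
    """Return {line pattern: number of matching lines} for patterns with enough support."""
    # Step 1: data summary. Count every (position, word) pair in the log.
    word_counts = Counter()
    for line in lines:
        for pos, word in enumerate(line.split()):
            word_counts[(pos, word)] += 1
    frequent = {pw for pw, n in word_counts.items() if n >= support}

    # Step 2: cluster candidates. Each line contributes its frequent pairs.
    candidates = Counter()
    for line in lines:
        key = tuple(pw for pw in enumerate(line.split()) if pw in frequent)
        if key:
            candidates[key] += 1

    # Step 3: clusters. Keep the candidates supported by enough lines.
    clusters = {}
    for key, n in candidates.items():
        if n >= support:
            pattern = ["*"] * (key[-1][0] + 1)  # wildcard for infrequent positions
            for pos, word in key:
                pattern[pos] = word
            clusters[" ".join(pattern)] = n
    return clusters

# Hypothetical log lines, invented for illustration only.
log = [
    "Interface eth0 down",
    "Interface eth1 down",
    "Interface eth2 up",
    "User bob login",
    "User alice login",
    "Disk full",
]
clusters = cluster_lines(log, support=2)
```

Lines that fall into no cluster, such as "Disk full" in this sample, are the outliers that a profile-based anomaly check would flag. Note that this sketch drops wildcards after the last frequent word.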

Reference List
1. R. Vaarandi, 2003, A Data Clustering Algorithm for Mining Patterns From Event Logs
2. Tan, Steinbach, Kumar, Introduction to Data Mining (book)