Profiling: What is it? Notes and reflections on profiling and how it could be used in process mining.



Definitions – Data Profiling
- The use of analytical techniques about data for the purpose of developing a thorough knowledge of its content, structure and quality.

Definition 2 – Data Profiling
- Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:
  - find out whether existing data can easily be used for other purposes
  - give metrics on data quality, including whether the data conforms to company standards
  - assess the risk involved in integrating data for new applications, including the challenges of joins
  - track data quality
  - assess whether metadata accurately describes the actual values in the source database
  - understand data challenges early in any data-intensive project, so that late project surprises are avoided; finding data problems late in the project can incur time delays and cost overruns
  - have an enterprise view of all data, for uses such as Master Data Management, where key data is needed, or data governance, for improving data quality
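As a rough illustration of "collecting statistics and information about that data", the sketch below profiles each column of a small table. The function names, the sample records and the choice of statistics (row count, missing values, distinct values, most common value) are assumptions made for this example and do not come from the slides.

```python
from collections import Counter


def profile_column(values):
    """Collect basic profiling statistics for one column of raw values."""
    # Treat None and the empty string as missing values.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample records, invented for illustration only.
SAMPLE_ROWS = [
    {"case_id": "c1", "resource": "alice", "status": "closed"},
    {"case_id": "c2", "resource": "", "status": "closed"},
    {"case_id": "c3", "resource": "alice", "status": "open"},
]

PROFILE = profile_table(SAMPLE_ROWS)
```

A profile like this gives a first view of data quality (for example, how many values are missing per column) before the data is reused elsewhere.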

What could process profiling be?
- The practice of tracking information about processes by monitoring their execution. This can be done by analyzing the case, process and resource perspectives to assess their behavior, predict certain characteristics and configure optimum runtime parameters.

Possible Applications
- Analyzing rendering behavior. A user could be provided with a set of options for analyzing very specific rendering behavior in parts of a process.
- Profiling process outcomes. Techniques are used to analyze the outcomes of processes in order to determine what may be causing the observed behavior.
- Event tracing and prediction. On the basis of an existing event log, real-time events can be traced to troubleshoot, determine where performance issues are occurring and predict the likely execution pattern.

Common Approaches
- Data mining techniques are commonly used in the context of customer personalization. The aim is to market content and services tailored to an individual on the basis of knowledge about their preferences and behaviour. (Tamas Abraham 2006)
- Common techniques
  - Association rule mining
  - Clustering

Review of Literature Section 2

Association Rule Mining
- (R. Vaarandi 2003)
- Association rules can be used to create a system profile by treating the most frequently occurring behavior as normal. Association rule algorithms are used to detect relationships between event types.
- Association rules can be used to build a rule set that describes the behavior of data within a given level of confidence. Such information can be obtained from log files. For example, an association rule algorithm might provide the rule "if events of type A and B occur within 5 seconds, they will be followed by an event of type C within 60 seconds".
- Provides an algorithm implementation for profiling log data for forensic purposes.
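To make the example rule concrete, the sketch below estimates how often "A and B within 5 seconds" is followed by "C within 60 seconds" in a timestamped event log. It is only a minimal illustration: the function name, the log layout (timestamp in seconds, event type) and the sample events are assumptions for this example, not the algorithm from the cited work.

```python
def rule_confidence(events, a, b, c, pair_window=5, follow_window=60):
    """Estimate the confidence of the rule:
    a and b within pair_window seconds => c within follow_window seconds.

    events is a list of (timestamp_in_seconds, event_type) tuples.
    """
    times = {}
    for ts, kind in events:
        times.setdefault(kind, []).append(ts)
    antecedents = 0
    followed = 0
    for ta in times.get(a, []):
        # Find the b events that are close enough to this a event.
        partners = [tb for tb in times.get(b, []) if abs(tb - ta) <= pair_window]
        if not partners:
            continue
        antecedents += 1
        # The antecedent is complete once the later of the a event and its
        # nearest b partner has occurred.
        done = max(ta, min(partners, key=lambda tb: abs(tb - ta)))
        if any(done < tc <= done + follow_window for tc in times.get(c, [])):
            followed += 1
    return followed / antecedents if antecedents else 0.0


# Hypothetical event log, invented for illustration only.
EVENTS = [
    (0, "A"), (3, "B"), (40, "C"),
    (100, "A"), (102, "B"), (300, "C"),
    (400, "A"), (450, "B"),
]

CONFIDENCE = rule_confidence(EVENTS, "A", "B", "C")
```

In this sample the antecedent occurs twice (at 0 to 3 seconds and at 100 to 102 seconds) and is followed by C in time only once, so the estimated confidence is 0.5. A profile would keep only rules whose confidence exceeds a chosen threshold.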

Association Mining and Profiling
- (R. Vaarandi 2003)
- Proposes a data profiling association mining algorithm based on concept hierarchies.
- In this approach, rules are generated from a set of parent-child relations in a data file, at some level of abstraction.
- Concept hierarchies based on one's knowledge of the data set can be used to create the rules.
- A preconceived set of beliefs about the data being investigated can also be used to create a separate data collection.

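DFD for Profiling Process (R. Vaarandi 2003)
[Data flow diagram not reproduced. Labels recovered from the slide, in extraction order: Log File, Formatted Log File, Concepts, Profile, Beliefs, Rule Mining, Intra Profile, Filtering, Data to Profile, Output, Profiling Data, Event Logs, Preprocessing]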

Profile generation algorithm
- (R. Vaarandi 2003)
- Background knowledge is vital in applying this algorithm and influences the outcome. There are three possible scenarios for generating rules:
  - No concept hierarchies and no beliefs: produces a large rule set requiring extensive user analysis.
  - Concept hierarchies but no beliefs: produces high-level rules and generalizations of lower-level ones, allowing drill-down.
  - Concept hierarchies and beliefs: allows for the above scenario plus filtering based on beliefs.

Profile generation algorithm
- (R. Vaarandi 2003)
- Devises an algorithm called "matrix to item set concepts", which is in turn based on the classic Apriori association mining algorithm.
- Generated profiles are analyzed using the following functionalities:
  - Filtering: guided by a previously defined set of beliefs about expected behavior, the profile is reduced to subsets of higher interest.
  - Contrasting raw data to profile: produces a list of data that deviates from the profile.
  - Intra-profile contrasts: aims to find rules in a profile that contradict other rules in the same profile. Such contradictions may indicate a shift in behavior.
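For readers unfamiliar with the classic Apriori algorithm that the profile generation builds on, the sketch below shows its level-wise search for frequent itemsets. It illustrates plain Apriori only, not the "matrix to item set concepts" variant; the function name, the sample transactions and the support threshold are assumptions for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidate itemsets are the single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into larger candidates, then prune any
        # candidate that has an infrequent subset (the Apriori property).
        joined = {x | y for x in level for y in level if len(x | y) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical sessions of event types, invented for illustration only.
TRANSACTIONS = [
    {"login", "read", "logout"},
    {"login", "read"},
    {"login", "write", "logout"},
    {"read", "logout"},
]

FREQUENT = apriori(TRANSACTIONS, min_support=2)
```

With a minimum support of 2, every pair drawn from login, read and logout is frequent, while the triple {login, read, logout} appears only once and is dropped. Association rules are then derived from the frequent itemsets.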

My Reflections
- (R. Vaarandi 2003) gives a good framework for applying association mining to build profiles from event log data. However, the approach relies heavily on expert knowledge.
- More research is needed into how sequential and process mining techniques could be used with this tool to build profiles.

Clustering
- (R. Vaarandi 2003)
- Clustering is used to group similar objects into clusters based on some patterns. These techniques can be used to detect anomalies by creating clusters of anomalies.
- Clustering can be used to create system profiles so that anomalies in a process can be detected.
- Clustering techniques divide a data set into groups, each having similar characteristics. This can be used as a precursor to association rule mining to detect relationships between event types.
- In addition, a clearly identified line pattern can be included in the final profile of the system.

Clustering – What algorithm?
- (R. Vaarandi 2003)
- Many clustering algorithms exist; however, attention needs to be paid to clustering algorithms that can mine line patterns in an event log.
- Traditional clustering algorithms do not perform well when applied to high-dimensional data, such as log file data. There are often cases where "for every pair of points there exist dimensions where these points are far apart from each other, which makes the detection of any clusters almost impossible".
- Most clustering algorithms have been developed for generic marketplace-like data and are not suitable for event log data.

Clustering – What algorithm?
- Proposes an algorithm consisting of three steps: first a data summary is built, then cluster candidates are generated, and finally clusters are selected from the candidates.
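The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: it assumes the data summary is a count of (position, word) pairs, that a cluster candidate is a line reduced to its frequent pairs with "*" as a wildcard, and that a single support threshold is used throughout. The function name and sample log lines are invented for the example.

```python
from collections import Counter


def cluster_log_lines(lines, min_support):
    """Three-step line-pattern clustering sketch.

    1. Data summary: count how often each (position, word) pair occurs.
    2. Cluster candidates: reduce each line to its frequent pairs.
    3. Clusters: keep candidates matched by at least min_support lines.
    """
    tokenized = [line.split() for line in lines]
    # Step 1: build the data summary of (position, word) frequencies.
    summary = Counter(
        (pos, word) for words in tokenized for pos, word in enumerate(words)
    )
    frequent = {pair for pair, n in summary.items() if n >= min_support}
    # Step 2: each line proposes a candidate made of its frequent pairs;
    # positions without a frequent word become the wildcard "*".
    candidates = Counter()
    for words in tokenized:
        pattern = tuple(
            word if (pos, word) in frequent else "*"
            for pos, word in enumerate(words)
        )
        if any(token != "*" for token in pattern):
            candidates[pattern] += 1
    # Step 3: select the candidates with enough supporting lines.
    return {" ".join(p): n for p, n in candidates.items() if n >= min_support}


# Hypothetical log lines, invented for illustration only.
LOG_LINES = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
    "Disk sda1 is full",
]

CLUSTERS = cluster_log_lines(LOG_LINES, min_support=2)
```

Here the three connection lines collapse into the line pattern "Connection from * closed", while the disk line matches no cluster and would be a candidate for anomaly inspection.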

Profiling Applications in Web usage mining Section 2

Web Usage Patterns and Profiling
- (Brij et al 2002) present a summary of findings on web mining for usage patterns and profiles.
- (Chi, Rosien 2002) Web usage mining has been used to understand user goals when navigating the web, through a method that infers major groupings of web traffic by association rule mining.
- (Shah, Joshi, Wurman 2002) use data mining to understand the auction process by exploring common bidding patterns. Through this they propose new bidding engagements and rules for classifying strategies. Furthermore, they seek to suggest economic motivations for such behaviour.

- (Ypma and Heskes, 2002) use Markov models to model the click streams of web surfers. They use prior knowledge and various Markov modeling techniques to obtain web page categorizations based on weblogs.
- (Hay, Wets, Vanhoof) present a new algorithm, the Multidimensional Sequence Alignment Method (MDSAM), for mining navigation patterns on a web site. MDSAM examines sequences composed of several information types, such as visited pages and time spent on pages. It identifies profiles showing the pages visited, the time spent on them and the order in which they are visited.
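The idea of modeling click streams with Markov models can be illustrated with the simplest case, a first-order Markov chain whose transition probabilities are estimated from session logs. This is a deliberately reduced sketch and not the modeling technique of the cited authors; the function name and the sample sessions are assumptions for this example.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order page-to-page transition probabilities."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Count each consecutive (current page, next page) pair.
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    # Normalize the counts for each page into probabilities.
    return {
        page: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for page, nexts in counts.items()
    }


# Hypothetical click-stream sessions, invented for illustration only.
SESSIONS = [
    ["home", "products", "cart"],
    ["home", "products", "home"],
    ["home", "about"],
    ["home", "products", "cart"],
]

MODEL = transition_probabilities(SESSIONS)
```

In this sample, three of the four transitions out of "home" lead to "products", so the estimated probability of that transition is 0.75. Pages with similar outgoing distributions could then be grouped into categories.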

- (Yang, Parthasarathy and Reddy, 2002) provide an approach based on association rule mining. Their algorithm discovers association rules that are constrained (and ordered) temporally. This is based on the premise that recently accessed pages have a greater influence on the pages that will be accessed in the near future.
- (Oyanagi, Kubota, Nakase) explore issues in mining sequence patterns from web data. The Apriori algorithm has inherent difficulties in finding long sequential patterns and in finding interesting patterns among a huge number of results. They propose a new method for finding sequence patterns by matrix clustering.

Applications in Computer Forensics Section 3

Reference List
1. R. Vaarandi, 2003. A Data Clustering Algorithm for Mining Patterns From Event Logs.
2. Tan, Steinbach, Kumar. Introduction to Data Mining (book).
3. Brij et al. Web site of the WebKDD 2002 Workshop.
4. Tamas Abraham, 2006. Event Sequence Mining to Develop Profiles for Computer Forensic Investigation Purposes. Australian Computer Society.