Data Mining CMPT 455/826 - Week 10, Day 2 Jan-Apr 2009 – w10d2

A Methodology for Evaluating and Selecting Data Mining Software (based on Collier)

Evaluation Categories
Performance is the ability to handle a variety of data sources in an efficient manner.
Functionality is the inclusion of a variety of capabilities, techniques, and methodologies for data mining.
Usability is accommodation of different levels and types of users without loss of functionality or usefulness.
Ancillary Task Support allows the user to perform the variety of data cleansing, manipulation, transformation, visualization, and other tasks that support data mining.

Methodology
1. Tool pre-screening
2. Identify Additional Selection Criteria
3. Weight Selection Criteria
4. Tool Scoring
5. Score Evaluation
6. Tool Selection
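Steps 3 to 6 of the methodology can be illustrated with a small weighted-scoring sketch. This is one simple way to combine weights and scores, not necessarily the scoring scheme used by Collier; the tool names, weights, and 1 to 5 scores below are made up for illustration.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (hypothetical scheme)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Step 3: weight the selection criteria (illustrative values only).
weights = {"performance": 3, "functionality": 3, "usability": 2, "ancillary": 2}

# Step 4: score each pre-screened tool on every criterion (made-up scores).
tools = {
    "Tool A": {"performance": 4, "functionality": 3, "usability": 5, "ancillary": 2},
    "Tool B": {"performance": 3, "functionality": 5, "usability": 3, "ancillary": 4},
}

# Steps 5 and 6: evaluate the scores and rank the tools, best first.
totals = {name: weighted_score(s, weights) for name, s in tools.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for name in ranking:
    print(name, round(totals[name], 2))
```

In practice the final selection would also consider factors that a single number hides, such as a tool that scores very poorly on one critical criterion.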

Introduction to Data Mining, from Discovering Knowledge in Data: An Introduction to Data Mining by Daniel T. Larose

Need For Human Direction Of Data Mining
Many software vendors market their analytical software as plug-and-play, out-of-the-box applications that will provide solutions to otherwise intractable problems without the need for human supervision or interaction. Some early definitions of data mining followed this focus on automation. Humans, however, need to be actively involved at every phase of the data mining process. Rather than asking where humans fit into data mining, we should instead inquire about how we may design data mining into the very human process of problem solving.

CRISP–DM: The Six Phases
1. Business understanding phase
2. Data understanding phase
3. Data preparation phase
4. Modeling phase
5. Evaluation phase
6. Deployment phase

CRISP–DM: Modeling phase
a) Select and apply appropriate modeling techniques
b) Calibrate model settings to optimize results
c) Remember that often, several different techniques may be used for the same data mining problem
d) If necessary, loop back to the data preparation phase to bring the form of the data into line with the specific requirements of a particular data mining technique

CRISP–DM: Evaluation phase
1. Evaluate the one or more models delivered in the modeling phase
– for quality and effectiveness before deploying them for use in the field.
2. Determine whether the model in fact achieves the objectives
– set for it in the first phase.
3. Establish if something has not been accounted for sufficiently
– some important facet of the business or research problem
4. Come to a decision
– regarding use of the data mining results.

Fallacies Of Data Mining
Fallacy 1. There are data mining tools that we can turn loose on our data repositories and use to find answers to our problems.
Reality. There are no automatic data mining tools that will solve your problems mechanically "while you wait." Rather, data mining is a process, as we have seen above. CRISP–DM is one method for fitting the data mining process into the overall business or research plan of action.

Fallacies Of Data Mining
Fallacy 2. The data mining process is autonomous, requiring little or no human oversight.
Reality. As we saw above, the data mining process requires significant human interactivity at each stage. Even after the model is deployed, the introduction of new data often requires an updating of the model. Continuous quality monitoring and other evaluative measures must be assessed by human analysts.

Fallacies Of Data Mining
Fallacy 3. Data mining pays for itself quite quickly.
Reality. The return rates vary, depending on the startup costs, analysis personnel costs, data warehousing preparation costs, and so on.

Fallacies Of Data Mining
Fallacy 4. Data mining software packages are intuitive and easy to use.
Reality. Again, ease of use varies. However, data analysts must combine subject matter knowledge with an analytical mind and a familiarity with the overall business or research model.

Fallacies Of Data Mining
Fallacy 5. Data mining will identify the causes of our business or research problems.
Reality. The knowledge discovery process will help you to uncover patterns of behavior. Again, it is up to humans to identify the causes.

Fallacies Of Data Mining
Fallacy 6. Data mining will clean up a messy database automatically.
Reality. Well, not automatically. As a preliminary phase in the data mining process, data preparation often deals with data that has not been examined or used in years. Therefore, organizations beginning a new data mining operation will often be confronted with the problem of data that has been lying around for years, is stale, and needs considerable updating.

What Tasks Can Data Mining Accomplish?
Description
Estimation
Prediction
Classification
Clustering
Association

Interestingness Measures for Data Mining: A Survey (based on Geng and Hamilton)

Interestingness measures are intended for selecting and ranking patterns
– according to their potential interest to the user
– regardless of the kind of patterns being mined
Interestingness is a broad concept that emphasizes
– conciseness, coverage, reliability, peculiarity, diversity, novelty, surprisingness, utility, and actionability
These nine criteria can be further categorized into:
– objective, subjective, and semantics-based
A concise pattern or set of patterns
– is relatively easy to understand and remember and thus is added more easily to the user’s knowledge (set of beliefs)
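As a rough illustration of objective measures, the sketch below computes support and confidence for an association rule. These two are commonly used as examples of coverage-style and reliability-style measures respectively; the slide itself does not name them, and the transactions are made up.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


# Made-up market-basket data for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, round(rule_confidence, 3))
```

Both values depend only on the data, which is what makes them objective; subjective and semantics-based measures would additionally need the user's beliefs or domain knowledge as input.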

Towards comprehensive support for organizational mining (based on Song)

Process mining requires the availability of an event log
– Most work has focused on control-flow discovery, i.e., constructing a process model based on an event log
– These techniques assume that it is possible to sequentially record events such that each event refers to an activity (i.e., a well-defined step in the process) and is related to a particular case (i.e., a process instance)
– Some mining techniques use additional information such as the performer or originator of the event (i.e., the person/resource executing or initiating the activity), the timestamp of the event, or data elements recorded with the event (e.g., the size of an order)
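The event log described above can be sketched as a list of records carrying a case, an activity, a performer, and a timestamp. The example below groups events into traces and counts which activity directly follows which, a common first step in control-flow discovery. The field names and log contents are assumptions made for illustration, not taken from Song.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    case_id: str    # process instance the event belongs to
    activity: str   # well-defined step in the process
    performer: str  # person/resource executing the activity
    timestamp: int  # simplified to an integer clock


# Made-up event log with two cases.
log = [
    Event("c1", "register", "Ann", 1),
    Event("c1", "check", "Bob", 2),
    Event("c1", "pay", "Ann", 3),
    Event("c2", "register", "Ann", 1),
    Event("c2", "check", "Carl", 4),
    Event("c2", "reject", "Bob", 5),
]


def traces(events: List[Event]) -> Dict[str, List[str]]:
    """Group events by case and order each case's activities by time."""
    by_case: Dict[str, List[str]] = defaultdict(list)
    for e in sorted(events, key=lambda e: (e.case_id, e.timestamp)):
        by_case[e.case_id].append(e.activity)
    return dict(by_case)


def directly_follows(events: List[Event]) -> Counter:
    """Count how often activity b immediately follows activity a in a case."""
    counts: Counter = Counter()
    for activities in traces(events).values():
        for pair in zip(activities, activities[1:]):
            counts[pair] += 1
    return counts


df = directly_follows(log)
print(traces(log))
print(df)
```

A discovery algorithm would go on to turn these directly-follows counts into a process model; that step is omitted here.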

Event logs
An interesting class of information systems that produce event logs
– is the so-called Process-Aware Information Systems (PAISs)
These systems provide very detailed information
– about the activities that have been executed

Organizational mining
Discovery
– aims at constructing a model that reflects current situations
– the organizational model represents the current organizational structure and
– the social network shows the communication structure in an organization
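One way a social network could be derived from a log is to count handovers of work, that is, how often one performer's activity is directly followed by another performer's activity within the same case. The sketch below assumes a simplified log of (case, activity, performer) tuples already in time order; the data is invented and the metric is only one of several possibilities.

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple

# Made-up log: (case id, activity, performer), ordered by time within each case.
log: List[Tuple[str, str, str]] = [
    ("c1", "register", "Ann"),
    ("c1", "check", "Bob"),
    ("c1", "pay", "Ann"),
    ("c2", "register", "Ann"),
    ("c2", "check", "Carl"),
    ("c2", "reject", "Bob"),
]


def handover_of_work(events: List[Tuple[str, str, str]]) -> Counter:
    """Count directed performer-to-performer handovers within each case."""
    handovers: Counter = Counter()
    # sorted() is stable, so the original time order within a case is kept.
    ordered = sorted(events, key=lambda e: e[0])
    for _, case_events in groupby(ordered, key=lambda e: e[0]):
        performers = [performer for _, _, performer in case_events]
        for giver, receiver in zip(performers, performers[1:]):
            if giver != receiver:
                handovers[(giver, receiver)] += 1
    return handovers


network = handover_of_work(log)
for (giver, receiver), count in sorted(network.items()):
    print(f"{giver} -> {receiver}: {count}")
```

The resulting counts can be read as weighted, directed edges of a social network between performers.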

Organizational mining
Conformance checking
– examines whether the modeled behaviour matches the observed behaviour
– involves two dimensions of conformance measures in the control-flow perspective
fitness
– is the degree of the association between the log traces and the execution paths specified by the process model
appropriateness
– is the degree of accuracy with which the process model describes observed behaviour
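The idea of fitness can be illustrated with a deliberately simplified sketch: the fraction of log traces that the model can replay from start to end. Real conformance checkers typically use finer-grained replay measures; the model representation here (allowed start activities, end activities, and transitions) and the traces are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def trace_fits(trace: List[str],
               start: Set[str],
               end: Set[str],
               allowed: Set[Tuple[str, str]]) -> bool:
    """True if the trace is an execution path permitted by the model."""
    if not trace or trace[0] not in start or trace[-1] not in end:
        return False
    return all(step in allowed for step in zip(trace, trace[1:]))


def fitness(traces: List[List[str]],
            start: Set[str],
            end: Set[str],
            allowed: Set[Tuple[str, str]]) -> float:
    """Simplified fitness: share of traces the model can fully replay."""
    if not traces:
        return 0.0
    fitting = sum(trace_fits(t, start, end, allowed) for t in traces)
    return fitting / len(traces)


# Hypothetical process model: register, then check, then pay or reject.
start = {"register"}
end = {"pay", "reject"}
allowed = {("register", "check"), ("check", "pay"), ("check", "reject")}

# Made-up observed traces; the last two deviate from the model.
observed = [
    ["register", "check", "pay"],
    ["register", "check", "reject"],
    ["register", "pay"],
    ["check", "pay"],
]

score = fitness(observed, start, end, allowed)
print(score)
```

Appropriateness is not covered by this sketch: a model that allows every possible transition would score perfect fitness here while describing the observed behaviour very imprecisely.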

Organizational mining
Extension
– aims at enriching an existing model by extending the model through the projection of information extracted from the logs onto the initial model