An Excel-based Data Mining Tool (Chapter 4)

4.1 The iData Analyzer

Figure 4.1 The iDA system architecture

Figure 4.2 A successful installation

4.2 ESX: A Multipurpose Tool for Data Mining

Figure 4.3 An ESX concept hierarchy

4.3 iDAV Format for Data Mining

4.4 A Five-step Approach for Unsupervised Clustering

Step 1: Enter the Data to be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules

Step 1: Enter The Data To Be Mined

Figure 4.4 The Credit Card Promotion Database

Step 2: Perform A Data Mining Session

Figure 4.5 Unsupervised settings for ESX

Figure 4.6 RuleMaker options

Step 3: Read and Interpret Summary Results

Class Resemblance Scores
Domain Resemblance Score
Domain Predictability

Summary Results

Class Resemblance Score offers a first indication of how well the instances within each class (cluster) fit together. Domain Resemblance Score represents the overall similarity of all instances within the data set. It is highly desirable for the class resemblance scores to be higher than the domain resemblance score.

Summary Results

Given categorical attribute A with values v1, v2, v3, …, vi, …, vn, the Domain Predictability of vi tells us the percent of domain instances showing vi as a value for A. A predictability score near 100% for a domain-level categorical attribute value indicates that the attribute is not likely to be useful for supervised learning or unsupervised clustering.
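As a minimal sketch of the definition above (the attribute name and data are hypothetical, not from the text), domain predictability is just the percentage of all instances carrying a given value:

```python
from collections import Counter

def domain_predictability(values, v):
    """Percent of all domain instances showing value v for the attribute."""
    return 100.0 * Counter(values)[v] / len(values)

# Hypothetical column: does the customer hold credit card insurance?
insurance = ["no"] * 8 + ["yes"] * 2
print(domain_predictability(insurance, "no"))  # 80.0
```

A value like "no" at 80% predictability dominates the domain, so it carries little discriminating power, which is why scores near 100% flag an attribute as unlikely to be useful.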

Summary Results

Given categorical attribute A with values v1, v2, v3, …, vi, …, vn, the Class C Predictability score for vi tells us the percent of instances within class C showing vi as a value for A. Given class C and categorical attribute A with values v1, v2, v3, …, vi, …, vn, an Attribute-Value Predictiveness score for vi is defined as the probability that an instance resides in C given that the instance has value vi for A.
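The two directions of the conditional probability can be sketched directly from these definitions (the toy data set and attribute names are illustrative assumptions, not from the slides):

```python
def class_predictability(instances, cls, attr, v):
    """P(attr = v | class = cls): fraction of class cls instances showing v."""
    members = [i for i in instances if i["class"] == cls]
    return sum(1 for i in members if i[attr] == v) / len(members)

def predictiveness(instances, cls, attr, v):
    """P(class = cls | attr = v): fraction of instances with value v that reside in cls."""
    holders = [i for i in instances if i[attr] == v]
    return sum(1 for i in holders if i["class"] == cls) / len(holders)

# Toy data: two clusters, one categorical attribute
data = [
    {"class": "C1", "sex": "male"},
    {"class": "C1", "sex": "male"},
    {"class": "C1", "sex": "female"},
    {"class": "C2", "sex": "female"},
    {"class": "C2", "sex": "female"},
]
print(class_predictability(data, "C1", "sex", "male"))  # 2 of 3 C1 instances are male
print(predictiveness(data, "C1", "sex", "male"))        # every male instance is in C1
```

Note the asymmetry: "male" is only moderately predictable within C1, yet perfectly predictive of C1.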

Domain Statistics for Numerical Attributes

Attribute Significance Value measures the predictive value of each numerical attribute. To calculate the Attribute Significance Value for a numeric attribute: a) subtract the smallest class mean from the largest class mean; b) divide the result by the domain standard deviation.
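The two-step calculation can be sketched as follows. The slides do not say whether iDA uses the population or sample standard deviation; population standard deviation is assumed here, and the age data is hypothetical:

```python
import statistics

def attribute_significance(values_by_class):
    """(largest class mean - smallest class mean) / domain standard deviation."""
    means = [statistics.mean(v) for v in values_by_class.values()]
    domain = [x for v in values_by_class.values() for x in v]  # pool all classes
    return (max(means) - min(means)) / statistics.pstdev(domain)

# Hypothetical ages in two clusters: class means 25 and 45
ages = {"C1": [20, 30], "C2": [40, 50]}
print(attribute_significance(ages))
```

A larger score means the class means are far apart relative to the overall spread, so the attribute is more useful for distinguishing the classes.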

Figure 4.8 Summary statistics for the Acme credit card promotion database

Figure 4.9 Statistics for numerical attributes and common categorical attribute values

Step 4: Read and Interpret Individual Class Results

Class Predictability is a within-class measure. Class Predictiveness is a between-class measure.

Necessary and Sufficient Attribute Values

If an attribute value has predictability and predictiveness scores of 1.0, the attribute value is said to be necessary and sufficient for membership in class C. That is, all instances within class C have the specified value for the attribute, and all instances with this value for the attribute reside in class C.

Sufficient Attribute Values

If an attribute value has a predictiveness score of 1.0 and a predictability score less than 1.0, the attribute value is said to be sufficient but not necessary for membership in class C. That is, all instances with the value for the attribute reside in C, but there are other instances in C that have a different value for this attribute.

Necessary Attribute Values

If an attribute value has a predictability score of 1.0 and a predictiveness score less than 1.0, the attribute value is said to be necessary but not sufficient for membership in class C. That is, all instances in C have the same value for the attribute, but there are instances outside C that also have this value for the attribute.

Necessary and Sufficient Attribute Values in iDA

Attribute values with predictiveness scores greater than or equal to 0.8 are considered highly sufficient. Attribute values with predictability scores greater than or equal to 0.8 are considered necessary.
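The three cases and the 0.8 cutoffs described above can be summarized in a small helper (a sketch of the stated rules, not iDA's actual code):

```python
def characterize(predictability, predictiveness, threshold=0.8):
    """Label an attribute value for class membership using iDA-style cutoffs."""
    necessary = predictability >= threshold    # value appears throughout the class
    sufficient = predictiveness >= threshold   # value points strongly to the class
    if necessary and sufficient:
        return "necessary and sufficient"
    if sufficient:
        return "sufficient but not necessary"
    if necessary:
        return "necessary but not sufficient"
    return "neither"

print(characterize(1.0, 0.85))  # necessary and sufficient
print(characterize(0.4, 0.9))   # sufficient but not necessary
```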

Figure 4.10 Class 3 summary results

Figure 4.11 Necessary and sufficient attribute values for Class 3

Step 5: Visualize Individual Class Rules

Figure 4.7 Rules for the credit card promotion database

Rule Interpretation in iDA

Each rule simply declares the precondition(s) necessary for an instance to be covered by the rule: if [(condition & condition & … & condition) = true], then the instance resides in a certain class.

Rule Interpretation in iDA

Rule accuracy tells us that the rule is correct in …% of all cases where it applies. Rule coverage tells us that the rule applies to …% of class instances.
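These two measures can be sketched against a toy data set (the rule, attribute names, and data below are hypothetical examples of the definitions, not figures from the text):

```python
def rule_stats(instances, precondition, cls):
    """Return (accuracy %, coverage %) for the rule: if precondition then class cls."""
    covered = [i for i in instances if precondition(i)]           # cases where rule applies
    correct = sum(1 for i in covered if i["class"] == cls)        # applies and is right
    class_size = sum(1 for i in instances if i["class"] == cls)
    accuracy = 100.0 * correct / len(covered)    # right among covered cases
    coverage = 100.0 * correct / class_size      # class instances the rule captures
    return accuracy, coverage

# Hypothetical rule: "if income = high then class C1"
data = [
    {"class": "C1", "income": "high"},
    {"class": "C1", "income": "high"},
    {"class": "C1", "income": "low"},
    {"class": "C2", "income": "high"},
]
acc, cov = rule_stats(data, lambda i: i["income"] == "high", "C1")
print(acc, cov)
```

Here the rule fires on three instances and is right for two (accuracy), and those two are out of three C1 instances (coverage).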

4.5 A Six-Step Approach for Supervised Learning

Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules

Step 4: Read and Interpret Test Set Results

Figure 4.12 Test set instance classification

4.6 Techniques for Generating Rules

1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.

4.7 Instance Typicality

Typicality Scores

Typicality scores can be used to: identify prototypical and outlier instances; select a best set of training instances; compute individual instance classification confidence scores.
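The slides do not give ESX's exact typicality computation; a common formulation, sketched here as an assumption, scores each instance by its average similarity to the other members of its class, using simple attribute matching:

```python
def similarity(a, b):
    """Simple matching: fraction of attribute values the two instances share."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def typicality(idx, instances):
    """Average similarity of instance idx to all other instances in its class."""
    others = [inst for j, inst in enumerate(instances) if j != idx]
    return sum(similarity(instances[idx], inst) for inst in others) / len(others)

# Hypothetical cluster of three instances with three categorical attributes
cluster = [("yes", "male", "high"), ("yes", "male", "low"), ("no", "male", "high")]
print(typicality(0, cluster))  # instance 0 is the most typical of the three
```

High scorers are prototypical instances; low scorers are candidate outliers, which is how the scores support training-set selection and per-instance confidence.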

Figure 4.13 Instance typicality

4.8 Special Considerations and Features

Avoid Mining Delays
The Quick Mine Feature
Erroneous and Missing Data