Data Mining: A Closer Look Chapter 2. 2.1 Data Mining Strategies.

Slides:



Advertisements
Similar presentations
An overview of The IBM Intelligent Miner for Data By: Neeraja Rudrabhatla 11/04/1999.
Advertisements

Basic Data Mining Techniques Chapter Decision Trees.
Neural Networks. R & G Chapter Feed-Forward Neural Networks otherwise known as The Multi-layer Perceptron or The Back-Propagation Neural Network.
Learning From Data Chichang Jou Tamkang University.
Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5.
Basic Data Mining Techniques
Neural Networks Chapter Feed-Forward Neural Networks.
Data Mining: A Closer Look Chapter Data Mining Strategies (p35) Moh!
Data Mining: A Closer Look Chapter Data Mining Strategies.
StatMaster – An Update Kartik Vishwanath Chintan Patel Yugyung Lee UMKC William Drake Richard Stroup Steve Simon Childrens Mercy Hospital, Kansas City,
1 An Excel-based Data Mining Tool Chapter The iData Analyzer.
Data Mining – Intro.
Oracle Data Mining Ying Zhang. Agenda Data Mining Data Mining Algorithms Oracle DM Demo.
Data Mining: A Closer Look
Data Mining: A Closer Look Chapter Data Mining Strategies 2.
Chapter 5 Data mining : A Closer Look.
Introduction to Data Mining Data mining is a rapidly growing field of business analytics focused on better understanding of characteristics and.
CSCI 347 / CS 4206: Data Mining Module 04: Algorithms Topic 06: Regression.
TURKISH STATISTICAL INSTITUTE INFORMATION TECHNOLOGIES DEPARTMENT (Muscat, Oman) DATA MINING.
Enterprise systems infrastructure and architecture DT211 4
Evaluating Performance for Data Mining Techniques
1 Formal Evaluation Techniques Chapter 7. 2 test set error rates, confusion matrices, lift charts Focusing on formal evaluation methods for supervised.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
Basic Data Mining Techniques
Dr. Awad Khalil Computer Science Department AUC
Data Mining Dr. Chang Liu. What is Data Mining Data mining has been known by many different terms Data mining has been known by many different terms Knowledge.
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
An Excel-based Data Mining Tool Chapter The iData Analyzer.
Inductive learning Simplest form: learn a function from examples
Introduction To Data Mining. What Is Data Mining? A toolA tool Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful)
Using Neural Networks in Database Mining Tino Jimenez CS157B MW 9-10:15 February 19, 2009.
Overview of Data Mining Methods Data mining techniques What techniques do, examples, advantages & disadvantages.
Chapter 9 Neural Network.
Classification and Prediction (cont.) Pertemuan 10 Matakuliah: M0614 / Data Mining & OLAP Tahun : Feb
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
Some working definitions…. ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably Data mining = –the discovery of interesting,
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Jennifer Lewis Priestley Presentation of “Assessment of Evaluation Methods for Prediction and Classification of Consumer Risk in the Credit Industry” co-authored.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Part I Data Mining Fundamentals Chapter 1 Data Mining: A First View Jason C. H. Chen, Ph.D. Professor.
Becerra-Fernandez, et al. -- Knowledge Management 1/e -- © 2004 Prentice Hall Chapter 12 Discovering New Knowledge – Data Mining.
Part II Tools for Knowledge Discovery Ch 5. Knowledge Discovery in Databases Ch 6. The Data Warehouse Ch 7. Formal Evaluation Technique.
1 Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 12 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
An Investigation of Commercial Data Mining Presented by Emily Davis Supervisor: John Ebden.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 4 An Excel-based Data Mining Tool (iData Analyzer) Jason C. H. Chen, Ph.D. Professor of MIS.
DM.Lab in University of Seoul Data Mining Laboratory April 24 th, 2008 Summarized by Sungjick Lee An Excel-Based Data Mining Tool iData Analyzer.
Data Mining and Decision Support
Data Mining Copyright KEYSOFT Solutions.
An Excel-based Data Mining Tool Chapter The iData Analyzer.
Basic Data Mining Techniques Chapter 3-A. 3.1 Decision Trees.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 3 Basic Data Mining Techniques Jason C. H. Chen, Ph.D. Professor of MIS School of Business.
Introduction Exploring Categorical Variables Exploring Numerical Variables Exploring Categorical/Numerical Variables Selecting Interesting Subsets of Data.
DATA MINING TECHNIQUES (DECISION TREES ) Presented by: Shweta Ghate MIT College OF Engineering.
Data Mining: Concepts and Techniques1 Prediction Prediction vs. classification Classification predicts categorical class label Prediction predicts continuous-valued.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 6: Artificial Neural Networks for Data Mining.
CSE 4705 Artificial Intelligence
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
DATA MINING © Prentice Hall.
CSE 4705 Artificial Intelligence
Data Mining 101 with Scikit-Learn
An Excel-based Data Mining Tool
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Intro to Machine Learning
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Presentation transcript:

Data Mining: A Closer Look Chapter 2

2.1 Data Mining Strategies

Figure 2.1 A hierarchy of data mining strategies

Classification Learning is supervised. The dependent variable is categorical. Well-defined classes. Current rather than future behavior.

Estimation Learning is supervised. The dependent variable is numeric. Well-defined classes. Current rather than future behavior.

Prediction The emphasis is on predicting future rather than current outcomes. The output attribute may be categorical or numeric.

The Cardiology Patient Dataset

A Healthy Class Rule for the Cardiology Patient Dataset IF 169 <= Maximum Heart Rate <=202 THEN Concept Class = Healthy Rule accuracy: 85.07% Rule coverage: 34.55%

A Sick Class Rule for the Cardiology Patient Dataset IF Thal = Rev & Chest Pain Type = Asymptomatic THEN Concept Class = Sick Rule accuracy: 91.14% Rule coverage: 52.17%

Unsupervised Clustering Determine if concepts can be found in the data. Evaluate the likely performance of a supervised model. Determine a best set of input attributes for supervised learning. Detect Outliers.

Market Basket Analysis Find interesting relationships among retail products. Uses association rule algorithms.

2.2 Supervised Data Mining Techniques

The Credit Card Promotion Database

A Hypothesis for the Credit Card Promotion Database A combination of one or more of the dataset attributes differentiate Acme Credit Card Company card holders who have taken advantage of the life insurance promotion and those card holders who have chosen not to participate in the promotional offer.

A Production Rule for the Credit Card Promotion Database IF Sex = Female & 19 <=Age <= 43 THEN Life Insurance Promotion = Yes Rule Accuracy: % Rule Coverage: 66.67%

Production Rules Rule accuracy is a between-class measure. Rule coverage is a within-class measure.

Neural Networks

Figure 2.2 A multilayer fully connected neural network

Statistical Regression Life insurance promotion = (credit card insurance) (sex)

2.3 Association Rules

An Association Rule for the Credit Card Promotion Database IF Sex = Female & Age = over40 & Credit Card Insurance = No THEN Life Insurance Promotion = Yes

2.4 Clustering Techniques

Figure 2.3 An unsupervised cluster of the credit card database

2.5 Evaluating Performance

Evaluating Supervised Learner Models

Confusion Matrix A matrix used to summarize the results of a supervised classification. Entries along the main diagonal are correct classifications. Entries other than those on the main diagonal are classification errors.

Two-Class Error Analysis

Evaluating Numeric Output Mean absolute error Mean squared error Root mean squared error

Comparing Models by Measuring Lift

Figure 2.4 Targeted vs. mass mailing

Computing Lift

Unsupervised Model Evaluation