Chapter 5 Data Mining: A Closer Look

Chapter 5 Data Mining: A Closer Look

Chapter Objectives
- Determine an appropriate data mining strategy for a specific problem.
- Know about several data mining techniques and how each technique builds a generalized model to represent data.
- Understand how a confusion matrix is used to help evaluate supervised learner models.

Chapter Objectives
- Understand basic techniques for evaluating supervised learner models with numeric output.
- Know how measuring lift can be used to compare the performance of several competing supervised learner models.
- Understand basic techniques for evaluating unsupervised learner models.

Data Mining Strategies
Classification is probably the best understood of all data mining strategies. Classification tasks have three common characteristics:
- Learning is supervised.
- The dependent variable is categorical.
- The emphasis is on building models able to assign new instances to one of a set of well-defined classes.
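
The three characteristics above can be illustrated with a minimal sketch. It uses a 1-nearest-neighbor rule, chosen here only because it is short; the slides do not prescribe this technique. The function name `classify_nearest`, the attributes (age, maximum heart rate), and the data are all hypothetical.

```python
import math

def classify_nearest(training, new_instance):
    """Return the class label of the training instance closest to new_instance.

    training: list of (features, label) pairs. The label is the categorical
    dependent variable that makes the learning supervised.
    """
    _, nearest_label = min(
        training, key=lambda pair: math.dist(pair[0], new_instance)
    )
    return nearest_label

# Hypothetical labeled instances: (age, maximum heart rate) -> class
training = [
    ((45, 170), "healthy"),
    ((50, 165), "healthy"),
    ((62, 120), "sick"),
    ((67, 110), "sick"),
]

# Assign new, unlabeled instances to one of the well-defined classes.
print(classify_nearest(training, (48, 160)))
print(classify_nearest(training, (65, 115)))
```

The labeled pairs supply the supervision, the labels are categorical, and the function's only job is to assign a new instance to one of the known classes.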

Data Mining Strategies
Some example classification tasks include the following:
- Determine those characteristics that differentiate individuals who have suffered a heart attack from those who have not.
- Develop a profile of a “successful” person.
- Determine if a credit card purchase is fraudulent.
- Classify a car loan applicant as a good or a poor credit risk.
- Develop a profile to differentiate female and male stroke victims.

[Figure slides: Data Mining Strategies]

Data Mining Strategies
34% are healthy within this maximum heart rate range.

[Figure slides: Supervised Data Mining Techniques]

[Figure slide: Association Rules]

[Figure slides: Clustering Techniques]

[Figure slides: Evaluating Performance]

Chapter Summary
- Data mining strategies include classification, estimation, prediction, unsupervised clustering, and market basket analysis.
- Classification and estimation strategies are similar in that each strategy is employed to build models able to generalize current outcome. However, the output of a classification strategy is categorical, whereas the output of an estimation strategy is numeric.

Chapter Summary
- A predictive strategy differs from a classification or estimation strategy in that it is used to design models for predicting future outcome rather than current behavior.
- Unsupervised clustering strategies are employed to discover hidden concept structures in data as well as to locate atypical data instances.
- The purpose of market basket analysis is to find interesting relationships among retail products. Discovered relationships can be used to design promotions, arrange shelf or catalog items, or develop cross-marketing strategies.
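
One common way to quantify an "interesting relationship" among retail products is with support and confidence. These two measures are not named in the slides and are introduced here as an assumption. The sketch below computes both for a candidate rule; the function names and the baskets are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical purchase data: each set is one customer's basket.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

# Candidate rule: customers who buy milk also buy bread.
print(support(baskets, {"milk", "bread"}))
print(confidence(baskets, {"milk"}, {"bread"}))
```

A rule with high support and high confidence is a candidate for the uses listed above, such as shelf arrangement or cross-marketing.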

Chapter Summary
- A data mining technique applies a data mining strategy to a set of data.
- Data mining techniques are defined by an algorithm and a knowledge structure.
- Common features that distinguish the various techniques are whether learning is supervised or unsupervised and whether their output is categorical or numeric.

Chapter Summary
- Familiar supervised data mining techniques include decision tree methods, production rule generators, neural networks, and statistical methods.
- Association rules are a favorite technique for marketing applications.
- Clustering techniques employ some measure of similarity to group instances into disjoint partitions.
- Clustering methods are frequently used to help determine a best set of input attributes for building supervised learner models.
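
To make "some measure of similarity" and "disjoint partitions" concrete, here is a minimal sketch of k-means clustering using Euclidean distance as the similarity measure. K-means is used only as a familiar illustration; the slides do not say which clustering algorithm the chapter covers. The deterministic initialization (the first k points) and the sample points are simplifying assumptions.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k disjoint clusters by distance to the cluster means."""
    # Simplistic, deterministic initialization: use the first k points as centers.
    centers = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Each point joins exactly one cluster: the one with the nearest center.
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical two-dimensional instances forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
clusters = k_means(points, k=2)
for cluster in clusters:
    print(cluster)
```

Because every point is assigned to exactly one nearest center, the resulting groups never overlap.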

Chapter Summary
- Performance evaluation is probably the most critical of all the steps in the data mining process.
- Supervised model evaluation is often performed using a training/test set scenario.
- Supervised models with numeric output can be evaluated by computing average absolute or average squared error differences between computed and desired outcome.
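
A training/test set scenario can be sketched as a simple holdout split: the model is built on one portion of the data and evaluated on the held-out remainder. The function name `train_test_split`, the 30% test fraction, and the fixed seed are assumptions made for illustration.

```python
import random

def train_test_split(instances, test_fraction=0.3, seed=0):
    """Shuffle the instances and split them into (training, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data set of ten instances, identified here by number.
data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))
```

Keeping the test instances out of model building is what makes the measured error an estimate of performance on unseen data.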

Chapter Summary
- Marketing applications that focus on mass mailings are interested in developing models for increasing response rates to promotions.
- A marketing application measures the goodness of a model by its ability to lift response rate thresholds to levels well above those achieved by naïve (mass) mailing strategies.
- Unsupervised models support some measure of cluster quality that can be used for evaluative purposes.
- Supervised learning can also be employed to evaluate the quality of the clusters formed by an unsupervised model.
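
Lift can be computed as the response rate in a model-selected sample divided by the response rate in the whole population. The sketch below follows that ratio; the function name `lift`, the labels, and the counts are hypothetical.

```python
def lift(sample_labels, population_labels, target):
    """Ratio of P(target | sample) to P(target | entire population)."""
    p_sample = sample_labels.count(target) / len(sample_labels)
    p_population = population_labels.count(target) / len(population_labels)
    return p_sample / p_population

# Hypothetical mailing: 50 of 1000 customers respond to a mass mailing,
# while 20 of the 100 customers selected by a model respond.
population = ["yes"] * 50 + ["no"] * 950
sample = ["yes"] * 20 + ["no"] * 80

print(round(lift(sample, population, "yes"), 2))
```

Here the model-selected sample responds at 20% against a 5% baseline, so the lift is about 4. A lift near 1 would mean the model does no better than naïve mass mailing.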

Key Terms
- Association rule. A production rule whose consequent may contain multiple conditions and attribute relationships. An output attribute in one association rule can be an input attribute in another rule.
- Classification. A supervised learning strategy where the output attribute is categorical. The emphasis is on building models able to assign new instances to one of a set of well-defined classes.
- Confusion matrix. A matrix used to summarize the results of a supervised classification. Entries along the main diagonal represent the total number of correct classifications. Entries other than those on the main diagonal represent classification errors.
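
The confusion matrix definition can be sketched in a few lines. In this sketch rows are actual classes and columns are predicted classes, which is a layout assumption; the class names and predictions are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Build a matrix whose rows are actual classes and columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Correct classifications (main diagonal) divided by all classifications."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical credit-risk results for six test set instances.
actual    = ["good", "good", "poor", "poor", "good", "poor"]
predicted = ["good", "poor", "poor", "poor", "good", "good"]

matrix = confusion_matrix(actual, predicted, ["good", "poor"])
for row in matrix:
    print(row)
print(round(accuracy(matrix), 3))
```

The diagonal entries count correct classifications and the off-diagonal entries count errors, so accuracy is the diagonal sum over the total.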

Key Terms
- Data mining strategy. An outline of an approach for problem solution.
- Data mining technique. One or more algorithms together with an associated knowledge structure.
- Dependent variable. A variable whose value is determined by a combination of one or more independent variables.
- Estimation. A supervised learning strategy where the output attribute is numeric. Emphasis is on determining current rather than future outcome.

Key Terms
- Independent variable. An input attribute used for building supervised or unsupervised learner models.
- Lift. The probability of class Ci given a sample taken from population P divided by the probability of Ci given the entire population P.
- Lift chart. A graph that displays the performance of a data mining model as a function of sample size.
- Linear regression. A supervised learning technique that generalizes numeric data as a linear equation. The equation defines the value of an output attribute as a linear sum of weighted input attribute values.
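
For the linear regression entry, a minimal sketch with a single input attribute is shown below. It fits y = a*x + b by ordinary least squares, which is an assumption about the fitting method; the slides do not state one. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for a single input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)
```

The weight `a` and the constant `b` together form the linear equation; with several input attributes the output becomes a weighted sum with one weight per attribute.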

Key Terms
- Market basket analysis. A data mining strategy that attempts to find interesting relationships among retail products.
- Mean absolute error. For a set of training or test set instances, the mean absolute error is the average absolute difference between classifier-predicted output and actual output.
- Mean squared error. For a set of training or test set instances, the mean squared error is the average of the squared differences between classifier-predicted output and actual output.
- Neural network. A set of interconnected nodes designed to imitate the functioning of the human brain.

Key Terms
- Outliers. Atypical data instances.
- Prediction. A supervised learning strategy designed to determine future outcome.
- Root mean squared error. The square root of the mean squared error.
- Rule Maker. A supervised learner model for generating production rules from data.
- Statistical regression. A supervised learning technique that generalizes numerical data as a mathematical equation. The equation defines the value of an output attribute as a sum of weighted input attribute values.
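
The three error measures in the key terms (mean absolute error, mean squared error, and root mean squared error) can be sketched directly from their definitions. The function names and the sample values are hypothetical.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between predicted and actual output."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of the squared differences between predicted and actual output."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical numeric outputs for four test set instances.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))
print(round(root_mean_squared_error(actual, predicted), 4))
```

Squaring penalizes large errors more heavily than small ones, and taking the root returns the measure to the units of the output attribute.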

THE END