Chapter 3 Data Mining: Classification & Association
(Textbook: Chapter 4, Sections 4.3 (4.3.1) and 4.4)

Introduction
Data mining is a component of a wider process called knowledge discovery from databases. Data mining techniques include:
- Classification
- Clustering

What is Classification?
Classification is concerned with generating a description or model for each class from the given dataset of records. Classification can be:
- Supervised (decision trees and associations)
- Unsupervised (more in the next chapter)

Supervised Classification
- Training set (pre-classified data)
- Using the training set, the classifier generates a description/model of the classes, which helps to classify unknown records.
How can we evaluate how good the classifier is at classifying unknown records? By using a test dataset.

Decision Trees
A decision tree is a tree with the following properties:
- An inner node represents an attribute.
- An edge represents a test on the attribute of the parent node.
- A leaf represents one of the classes.
Construction of a decision tree:
- Based on the training data
- Top-down strategy

Decision Trees
The set of records available for classification is divided into two disjoint subsets:
- a training set
- a test set
Attributes whose domain is numerical are called numerical attributes; attributes whose domain is not numerical are called categorical attributes.

Training dataset: weather records with attributes outlook, temp, humidity, and windy, and the class play / don't play (table shown as an image).

Test dataset: held-out weather records with the same attributes, used to evaluate the classifier (table shown as an image).

Decision Tree
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then do not play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then don't play.
(The tree diagram also labels the splitting attribute and the splitting criterion/condition.)
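These five rules can be read directly off the tree. As a minimal sketch (the function and argument names are my own, not from the slides), the same tree written as nested tests:

```python
def play_golf(outlook, humidity, windy):
    """Classify a weather record using Rules 1-5 above.

    outlook: 'sunny', 'overcast', or 'rainy'; humidity: percent; windy: bool.
    """
    if outlook == "sunny":
        return "play" if humidity <= 75 else "don't play"  # Rules 1 and 2
    if outlook == "overcast":
        return "play"                                      # Rule 3
    return "don't play" if windy else "play"               # Rules 4 and 5

print(play_golf("sunny", 70, False))  # play
print(play_golf("rainy", 80, True))   # don't play
```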

Confidence
Confidence in the classifier is determined by the percentage of the test data that is correctly classified.
Activity:
- Compute the confidence in Rule 1.
- Compute the confidence in Rule 2.
- Compute the confidence in Rule 3.

Decision Tree Algorithms
- ID3 algorithm
- Rough Set Theory

Decision Trees: ID3
ID3, the Iterative Dichotomizer (Quinlan, 1986), represents concepts as decision trees. A decision tree is a classifier in the form of a tree structure where each node is either:
- a leaf node, indicating a class of instances, OR
- a decision node, which specifies a test to be carried out on a single attribute value, with one branch and a sub-tree for each possible outcome of the test.

Decision Tree development process
- Construction phase: the initial tree is constructed from the training set.
- Pruning phase: removes some of the nodes and branches to improve performance.
- Processing phase: the pruned tree is further processed to improve understandability.

Construction phase
Use Hunt's method:
- T: training dataset with class labels {C1, C2, …, Cn}.
- The tree is built by repeatedly partitioning the training data, based on the goodness of the split.
- The process is continued until all the records in a partition belong to the same class.
A sketch of this recursion follows.
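A minimal sketch of Hunt's recursive partitioning, with the attribute-selection criterion left as a parameter (ID3's gain-based choice comes later). Records are assumed to be tuples whose last field is the class label; the helper names are inventions of this sketch:

```python
from collections import Counter

def build_tree(records, attributes, choose_split):
    """Hunt's method: repeatedly partition records until a partition is pure."""
    labels = [r[-1] for r in records]
    if len(set(labels)) == 1 or not attributes:
        # Pure partition (or nothing left to split on): emit a leaf.
        return ("leaf", Counter(labels).most_common(1)[0][0])
    attr = choose_split(records, attributes)  # "goodness of the split" criterion
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r)
    remaining = [a for a in attributes if a != attr]
    return ("node", attr,
            {v: build_tree(g, remaining, choose_split) for v, g in groups.items()})
```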

Best Possible Split
- Evaluation of splits for each attribute.
- Determination of the splitting condition on the selected splitting attribute.
- Partitioning the data using the best split.
The best split is the one that does the best job of separating the records into groups where a single class predominates.

Splitter choice
To choose the best splitter:
- We consider each attribute in turn.
- If an attribute has multiple values, we sort them, measure the goodness of each candidate split, and evaluate it.
- We compare the effectiveness of the split provided by the best splitter from each attribute.
- The winner is chosen as the splitter for the root node.

Iterative Dichotomizer (ID3)
- Uses entropy, an information-theoretic approach to measuring the goodness of a split.
- The algorithm uses the criterion of information gain to determine the goodness of a split.
- The attribute with the greatest information gain is taken as the splitting attribute, and the dataset is split on all distinct values of that attribute.

Entropy
If T contains records from k classes C1, …, Ck, and p_i is the fraction of records belonging to class Ci, the entropy of T is:
Info(T) = -sum_{i=1..k} p_i * log2(p_i)

Information measure
Info(T) measures the average amount of information, in bits, needed to identify the class of a record in T: it is 0 for a pure set and is largest when the classes are evenly distributed.

Example-1
T: training dataset with class counts C1 = 40, C2 = 30, C3 = 30 (100 records in total).
- Compute the entropy of T, i.e. Info(T).
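The class counts 40/30/30 are from the slide; the numeric answer below is computed, not quoted:

```python
from math import log2

def info(counts):
    """Entropy Info(T), given the number of records in each class."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

print(info([40, 30, 30]))  # ~1.571 bits
```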

Info(X, T)
If T is partitioned based on attribute X into sets T1, T2, …, Tn, then the information needed to identify the class of an element of T becomes the weighted average of the subset entropies:
Info(X, T) = sum_{i=1..n} (|Ti| / |T|) * Info(Ti)

Example-2
T is divided into two subsets S1 and S2, with n1 and n2 records respectively, according to attribute X. Assume n1 = 60 and n2 = 40 (the slide also gave per-class counts for S1 and S2 in a table).
- Compute the entropy after segmentation, Info(X, T).
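The per-class counts in the slide's table did not survive transcription; those used below are an assumption, chosen so that Example-3's stated gain of 0.42 comes out right (info() is the entropy helper from the Example-1 sketch):

```python
def info_x(partitions):
    """Weighted entropy Info(X,T) over the subsets produced by a split."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

# Hypothetical counts: S1 = (C1=40, C2=10, C3=10), S2 = (C1=0, C2=20, C3=20)
print(info_x([[40, 10, 10], [0, 20, 20]]))  # ~1.151 bits
```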

Information Gain
The information gained by partitioning T on attribute X is the reduction in the information needed to classify a record:
Gain(X, T) = Info(T) - Info(X, T)

Example-3
Gain(X, T) = Info(T) - Info(X, T) = 1.571 - 1.151 = 0.42

Example-4
Assume we have another splitting, on attribute Y (the slide's per-class count table for this split was not preserved):
- Info(Y, T) = …
- Gain(Y, T) = Info(T) - Info(Y, T) = …

Splitting attribute: X or Y?
Gain(X, T) = 0.42; Gain(Y, T) comes out smaller. The splitting attribute is chosen to be the one with the largest gain, so X is selected.

Gain Ratio
Information gain tends to favor attributes with many values; the gain ratio compensates by normalizing the gain by the split information:
SplitInfo(X, T) = -sum_{i=1..n} (|Ti| / |T|) * log2(|Ti| / |T|)
GainRatio(X, T) = Gain(X, T) / SplitInfo(X, T)
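Continuing the running example (60/40 split sizes from Example-2, gain 0.42 from Example-3):

```python
from math import log2

def split_info(sizes):
    """SplitInfo(X,T): the entropy of the partition sizes themselves."""
    total = sum(sizes)
    return -sum(n / total * log2(n / total) for n in sizes if n)

print(0.42 / split_info([60, 40]))  # gain ratio ~0.43
```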

Index of Diversity
- A high index of diversity: the set contains an even distribution of classes.
- A low index of diversity: members of a single class predominate.

Which is the best splitter?
The best splitter is the one that decreases the diversity of the record set by the greatest amount. We want to maximize:
diversity(before split) - [diversity(left child) + diversity(right child)]

Index of Diversity (the formula on this slide was shown as an image; one common diversity measure is sketched below).
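As one common instance of a diversity index (an assumption of this sketch, not necessarily the slide's exact measure), the Gini index:

```python
def gini(counts):
    """Gini index: 1 - sum(p_i^2). 0 for a pure set; larger for even mixes."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([40, 30, 30]))  # ~0.66: high diversity, even class mix
print(gini([100, 0, 0]))   # 0.0: a single class predominates
```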

Numerical Example
For the play-golf example, compute the following:
- The entropy of T.
- The information gain for the attributes outlook, humidity, temp, and windy.
Based on ID3, which attribute will be selected as the splitting attribute?
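The lecture's dataset tables were images, so the sketch below uses the standard 14-record play-golf data usually attributed to Quinlan (an assumption; the lecture's own training/test split may differ):

```python
from collections import Counter, defaultdict
from math import log2

# Fields: outlook, temp, humidity, windy, class
records = [
    ("sunny", "hot", "high", False, "no"),       ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),   ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),   ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),   ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(records, i):
    """Information gain of splitting on attribute index i."""
    groups = defaultdict(list)
    for r in records:
        groups[r[i]].append(r[-1])
    after = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in records]) - after

for i, name in enumerate(["outlook", "temp", "humidity", "windy"]):
    print(f"{name}: {gain(records, i):.3f}")
# outlook wins (~0.247), so ID3 makes it the splitting attribute at the root
```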

Association

Association Rule Mining
- The data mining process of identifying associations from a dataset.
- Searches for relationships between items in a dataset.
- Also called market-basket analysis.
Example: 90% of people who purchase bread also purchase butter.

Why?
- Analyze customer buying habits.
- Helps retailers develop marketing strategies.
- Helps inventory management.
- Supports sale-promotion strategies.

Basic Concepts
- Support
- Confidence
- Itemset
- Strong rules
- Frequent itemset

Support
For a rule A => B:
support(A => B) = (# of tuples containing both A and B) / (total # of tuples)
The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.

Confidence
For a rule A => B:
confidence(A => B) = (# of tuples containing both A and B) / (# of tuples containing A)
Confidence is defined as the measure of certainty or trustworthiness associated with each discovered pattern.
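A minimal sketch of both measures over a handful of made-up baskets (the data and names here are illustrative, not from the slides):

```python
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
           {"bread"}, {"milk", "butter"}]

def support(a, b):
    """Fraction of all baskets containing every item of both A and B."""
    return sum((a | b) <= t for t in baskets) / len(baskets)

def confidence(a, b):
    """Of the baskets containing A, the fraction that also contain B."""
    return sum((a | b) <= t for t in baskets) / sum(a <= t for t in baskets)

print(support({"bread"}, {"butter"}))     # 0.5
print(confidence({"bread"}, {"butter"}))  # ~0.67
```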

Itemset
- A set of items is referred to as an itemset.
- An itemset containing k items is called a k-itemset.
- An itemset can also be seen as a conjunction of items (or a predicate).

Frequent Itemset
- Suppose min_sup is the minimum support threshold.
- An itemset satisfies minimum support if its occurrence frequency is greater than or equal to min_sup.
- An itemset that satisfies minimum support is a frequent itemset.

Strong Rules
Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong.

Association Rules
Algorithms that obtain association rules from data usually divide the task into two parts:
- Find the frequent itemsets.
- Form the rules from them: generate strong association rules from the frequent itemsets.

Apriori Algorithm
- Proposed by Agrawal and Srikant in 1994.
- Also called the level-wise algorithm.
- It is the most widely accepted algorithm for finding all the frequent itemsets.
- It makes use of the downward-closure property: every subset of a frequent itemset must itself be frequent.
- The algorithm is a bottom-up search, progressing upward level-wise in the itemset lattice.
- Before reading the database at each level, it prunes the candidate sets that cannot be frequent because they have an infrequent subset.

Apriori Algorithm
- Uses a level-wise search, where frequent k-itemsets are used to explore (k+1)-itemsets, to mine frequent itemsets from a transactional database for Boolean association rules.
- First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on.

Apriori Algorithm steps
- The first pass of the algorithm simply counts item occurrences to determine the frequent 1-itemsets.
- A subsequent pass, say pass k, consists of two phases:
  - The frequent itemsets L(k-1) found in the (k-1)th pass are used to generate the candidate itemsets Ck, using the Apriori candidate-generation function.
  - The database is scanned and the support of the candidates in Ck is counted.

Join Step
- Assume that we know the frequent itemsets of size k-1. Considering a k-itemset, we can immediately conclude that dropping either of two different items yields two frequent (k-1)-itemsets.
- From another perspective, this suggests a way to construct k-itemsets: take two (k-1)-itemsets that differ by only one item and take their union. This step is called the join step and is used to construct POTENTIAL frequent k-itemsets.

Join Algorithm (the slide's pseudocode was shown as an image; a sketch follows).
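A minimal Python sketch of the join step (an assumption of this sketch: each itemset is kept as a sorted tuple, so two joinable sets share their first k-2 items):

```python
def apriori_join(L_prev):
    """Join L(k-1) with itself to build candidate k-itemsets."""
    candidates = set()
    L = sorted(L_prev)
    for i, p in enumerate(L):
        for q in L[i + 1:]:
            if p[:-1] == q[:-1]:              # agree on the first k-2 items
                candidates.add(p + (q[-1],))  # their union is a k-itemset
    return candidates

print(apriori_join({("a", "b"), ("a", "c"), ("b", "c")}))  # {('a', 'b', 'c')}
```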

Pruning Algorithm
A candidate k-itemset can be discarded as soon as any of its (k-1)-subsets is found to be infrequent (by the downward-closure property), so it never has to be counted against the database.

Pruning Algorithm pseudocode (shown as an image on the slide; a sketch follows).
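In the same hedged spirit, a sketch of the pruning step, paired with the join sketch above:

```python
from itertools import combinations

def apriori_prune(candidates, L_prev):
    """Keep only candidates all of whose (k-1)-subsets are frequent."""
    return {c for c in candidates
            if all(sub in L_prev for sub in combinations(c, len(c) - 1))}

L2 = {("a", "b"), ("a", "c")}  # suppose ('b', 'c') turned out infrequent
print(apriori_prune({("a", "b", "c")}, L2))  # set(): pruned without a DB scan
```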

Worked example (the transaction table and the level-by-level solution were shown as images):
- Tuples represent transactions (15 transactions).
- Columns represent items (9 items).
- min_sup = 20%, so an itemset should be supported by at least 3 transactions.
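Since the original 15x9 transaction table did not survive transcription, the end-to-end run below substitutes a tiny made-up dataset; it reuses apriori_join and apriori_prune from the sketches above and follows the level-wise loop described earlier:

```python
from collections import Counter

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
                {"a", "b", "c"}, {"b"}, {"c"}]
min_count = 2  # absolute form of the minimum support threshold

# Pass 1: count item occurrences to get the frequent 1-itemsets, L1.
counts = Counter(item for t in transactions for item in t)
L = {(item,) for item, c in counts.items() if c >= min_count}

frequent = set(L)
while L:
    C = apriori_prune(apriori_join(L), L)  # candidate generation for pass k
    support = Counter()
    for t in transactions:                 # one database scan per level
        for c in C:
            if set(c) <= t:
                support[c] += 1
    L = {c for c, n in support.items() if n >= min_count}
    frequent |= L

print(sorted(frequent, key=lambda s: (len(s), s)))
```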
