The Three Analytics Techniques. Decision Trees – Determining Probability.

Slides:



Advertisements
Similar presentations
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Advertisements

Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or causal structures.
Pertemuan XIV FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or.
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Chi square.  Non-parametric test that’s useful when your sample violates the assumptions about normality required by other tests ◦ All other tests we’ve.
MIS2502: Data Analytics Association Rule Mining. Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns Association.
Final Exam Review. Data Mining and Data Analytics Techniques Explain the three data analytics techniques we covered in the course Decision Trees, Clustering,
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Asssociation Rules Prof. Sin-Min Lee Department of Computer Science.
Data Mining, Frequent-Itemset Mining. Data Mining Some mining problems Find frequent itemsets in "market-basket" data – "50% of the people who buy hot.
Decision Tree Models in Data Mining
CS 349: Market Basket Data Mining All about beer and diapers.
Basic Statistics. Basics Of Measurement Sampling Distribution of the Mean: The set of all possible means of samples of a given size taken from a population.
SAS Homework 3 Review Association rules mining
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
Final Exam Review. The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides.
Exam 3 Sample Decision Trees Cluster Analysis Association Rules Data Visualization SAS.
SAS Homework 4 Review Clustering and Segmentation
Exam 3 Review Decision Trees Cluster Analysis Association Rules Data Visualization SAS.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
Section 9-1: Inference for Slope and Correlation Section 9-3: Confidence and Prediction Intervals Visit the Maths Study Centre.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
Frequent-Itemset Mining. Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small.
Association Rule Mining
ASSOCIATION RULES (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Review Lecture 51 Tue, Dec 13, Chapter 1 Sections 1.1 – 1.4. Sections 1.1 – 1.4. Be familiar with the language and principles of hypothesis testing.
Dan Piett STAT West Virginia University Lecture 12.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
Elective-I Examination Scheme- In semester Assessment: 30 End semester Assessment :70 Text Books: Data Mining Concepts and Techniques- Micheline Kamber.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Chi-square test Hypothesis testing. Null hypothesis- The status quo– AKA no change/difference. Alternative hypothesis- What you want to prove/show with.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
MIS2502: Data Analytics Association Rule Mining David Schuff
MIS2502: Data Analytics Association Rule Mining Jeremy Shafer
Data Mining – Association Rules
By Arijit Chatterjee Dr
Data Mining Association Analysis: Basic Concepts and Algorithms
Frequent Pattern Mining
Waikato Environment for Knowledge Analysis
CPS216: Advanced Database Systems Data Mining
Exam #3 Review Zuyin (Alvin) Zheng.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
MIS2502: Data Analytics Classification using Decision Trees
Data Mining Association Analysis: Basic Concepts and Algorithms
MIS2502: Data Analytics Association Rule Mining
MIS2502: Data Analytics Association Rule Mining
MIS2502: Data Analytics Clustering and Segmentation
MIS2502: Review for Exam 3 Aaron Zhi Cheng
MIS2502: Data Analytics Clustering and Segmentation
Department of Computer Science National Tsing Hua University
MIS2502: Data Analytics Classification Using Decision Trees
MIS2502: Data Analytics Association Rule Learning
Association Analysis: Basic Concepts
Presentation transcript:

The Three Analytics Techniques

Decision Trees – Determining Probability

Decision Trees – Chi Square

Example: Chi-squared test Is the proportion of the outcome class the same in each child node? It shouldn’t be, or the classification isn’t very helpful Observed OwnsRents Default No Default

Example: Chi-squared test Is the proportion of the outcome class the same in each child node? It shouldn’t be, or the classification isn’t very helpful Root (n=1500) Default = 750 No Default = 750 Owns (n=850) Default = 300 No Default = 550 Rents (n=650) Default = 450 No Default = 200 Observed OwnsRents Default No Default Expected OwnsRents Default No Default

Chi-squared test If the groups were the same, you’d expect an even split (Expected) But we can see they aren’t distributed evenly (Observed) But is it enough (i.e., statistically significant)? Small p-values (i.e., less than 0.05 mean it’s very unlikely the groups are the same) So Owns/Rents is a predictor that creates two different groups Observed OwnsRents Default No Default Expected OwnsRents Default No Default

Cluster Analysis – Cohesion and Separation

Cluster Analysis What do you look for in the histogram that tells you a variable should not be included in the cluster analysis?

Cluster Analysis What do you look for in the histogram that tells you a variable should not be included in the cluster analysis? Cluster 1 Cluster SSE 1 = = = 6.69 SSE 2 = = = 22.14

Separation and Cohesion Which is better? Distance within clusters is minimized Distance between clusters is maximized

Segment Profile Plot

Association Rules Mining

Support count (  ) In how many baskets does the itemset appear?  {Milk, Beer, Diapers} = 2 (i.e., in baskets 3 and 4) Support (s) Fraction of transactions that contain all items in X  Y s({Milk, Diapers, Beer}) = 2/5 = 0.4 BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke

Confidence Confidence is the strength of the association Measures how often items in Y appear in transactions that contain X BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke This says 67% of the times when you have milk and diapers in the itemset you also have beer! c must be between 0 and 1 1 is a complete association 0 is no association

Lift Example What’s the lift for the rule: {Milk, Diapers}  {Beer} So X = {Milk, Diapers} Y = {Beer} s({Milk, Diapers, Beer}) = 2/5 = 0.4 s({Milk, Diapers}) = 3/5 = 0.6 s({Beer}) = 3/5 = 0.6 So BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke When Lift > 1, the occurrence of X  Y together is more likely than what you would expect by chance

Another example Checking Account Savings Account NoYes No Yes Are people more inclined to have a checking account if they have a savings account? Support ({Savings}  {Checking}) = 5000/10000 = 0.5 Support ({Savings}) = 6000/10000 = 0.6 Support ({Checking}) = 8500/10000 = 0.85 Confidence ({Savings}  {Checking}) = 5000/6000 = 0.83 Answer: No In fact, it’s slightly less than what you’d expect by chance!

Final Question Can you have high confidence and low lift?