Data Mining: Practical Machine Learning Tools and Techniques
Chapter 6.3: Association Rules
Rodney Nielsen
Many / most of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Implementation: Real Machine Learning Schemes
- Decision trees: from ID3 to C4.5 (pruning, numeric attributes, ...)
- Classification rules: from PRISM to RIPPER and PART (pruning, numeric data, ...)
- Association rules: frequent-pattern trees
- Extending linear models: support vector machines and neural networks
- Instance-based learning: pruning examples, generalized exemplars, distance functions

Implementation: Real Machine Learning Schemes (continued)
- Numeric prediction: regression/model trees, locally weighted regression
- Bayesian networks: learning and prediction, fast data structures for learning
- Clustering: hierarchical, incremental, probabilistic, Bayesian
- Semisupervised learning: clustering for classification, co-training

Association Rules
- The Apriori algorithm finds frequent item (attribute-value pair) sets via a generate-and-test methodology: successively longer item sets are formed from shorter ones, and each different size of candidate item set requires a full scan of the data (a minimal sketch of this loop appears below).
- The combinatorial nature of the generation process is costly, particularly when there are many item sets or when item sets are large.
- Appropriate data structures can help: FP-growth employs an extended prefix tree, the FP-tree.
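To make the cost explicit, here is a minimal Python sketch of Apriori's level-wise generate-and-test loop. The function name and structure are illustrative rather than the book's code, and the subset-based candidate pruning of full Apriori is noted in a comment but omitted:

```python
def apriori(transactions, min_support):
    """Level-wise generate-and-test: one full data scan per candidate size."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items with a full pass over the data.
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= min_support}
    all_freq, k = set(freq), 2
    while freq:
        # Join frequent (k-1)-item sets to form size-k candidates.
        # (Full Apriori also prunes candidates with an infrequent subset.)
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Another full scan of the data for this candidate size --
        # the costly step that FP-growth avoids.
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) >= min_support}
        all_freq |= freq
        k += 1
    return all_freq
```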

FP-growth
- FP-growth uses a frequent-pattern tree (FP-tree) to store a compressed version of the data.
- Only two passes over the instances are required to map the data into an FP-tree.
- The tree is then processed recursively to "grow" large item sets directly, avoiding the generation and testing of candidate item sets against the entire database.

Building a Frequent Pattern Tree
1) First pass over the data: count the number of times each individual item (i.e., each particular attribute value) occurs.
2) Second pass over the data: before inserting each instance into the FP-tree, sort its items in descending order of their frequency of occurrence, as found in step 1.
   a) Individual items that do not meet the minimum support are not inserted into the tree.
   b) With luck, many instances will share items that occur frequently individually, resulting in a high degree of compression close to the root of the tree.
(A sketch of this construction follows.)
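A compact Python sketch of the two passes; the FPNode class, the build_fptree name, and the alphabetical tie-breaking rule are my own choices, not anything prescribed by the book or by Weka:

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and child links."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}                  # item -> FPNode

def build_fptree(transactions, min_support):
    # Pass 1: count how often each individual item occurs.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root, header = FPNode(None, None), {}   # header: item -> its nodes
    # Pass 2: insert each instance with its items sorted by descending
    # frequency (ties broken alphabetically); infrequent items are dropped.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1     # shared prefixes share (and re-count) nodes
    return root, header, frequent
```

Because every instance is inserted in the same frequency order, instances that share their most frequent items share a path near the root, which is where the compression comes from.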

An Example Using the Weather Data
Frequency of individual items (minimum support = 6):

  play = yes            9
  windy = false         8
  humidity = normal     7
  humidity = high       7
  windy = true          6
  temperature = mild    6
  play = no             5
  outlook = sunny       5
  outlook = rainy       5
  temperature = hot     4
  temperature = cool    4
  outlook = overcast    4
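This table can be checked directly. A short script, assuming the standard 14-instance nominal weather data from the book, reproduces the counts:

```python
from collections import Counter

# The 14 instances of the standard nominal weather data:
# outlook, temperature, humidity, windy, play.
weather = [
    "sunny hot high false no",       "sunny hot high true no",
    "overcast hot high false yes",   "rainy mild high false yes",
    "rainy cool normal false yes",   "rainy cool normal true no",
    "overcast cool normal true yes", "sunny mild high false no",
    "sunny cool normal false yes",   "rainy mild normal false yes",
    "sunny mild normal true yes",    "overcast mild normal false yes",
    "overcast mild high true yes",   "rainy mild high true no",
]
names = ["outlook", "temperature", "humidity", "windy", "play"]
transactions = [[f"{n}={v}" for n, v in zip(names, row.split())]
                for row in weather]

counts = Counter(item for t in transactions for item in t)
for item, count in counts.most_common():
    print(f"{item:22} {count}")   # items with count >= 6 meet min support
```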

An Example Using the Weather Data (continued)
Instances with items sorted by descending frequency:

  1   windy=false, humidity=high, play=no, outlook=sunny, temperature=hot
  2   humidity=high, windy=true, play=no, outlook=sunny, temperature=hot
  3   play=yes, windy=false, humidity=high, temperature=hot, outlook=overcast
  4   play=yes, windy=false, humidity=high, temperature=mild, outlook=rainy
  ...
  14  humidity=high, windy=true, temperature=mild, play=no, outlook=rainy

Final solution: the six single-item sets that meet minimum support (previous slide) plus two multiple-item sets:

  play=yes and windy=false        6
  play=yes and humidity=normal    6

Finding Large Item Sets
[Figure: FP-tree for the weather data (min support 6)]
- Process the header table from the bottom.
- Add temperature=mild to the list of large item sets.
- Are there any item sets containing temperature=mild that meet min support?

Finding Large Item Sets (continued)
[Figure: FP-tree for the data conditioned on temperature=mild]
- Created by scanning the first (original) tree:
  - Follow the temperature=mild link from the header table to find all instances that contain temperature=mild.
  - Project the counts from the original tree.
- The header table shows that temperature=mild can't be grown any further.

Finding Large Item Sets (continued)
[Figure: FP-tree for the data conditioned on humidity=normal]
- Created by scanning the first (original) tree:
  - Follow the humidity=normal link from the header table to find all instances that contain humidity=normal.
  - Project the counts from the original tree.
- The header table shows that humidity=normal can be grown to include play=yes.
(A sketch of this recursive step follows.)
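Continuing the build_fptree sketch above (on which this relies), the recursive mining step can be written as follows; building the conditional pattern base and recursing on a conditional tree mirrors the two slides just shown, and again the names and structure are illustrative:

```python
def fpgrowth(transactions, min_support):
    """Mine all large item sets; uses build_fptree from the earlier sketch."""
    _, header, frequent = build_fptree(transactions, min_support)
    results = {}

    def mine(header, frequent, suffix):
        # Process the header table from the bottom (least frequent first).
        for item in sorted(frequent, key=lambda i: frequent[i]):
            support = sum(node.count for node in header[item])
            itemset = suffix | {item}
            results[frozenset(itemset)] = support
            # Conditional pattern base: follow the header link to every
            # node for `item`, and project its prefix path and count
            # from the current tree.
            patterns = []
            for node in header[item]:
                path, parent = [], node.parent
                while parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                patterns.extend([path] * node.count)
            # Build the conditional tree and recurse if anything survives.
            _, cheader, cfreq = build_fptree(patterns, min_support)
            if cfreq:
                mine(cheader, cfreq, itemset)

    mine(header, frequent, set())
    return results
```

Run on the weather transactions built earlier with min_support=6, this returns exactly the eight item sets from the previous slides: the six frequent single items plus {play=yes, windy=false} and {play=yes, humidity=normal}, each with support 6.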

Finding Large Item Sets (concluded)
- All large item sets have now been found.
- To be sure of this, however, the entire header link table from the original tree must be processed.
- Association rules are formed from large item sets in the same way as for Apriori.
- FP-growth can be up to an order of magnitude faster than Apriori at finding large item sets.