Constraint-based (Query-Directed) Mining

Presentation transcript:

Constraint-based (Query-Directed) Mining
- Finding all the patterns in a database autonomously is unrealistic: the patterns could be too many and not focused.
- Data mining should be an interactive process: the user directs what is to be mined using a data mining query language (or a graphical user interface).
- Constraint-based mining
  - User flexibility: the user provides constraints on what is to be mined.
  - System optimization: the system exploits such constraints for efficient mining.

Constraints in Data Mining
- Knowledge type constraint: classification, association, etc.
- Data constraint: e.g., find product pairs sold to Chicago customers in 2004.
- Dimension/level constraint: in relevance to region, price, brand, customer category.
- Rule (or pattern) constraint: e.g., small sales (price < $10) triggers big sales (sum > $200).
- Interestingness constraint: strong rules, e.g., min_support ≥ 3%, min_confidence ≥ 60%.

Constrained Mining vs. Constraint-Based Search
- Constrained mining vs. constraint-based search/reasoning
  - Both aim at reducing the search space.
  - Constrained mining finds all patterns satisfying the constraints; constraint-based search in AI or optimization finds some (or one) answer.
- Constrained mining vs. query processing in DBMS
  - Constrained pattern mining shares a similar philosophy with pushing selections deep in query processing.

Anti-Monotonicity in Constraint Pushing
- Anti-monotonicity: when an itemset S violates the constraint, so does every superset of S.
- sum(S.Price) ≤ v is anti-monotone (assuming nonnegative prices).
- sum(S.Price) ≥ v is not anti-monotone.
- Example. C: range(S.profit) ≤ 15 is anti-monotone.
  - Itemset ab violates C (range = 40 - 0 = 40).
  - So does every superset of ab.

TDB (min_sup = 2)
TID   Transaction
10    a, b, c, d, f
20    b, c, d, f, g, h
30    a, c, d, e, f
40    c, e, f, g

Item  Profit
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
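The anti-monotonicity example can be checked by brute force. The sketch below is only an illustration: the helper names (`profit_range`, `satisfies`, `supersets`) are made up for this example, and the item profits are copied from the table on this slide.

```python
from itertools import combinations

# Item profits from the slide's table.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def profit_range(itemset):
    """range(S.profit) = max profit - min profit over the items of S."""
    values = [PROFIT[i] for i in itemset]
    return max(values) - min(values)

def satisfies(itemset, v=15):
    """Constraint C: range(S.profit) <= v."""
    return profit_range(itemset) <= v

def supersets(base, universe):
    """Yield every superset of `base` (including `base`) within `universe`."""
    rest = sorted(set(universe) - set(base))
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            yield frozenset(base) | frozenset(extra)

ab = frozenset("ab")
ab_violates = not satisfies(ab)
all_supersets_violate = all(not satisfies(s) for s in supersets(ab, PROFIT))
print(profit_range(ab), ab_violates, all_supersets_violate)
```

Because adding items can only widen the range, once ab violates C no superset needs to be generated or counted.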

Monotonicity for Constraint Pushing
- Monotonicity: when an itemset S satisfies the constraint, so does every superset of S.
- sum(S.Price) ≥ v is monotone (assuming nonnegative prices).
- min(S.Price) ≤ v is monotone.
- Example. C: range(S.profit) ≥ 15
  - Itemset ab satisfies C (range = 40).
  - So does every superset of ab.

TDB (min_sup = 2)
TID   Transaction
10    a, b, c, d, f
20    b, c, d, f, g, h
30    a, c, d, e, f
40    c, e, f, g

Item  Profit
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
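A small brute-force checker makes the monotonicity claims concrete. This is a sketch under stated assumptions: `is_monotone` and the four result names are hypothetical, the thresholds (15, 0) are chosen only for illustration, and the profit table stands in for the price attribute. Checking single-item extensions is enough, since any superset can be reached one item at a time.

```python
from itertools import combinations

PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
# Items with nonnegative values only, to mimic a nonnegative price attribute.
NONNEG = {i: p for i, p in PROFIT.items() if p >= 0}

def nonempty_subsets(items):
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def is_monotone(pred, universe):
    """True if pred(S) implies pred(S + {x}) for every nonempty S and item x."""
    universe = set(universe)
    for s in nonempty_subsets(universe):
        if pred(s) and any(not pred(s | {x}) for x in universe - s):
            return False
    return True

def range_ge_15(s):
    values = [PROFIT[i] for i in s]
    return max(values) - min(values) >= 15

def min_le_0(s):
    return min(PROFIT[i] for i in s) <= 0

def sum_ge_15(s):
    return sum(PROFIT[i] for i in s) >= 15

range_monotone = is_monotone(range_ge_15, PROFIT)
min_monotone = is_monotone(min_le_0, PROFIT)
sum_monotone_any_sign = is_monotone(sum_ge_15, PROFIT)   # negative values allowed
sum_monotone_nonneg = is_monotone(sum_ge_15, NONNEG)     # nonnegative values only
print(range_monotone, min_monotone, sum_monotone_any_sign, sum_monotone_nonneg)
```

The sum constraint is monotone only when item values are nonnegative: with the signed profits, {a} satisfies sum ≥ 15 but {a, e} does not.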

Succinctness
- Succinctness: given A1, the set of items satisfying a succinctness constraint C, any set S satisfying C is based on A1, i.e., S contains a subset belonging to A1.
- Idea: whether an itemset S satisfies constraint C can be determined from the selection of items alone, without looking at the transaction database.
- min(S.Price) ≤ v is succinct.
- sum(S.Price) ≥ v is not succinct.
- Optimization: if C is succinct, C is pre-counting pushable.
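As a rough illustration of succinctness, the sketch below treats min(S.value) ≤ v as an item-selection test: S satisfies the constraint exactly when it contains at least one item from A1. The names `VALUE`, `A1`, and `satisfies_by_selection` are hypothetical, the item values are borrowed from the profit table used elsewhere in these slides, and v = 0 is an arbitrary threshold.

```python
from itertools import combinations

# Item values borrowed from the profit table (used here as the constrained attribute).
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
V = 0

# A1: the items that individually satisfy the constraint; no transactions needed.
A1 = {i for i, val in VALUE.items() if val <= V}

def satisfies_by_definition(itemset):
    return min(VALUE[i] for i in itemset) <= V

def satisfies_by_selection(itemset):
    """Succinct test: S satisfies C iff it contains at least one item of A1."""
    return bool(set(itemset) & A1)

all_sets = [frozenset(c)
            for k in range(1, len(VALUE) + 1)
            for c in combinations(sorted(VALUE), k)]
tests_agree = all(satisfies_by_definition(s) == satisfies_by_selection(s)
                  for s in all_sets)
n_satisfying = sum(satisfies_by_selection(s) for s in all_sets)
print(sorted(A1), tests_agree, n_satisfying)
```

Since the satisfying sets are characterized before any support counting, candidates can be generated so that they already satisfy C.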

The Apriori Algorithm: Example
[Figure: database D is scanned repeatedly to build candidate sets C1, C2, C3 and frequent sets L1, L2, L3.]

Naïve Algorithm: Apriori + Constraint
[Figure: the same Apriori run over database D (C1, L1, C2, L2, C3, L3), with the constraint applied to the results.]
Constraint: Sum{S.price} < 5

The Constrained Apriori Algorithm: Push an Anti-monotone Constraint Deep
[Figure: the Apriori run over database D (C1, L1, C2, L2, C3, L3), with candidates that violate the constraint pruned at each level.]
Constraint: Sum{S.price} < 5
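The difference between the naive approach and pushing the constraint deep can be sketched in a few lines. Everything concrete here is an assumption made for illustration, since the slide's figure did not survive extraction: the four-transaction database `D`, the rule that an item's price equals its numeric id, and the function name `apriori`. The sketch counts how many candidates have their support counted in each variant.

```python
from itertools import combinations

# Hypothetical database and prices (price of item i is i); not from the slide.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUP = 2

def cheap(itemset):
    """Anti-monotone constraint: Sum{S.price} < 5."""
    return sum(itemset) < 5

def apriori(transactions, min_sup, constraint=None):
    """Level-wise mining; an anti-monotone `constraint` is checked before counting."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, counted, k = {}, 0, 1
    while candidates:
        if constraint is not None:
            # Push the constraint: violating candidates are never counted.
            candidates = [c for c in candidates if constraint(c)]
        counted += len(candidates)
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_sup:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step, then prune candidates with a missing (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent, counted

all_frequent, counted_naive = apriori(D, MIN_SUP)
naive = {s: n for s, n in all_frequent.items() if cheap(s)}   # filter afterwards
pushed, counted_pushed = apriori(D, MIN_SUP, constraint=cheap)
print(naive == pushed, counted_naive, counted_pushed)
```

Both variants return the same itemsets; the pushed version simply counts fewer candidates, which is the point of pushing an anti-monotone constraint into the level-wise loop.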

The Constrained Apriori Algorithm: Push a Succinct Constraint Deep
[Figure: the Apriori run over database D (C1, L1, C2, L2, C3, L3); some candidates are annotated "not immediately to be used".]
Constraint: min{S.price} <= 1

Converting “Tough” Constraints
- Convert tough constraints into anti-monotone or monotone ones by properly ordering items.
- Examine C: avg(S.profit) ≥ 25
  - Order items in value-descending order.
  - If an itemset afb violates C, so do afbh and every other itemset afb* with afb as a prefix.
  - C becomes anti-monotone!

TDB (min_sup = 2)
TID   Transaction
10    a, b, c, d, f
20    b, c, d, f, g, h
30    a, c, d, e, f
40    c, e, f, g

Item  Profit
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10

Strongly Convertible Constraints
- avg(X) ≥ 25 is convertible anti-monotone w.r.t. the item-value-descending order R.
  - If an itemset af violates a constraint C, so does every itemset with af as a prefix, such as afd.
- avg(X) ≥ 25 is convertible monotone w.r.t. the item-value-ascending order R^-1.
  - If an itemset d satisfies a constraint C, so do itemsets df and dfa, which have d as a prefix.
- Thus, avg(X) ≥ 25 is strongly convertible.

Item  Profit
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
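Both directions of convertibility can be verified exhaustively on the eight-item profit table. The sketch below is illustrative only: `prefix_extensions` and the two result flags are hypothetical names, and an "extension" means appending one item that comes later in the chosen order.

```python
from itertools import combinations

PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def avg_ge_25(itemset):
    # avg >= 25, written without division to stay in integers.
    return sum(PROFIT[i] for i in itemset) >= 25 * len(itemset)

def prefix_extensions(order):
    """Yield (prefix, prefix + [later item]) for every itemset listed in `order`."""
    n = len(order)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            prefix = [order[i] for i in idx]
            for j in range(idx[-1] + 1, n):
                yield prefix, prefix + [order[j]]

R = sorted(PROFIT, key=PROFIT.get, reverse=True)   # value-descending order
R_inv = R[::-1]                                     # value-ascending order

# Descending order: a violating prefix stays violating when extended.
convertible_anti_monotone = all(
    not avg_ge_25(ext) for pre, ext in prefix_extensions(R) if not avg_ge_25(pre))
# Ascending order: a satisfying prefix stays satisfying when extended.
convertible_monotone = all(
    avg_ge_25(ext) for pre, ext in prefix_extensions(R_inv) if avg_ge_25(pre))
print(R, convertible_anti_monotone, convertible_monotone)
```

The reason is visible in the orderings: in descending order each appended item is no larger than the current average, so the average can only fall; in ascending order it can only rise.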

Can Apriori Handle Convertible Constraints?
- A convertible constraint that is neither monotone, anti-monotone, nor succinct cannot be pushed deep into an Apriori mining algorithm.
  - Within the level-wise framework, no direct pruning based on the constraint can be made.
  - Itemset df violates constraint C: avg(X) >= 25.
  - Since adf satisfies C, Apriori needs df to assemble adf, so df cannot be pruned.
- But the constraint can be pushed into the frequent-pattern growth framework!

Item  Value
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
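The df/adf example can be reproduced directly from the item values. This short sketch (the names `satisfies`, `pairs`, and `surviving_pairs` are made up for illustration) shows which 2-subsets of adf would survive if Apriori pruned by the constraint at level 2.

```python
from itertools import combinations

VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def satisfies(itemset):
    """C: avg(X) >= 25, written without division."""
    return sum(VALUE[i] for i in itemset) >= 25 * len(itemset)

adf_ok = satisfies("adf")          # avg = 80 / 3
df_ok = satisfies("df")            # avg = 20

# Apriori assembles adf only if all of its 2-subsets survived level 2.
pairs = ["".join(p) for p in combinations("adf", 2)]
surviving_pairs = [p for p in pairs if satisfies(p)]
print(adf_ok, df_ok, surviving_pairs)
```

Pruning df because it violates C would therefore lose adf, which is why the constraint cannot be used for direct pruning in the level-wise framework.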

Mining With Convertible Constraints
- C: avg(X) >= 25, min_sup = 2
- List the items in every transaction in value-descending order R (the order of the item table below).
  - C is convertible anti-monotone w.r.t. R.
- Scan TDB once
  - Remove infrequent items: item h is dropped.
  - Itemsets a and f are good, ...
- Projection-based mining
  - Impose an appropriate order on item projection.
  - Many tough constraints can be converted into (anti-)monotone ones.

TDB (min_sup = 2)
TID   Transaction
10    a, f, d, b, c
20    f, g, d, b, c
30    a, f, d, c, e
40    f, g, h, c, e

Item  Value
a      40
f      30
g      20
d      10
b       0
h     -10
c     -20
e     -30
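The procedure on this slide can be approximated by a simple prefix-growth enumeration. This is a minimal sketch, not the FP-growth implementation itself: the function `grow` and the tid-set representation are choices made for illustration, and the transactions are taken from the TDB given on the earlier slides (TID 40 = c, e, f, g). Prefixes are extended only with items that come later in R, and a prefix that violates C is never extended.

```python
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
TDB = {10: "abcdf", 20: "bcdfgh", 30: "acdef", 40: "cefg"}
MIN_SUP = 2

def satisfies(itemset):
    """C: avg(X) >= 25, written without division."""
    return sum(VALUE[i] for i in itemset) >= 25 * len(itemset)

# One scan: build tid sets and drop infrequent items (h in this data).
tids = {}
for tid, items in TDB.items():
    for item in items:
        tids.setdefault(item, set()).add(tid)
ORDER = sorted((i for i in tids if len(tids[i]) >= MIN_SUP),
               key=VALUE.get, reverse=True)   # value-descending order R

def grow(prefix, prefix_tids, start, out):
    """Extend `prefix` with items later in R; prune infrequent or violating prefixes."""
    for pos in range(start, len(ORDER)):
        item = ORDER[pos]
        new_tids = prefix_tids & tids[item]
        if len(new_tids) < MIN_SUP:
            continue                      # support is anti-monotone
        candidate = prefix + (item,)
        if not satisfies(candidate):
            continue                      # C is convertible anti-monotone w.r.t. R
        out[candidate] = len(new_tids)
        grow(candidate, new_tids, pos + 1, out)
    return out

patterns = grow((), set(TDB), 0, {})
for itemset, support in patterns.items():
    print("".join(itemset), support)
```

Because itemsets are only ever built as prefixes in R, skipping a violating prefix discards its whole subtree without losing any valid pattern, which is exactly what the level-wise join in Apriori cannot guarantee.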

Recall Traversal of Itemset Lattice

Handling Multiple Constraints
- Different constraints may require different or even conflicting item orderings.
- If there exists an order R such that both C1 and C2 are convertible w.r.t. R, then there is no conflict between the two convertible constraints.

What Constraints Are Convertible?

Constraint                                         Convertible      Convertible   Strongly
                                                   anti-monotone    monotone      convertible
avg(S) ≤ v, avg(S) ≥ v                             Yes              Yes           Yes
median(S) ≤ v, median(S) ≥ v                       Yes              Yes           Yes
sum(S) ≤ v (items could be of any value, v ≥ 0)    Yes              No            No
sum(S) ≤ v (items could be of any value, v ≤ 0)    No               Yes           No
sum(S) ≥ v (items could be of any value, v ≥ 0)    No               Yes           No
sum(S) ≥ v (items could be of any value, v ≤ 0)    Yes              No            No
...

Constraint-Based Mining: A General Picture

Constraint                      Antimonotone   Monotone      Succinct
v ∈ S                           no             yes           yes
S ⊇ V                           no             yes           yes
S ⊆ V                           yes            no            yes
min(S) ≤ v                      no             yes           yes
min(S) ≥ v                      yes            no            yes
max(S) ≤ v                      yes            no            yes
max(S) ≥ v                      no             yes           yes
count(S) ≤ v                    yes            no            weakly
count(S) ≥ v                    no             yes           weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0)      yes            no            no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)      no             yes           no
range(S) ≤ v                    yes            no            no
range(S) ≥ v                    no             yes           no
avg(S) θ v, θ ∈ {=, ≤, ≥}       convertible    convertible   no
support(S) ≥ ξ                  yes            no            no
support(S) ≤ ξ                  no             yes           no

A Classification of Constraints
[Figure: diagram relating the constraint classes: antimonotone, monotone, succinct, convertible anti-monotone, convertible monotone, strongly convertible, inconvertible.]

Visualization of Association Rules: Plane Graph

Visualization of Association Rules: Rule Graph

Visualization of Association Rules (SGI/MineSet 3.0)