A Theoretical Framework for Association Mining based on the Boolean Retrieval Model. Peter Bollmann-Sdorra.

Presentation transcript:

1 A Theoretical Framework for Association Mining based on the Boolean Retrieval Model. Peter Bollmann-Sdorra

2 Contents: Introduction; Background; Boolean Association Mining; Expressing item-sets as queries; Conclusions; Future Work

3 Introduction Researchers focus on discovering rules in the form of implications between itemsets that have adequate support. Using frequent itemsets as both the antecedent and the consequent of a rule represents only the simplest form of predicate. This simplicity is due in part to the lack of a theoretical framework that includes more expressive predicates.

4 Motivation In information retrieval systems, a strong theoretical background gives the user the power to ask more sophisticated and pertinent questions. Information retrieval and association mining are two complementary processes on the same data records or transactions. In information retrieval, given a query, we need to find the subset of records that matches the query. In data mining, by contrast, we need to find the queries (rules) that are supported by an adequate number of records.

5 Proposed Solution We introduce a theory of association mining based on a model of retrieval known as the Boolean Retrieval Model, in which: a Boolean query that uses only the AND operator is analogous to an itemset; a general Boolean query (using AND, OR, or NOT) is interpreted as a generalized itemset; the notions of support of itemsets and confidence of rules can be dealt with uniformly; and an event algebra, involving all possible transaction subsets, can be defined to formally obtain a probability space.

6 Background Deriving association rules from data: Given a set of items $I = \{i_1, i_2, \ldots, i_n\}$ and a set of transactions $T = \{t_1, t_2, \ldots, t_m\}$, where each transaction $t_i \in T$ satisfies $t_i \subseteq I$, an association rule is defined as $X \rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$. The rule describes the existence of a relationship between the two itemsets $X$ and $Y$.

7 Measure for Significance (support): the percentage of transactions in the database that contain both X and Y.

8 Measure for Importance (confidence): the percentage of transactions that contain Y among those transactions containing X.

9 Measure for Importance (interest): represents a test of statistical independence.
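The three measures described on these slides can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook definitions (support as the fraction of transactions containing both X and Y, confidence as support divided by the fraction containing X, and interest as support divided by the product of the two individual fractions). The transactions and the function names are hypothetical and do not come from the slides.

```python
from fractions import Fraction

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"beer", "milk"},
    {"beer", "bread"},
    {"beer", "milk", "bread"},
    {"milk"},
]

def fraction_containing(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def support(x, y, transactions):
    """Fraction of transactions that contain both X and Y."""
    return fraction_containing(x | y, transactions)

def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    return support(x, y, transactions) / fraction_containing(x, transactions)

def interest(x, y, transactions):
    """Ratio of the joint fraction to the product of the individual ones.

    A value of 1 is what statistical independence of X and Y would give.
    """
    return support(x, y, transactions) / (
        fraction_containing(x, transactions) * fraction_containing(y, transactions)
    )

x, y = {"beer"}, {"milk"}
print(support(x, y, transactions))     # 1/2
print(confidence(x, y, transactions))  # 2/3
print(interest(x, y, transactions))    # 8/9
```

Exact fractions are used so the values can be checked by hand; `confidence` and `interest` are undefined (division by zero) when X or Y occurs in no transaction.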

10 Boolean Association Mining Given a set of items $I = \{i_1, i_2, \ldots, i_n\}$, a transaction $t$ is defined as a subset of items, that is, $t \in 2^I$, where $2^I = \{\emptyset, \{i_1\}, \{i_2\}, \ldots, \{i_n\}, \{i_1, i_2\}, \ldots, \{i_1, i_2, \ldots, i_n\}\}$. Let $T \subseteq 2^I$ be a given set of transactions $\{t_1, t_2, \ldots, t_m\}$. Every transaction $t \in T$ has an assigned weight $w'(t)$.

11 Possible Weights

12 The weights $w'$ are normalized to $w(t) = w'(t) / \sum_{t' \in T} w'(t')$ and $\sum_{t \in T} w(t) = 1$.

13 Example: Let $I = \{\text{beer}, \text{milk}, \text{bread}\}$ be the set of all items, where price(beer) = 5, price(milk) = 3, and price(bread) = 2. For the set of transactions $T$, $f(t)$ denotes the frequency of transaction $t$.

14 Case 1: $w'(t) = 1$

15 Case 2: $w'(t) = f(t)$

16 Case 3: $w'(t) = |t| \cdot g(t)$, with $g(t) = f(t)$

17 Case 4: $w'(t) = v(t) \cdot g(t)$, with $g(t) = f(t)$ and $v(t) = \mathrm{price}(t)$
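The four weighting cases can be compared side by side with a short Python sketch. The prices are taken from the example; the transaction frequencies are hypothetical, because the transcript does not preserve the original table. Two further assumptions are made here: `v(t)` is read as the total price of the items in `t`, and normalization means dividing each raw weight by the sum of all raw weights.

```python
from fractions import Fraction

price = {"beer": 5, "milk": 3, "bread": 2}  # prices from the example

# Hypothetical frequencies f(t); the original table is not available.
frequency = {
    frozenset({"beer"}): 2,
    frozenset({"beer", "milk"}): 1,
    frozenset({"milk", "bread"}): 3,
}

def v(t):
    """Total price of the items in t (assumed reading of v(t) = price(t))."""
    return sum(price[item] for item in t)

# Raw weights w'(t) for the four cases, with g(t) = f(t).
raw_weights = {
    "case 1": lambda t: 1,
    "case 2": lambda t: frequency[t],
    "case 3": lambda t: len(t) * frequency[t],
    "case 4": lambda t: v(t) * frequency[t],
}

def normalize(raw):
    """Divide every raw weight by the total so that the weights sum to 1."""
    total = sum(raw(t) for t in frequency)
    return {t: Fraction(raw(t), total) for t in frequency}

weights = {name: normalize(raw) for name, raw in raw_weights.items()}

for name, w in weights.items():
    print(name, {tuple(sorted(t)): str(value) for t, value in w.items()})
```

Under these assumed frequencies, the transaction {milk, bread} carries weight 1/3 in case 1, 1/2 in case 2, 3/5 in case 3, and 5/11 in case 4, which shows how the choice of weight shifts the contribution of the same transaction.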

18 Expressing item-sets as queries (logical expressions) Definition 1: For a given set of items $I$, the set $Q$ of all possible queries associated with item-sets created from $I$ is defined as follows: $i \in I \Rightarrow i \in Q$; $q, q' \in Q \Rightarrow q \wedge q' \in Q$. These are all the queries in $Q$.

19 Definition 2: For any query $q \in Q$, the response set of $q$, $RS(q)$, is defined as follows: for all atomic $i \in Q$, $RS(i) = \{t \in T \mid i \in t\}$; $RS(q \wedge q') = RS(q) \cap RS(q')$.

20 Definition 3: Let $q = (i_1 \wedge i_2 \wedge \cdots \wedge i_k)$ and let $A_q$ denote the item-set associated with $q$; that is, $A_q = \{i_1, i_2, \ldots, i_k\}$. The support of $A_q$ is defined in terms of the query $q = (i_1 \wedge i_2 \wedge \cdots \wedge i_k)$.

21 Lemma 1: The support set of $A_q$, $SS(A_q)$, equals $RS(q)$. Lemma 2: For queries $q$, $q_1$, $q_2$, and $q_3$, the following properties hold: $RS(q \wedge q) = RS(q)$; $RS((q_1 \wedge q_2) \wedge q_3) = RS(q_1 \wedge (q_2 \wedge q_3))$; $RS(q_1 \wedge q_2) = RS(q_2 \wedge q_1)$.

22 Example: $RS((x_1 \wedge x_2) \wedge (x_3 \wedge x_2)) = RS(x_1 \wedge x_2 \wedge x_3)$
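The example on this slide can be checked mechanically. The sketch below computes response sets for conjunctive queries as intersections of the per-item response sets, following Definition 2. The transaction set `T` and the helper names `rs_item` and `rs_and` are hypothetical and chosen only for illustration.

```python
# Hypothetical transactions over the items x1, x2, x3.
T = [
    frozenset({"x1", "x2", "x3"}),
    frozenset({"x1", "x2"}),
    frozenset({"x2", "x3"}),
    frozenset({"x1"}),
]

def rs_item(item, transactions):
    """RS(i): all transactions that contain the item i."""
    return {t for t in transactions if item in t}

def rs_and(items, transactions):
    """RS of a conjunction of items: intersect the per-item response sets."""
    result = set(transactions)
    for item in items:
        result &= rs_item(item, transactions)
    return result

# RS((x1 AND x2) AND (x3 AND x2)) is the intersection of the two inner sets.
lhs = rs_and(["x1", "x2"], T) & rs_and(["x3", "x2"], T)
rhs = rs_and(["x1", "x2", "x3"], T)
print(lhs == rhs)  # True
```

Because set intersection is idempotent, associative, and commutative, the repeated item `x2` and the grouping of the conjunction have no effect on the result, which is the content of Lemma 2.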

23 Definition 4: For a given set of items $I$, the set $Q^*$ of all possible queries is defined as follows: $i \in I \Rightarrow i \in Q^*$; $q, q' \in Q^* \Rightarrow q \wedge q' \in Q^*$; $q, q' \in Q^* \Rightarrow q \vee q' \in Q^*$; $q \in Q^* \Rightarrow \neg q \in Q^*$.

24 Definition 5: For any query $q \in Q^*$, the response set of transactions, $RS(q)$, is defined as follows: for all atomic $i \in Q^*$, $RS(i) = \{t \in T \mid i \in t\}$; $RS(q \wedge q') = RS(q) \cap RS(q')$; $RS(q \vee q') = RS(q) \cup RS(q')$; $RS(\neg q) = T - RS(q)$.

25 Theorem: If $q$ is obtained from $q'$ by applying the rules of Boolean algebra, then $RS(q) = RS(q')$. Each $q \in Q^*$ can be considered a generalized itemset. The itemsets investigated in earlier works only consider $q \in Q$.
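A small recursive evaluator makes the response sets of generalized queries and the theorem above concrete. In this sketch a query is either an item name or a nested tuple; that tuple encoding, the function name `rs`, and the transaction set are assumptions made for illustration only. The check at the end uses De Morgan's law as one instance of a Boolean-algebra transformation.

```python
# Hypothetical transactions, invented for illustration only.
T = {
    frozenset({"beer"}),
    frozenset({"beer", "milk"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread"}),
}

def rs(query, transactions):
    """Response set of a query given as an item name or a nested tuple.

    Accepted forms: "item", ("and", q1, q2), ("or", q1, q2), ("not", q).
    """
    if isinstance(query, str):
        return {t for t in transactions if query in t}
    op = query[0]
    if op == "not":
        return set(transactions) - rs(query[1], transactions)
    if op == "and":
        return rs(query[1], transactions) & rs(query[2], transactions)
    if op == "or":
        return rs(query[1], transactions) | rs(query[2], transactions)
    raise ValueError(f"unknown operator: {op!r}")

# De Morgan: NOT (beer AND milk) is equivalent to (NOT beer) OR (NOT milk).
lhs = rs(("not", ("and", "beer", "milk")), T)
rhs = rs(("or", ("not", "beer"), ("not", "milk")), T)
print(lhs == rhs)  # True
```

Mapping AND, OR, and NOT to intersection, union, and complement relative to `T` is what lets every Boolean identity carry over to response sets.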

26 Lemma 3: $\{RS(q) \mid q \in Q^*\} = 2^T$. Theorem: $(T, 2^T, P)$ is a probability space.
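One way to see why Lemma 3 and the probability-space theorem are plausible is the sketch below. For each transaction it builds the query that conjoins the items in the transaction with the negations of all other items; the response set of that query is the single transaction, so any subset of `T` can be reached as a disjunction of such queries. The measure `P` is assumed here to be the sum of the normalized weights of the transactions in an event, and the weights themselves are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

items = ["beer", "milk", "bread"]

# Hypothetical normalized weights w(t); they sum to 1.
weights = {
    frozenset({"beer"}): Fraction(1, 2),
    frozenset({"beer", "milk"}): Fraction(1, 3),
    frozenset({"milk", "bread"}): Fraction(1, 6),
}
T = set(weights)

def minterm_response(t):
    """RS of the query: AND of items in t and NOT of every item outside t."""
    result = set(T)
    for item in items:
        has_item = {u for u in T if item in u}
        result &= has_item if item in t else T - has_item
    return result

def P(event):
    """Assumed measure: total normalized weight of the transactions in event."""
    return sum((weights[t] for t in event), Fraction(0))

# Every subset of T is an event, so there are 2^|T| events in total.
all_events = [set(c) for r in range(len(T) + 1) for c in combinations(T, r)]

print(all(minterm_response(t) == {t} for t in T))  # True
print(P(T))                                        # 1
```

With every singleton available as a response set, the event algebra is the full power set of `T`, and additivity of `P` over disjoint events follows directly from summing weights.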

27 Rules and Their Response Strengths Definitions 6, 7, and 8 define, respectively, the confidence, the interest, and the support of a rule $A_q \rightarrow A_{q'}$.
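The formulas for these three definitions are not preserved in the transcript. One natural reading, given the response sets $RS(\cdot)$ and the probability measure $P$ introduced earlier, is the following. This is an assumption based on the standard definitions of support, confidence, and interest, not a reconstruction of the author's exact notation.

```latex
% Assumed forms, stated in terms of RS and P; not taken from the slides.
\[
  \mathrm{sup}(A_q \rightarrow A_{q'}) = P\bigl(RS(q \wedge q')\bigr)
\]
\[
  \mathrm{conf}(A_q \rightarrow A_{q'}) =
    \frac{P\bigl(RS(q \wedge q')\bigr)}{P\bigl(RS(q)\bigr)}
\]
\[
  \mathrm{int}(A_q \rightarrow A_{q'}) =
    \frac{P\bigl(RS(q \wedge q')\bigr)}{P\bigl(RS(q)\bigr)\, P\bigl(RS(q')\bigr)}
\]
```

Under this reading, confidence is the conditional probability of $RS(q')$ given $RS(q)$, and an interest of 1 corresponds to statistical independence of the two response sets.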

28 Lemmas 4 and 5 state properties of a rule $A_q \rightarrow A_{q'}$.

29 Conclusions The theory of association mining that is based on a model of retrieval known as the Boolean Retrieval Model has been introduced. The framework we develop derives from the observation that information retrieval and association mining are two complementary processes on the same data records or transactions. Based on the theory of Boolean retrieval, we generalize the itemset structure by using all Boolean operators.

30 Conclusions (cont.) By introducing the notion of support of generalized itemsets, a uniform measure for both itemsets and rules (generalized itemsets) has been developed. Support of a generalized itemset is extended to allow transactions to be weighted so that they can contribute to support unequally.

31 Future Work In order to generate only understandable queries, new restrictions or measures, such as compactness and simplicity, should be introduced. (These restrictions or measures could eliminate a large number of frequent generalized itemsets, many of which could have complex structures.)