Efficient Quantitative Frequent Pattern Mining Using Predicate Trees. Baoying Wang, Fei Pan, Yue Cui, William Perrizo, North Dakota State University.

Outline: Introduction; Review of Predicate Trees; Quantitative Frequent Pattern Mining; Performance Analysis; Summary.

Introduction. Association rule mining (ARM) was first introduced by Agrawal et al. in 1993. ARM can be used for both categorical and quantitative attributes. The categorical ARM approach is extended to quantitative data by using intervals. An example would be: age ∈ [30, 45] and income ∈ [40, 60] ⇒ #car ∈ [1, 2].
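As an illustration of interval-based rules, the Python sketch below counts support and confidence for the example rule over a handful of hypothetical transactions. The data, the function names and the use of closed intervals are assumptions made for illustration; support is taken as the fraction of transactions satisfying both sides of the rule, and confidence as that count divided by the number of transactions satisfying the left-hand side.

```python
# Hypothetical transactions: (age, income in thousands, number of cars).
transactions = [
    (32, 45, 1),
    (41, 58, 2),
    (38, 52, 3),
    (25, 47, 1),
    (44, 61, 2),
    (35, 40, 2),
]


def in_interval(x, lo, hi):
    """Closed-interval test lo <= x <= hi (intervals assumed closed)."""
    return lo <= x <= hi


def rule_stats(rows):
    """Support and confidence of
    age in [30, 45] and income in [40, 60] => #car in [1, 2]."""
    lhs = [r for r in rows
           if in_interval(r[0], 30, 45) and in_interval(r[1], 40, 60)]
    both = [r for r in lhs if in_interval(r[2], 1, 2)]
    support = len(both) / len(rows)
    confidence = len(both) / len(lhs) if lhs else 0.0
    return support, confidence


print(rule_stats(transactions))  # (0.5, 0.75)
```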

Limitations of Traditional Tree Structures. Tree structures used in quantitative ARM include hash trees, R-trees, prefix trees and FP-trees. They are built on the fly according to the chosen quantitative intervals, so they must be rebuilt whenever the intervals change.

Predicate Tree Approach. In this paper, we present Predicate-tree-based quantitative frequent pattern mining (PQM). The central idea of PQM is to exploit predicate trees (P-trees) to obtain frequent pattern counts for any quantitative interval. P-trees are lossless, vertical, bitwise compressed data structures.

Advantages of PQM. P-trees are pre-generated tree structures that are flexible and efficient for any data partition and interval optimization; PQM is efficient because it uses fast P-tree logic operations; PQM has better support-threshold scalability and cardinality scalability because of the vertically decomposed structure and the compression of P-trees.

Review of Predicate Trees. A Predicate tree (P-tree) is a lossless, vertical, bitwise compressed data structure. A P-tree can be 1-dimensional, 2-dimensional, 3-dimensional, etc. In this paper, we focus on 1-dimensional P-trees.

Construction of P-trees. Given a data set with d attributes, $R = (A_1, A_2, \ldots, A_d)$, let the binary representation of the j-th attribute $A_j$ be $b_{j,m} b_{j,m-1} \ldots b_{j,i} \ldots b_{j,1}$. To build a 1-D P-tree: 1) the attributes are decomposed into bit files, one file for each bit position; 2) each bit file is recursively partitioned into halves, and each half into sub-halves, until the sub-half is pure (entirely 1-bits or entirely 0-bits).
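The two construction steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a P-tree is stored as nested pairs in which True marks a pure-1 part and False a pure-0 part, and it splits at len // 2 so lengths that are not powers of two still work. The root count (number of 1-bits) is read off the tree without decompressing it. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

# A node is True (pure1 part), False (pure0 part) or a (left, right) pair
# for a mixed part that was split further.
Node = Union[bool, Tuple["Node", "Node"]]


def decompose(values: Sequence[int], m: int) -> Dict[int, List[int]]:
    """Step 1: one bit file per bit position i (1 = least significant)."""
    return {i: [(v >> (i - 1)) & 1 for v in values] for i in range(1, m + 1)}


def build_ptree(bits: Sequence[int]) -> Node:
    """Step 2: halve the bit file recursively until each part is pure."""
    if all(b == 1 for b in bits):
        return True
    if all(b == 0 for b in bits):
        return False
    mid = len(bits) // 2
    return (build_ptree(bits[:mid]), build_ptree(bits[mid:]))


def root_count(node: Node, n: int) -> int:
    """Number of 1-bits in the n-bit file that `node` represents."""
    if node is True:
        return n
    if node is False:
        return 0
    mid = n // 2
    return root_count(node[0], mid) + root_count(node[1], n - mid)


values = [2, 3, 2, 2, 5, 2, 7, 7]     # illustrative 3-bit attribute values
files = decompose(values, 3)          # files[3] == [0, 0, 0, 0, 1, 0, 1, 1]
tree = build_ptree(files[3])
print(tree)                           # (False, ((True, False), True))
print(root_count(tree, len(values)))  # 3
```

Here the most significant bit file is 00001011: its first half is pure 0 and collapses to a single leaf, while the second half is split until every part is pure.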

Construction of P-trees (Cont.). Figure: a relation R(A1 A2 A3 A4) with 3-bit attribute values, stored in a horizontal structure and processed vertically (scans), is decomposed into the bit files R11 … R43, and each bit file is compressed into a P-tree P11 … P43. For the bit file R11 = 00001011, the pure1 tree is built as follows: 1. the whole file is not pure1 → 0; 2. the 1st half is not pure1 → 0; 3. the 2nd half is not pure1 → 0; 4. the 1st half of the 2nd half is not pure1 → 0; 5. the 2nd half of the 2nd half is pure1 → 1; 6. the 1st half of the 1st half of the 2nd half is pure1 → 1; 7. the 2nd half of the 1st half of the 2nd half is not pure1 → 0.

Pure1 trees and logical operations. Figure: pure1 trees P1_1, P1_2 and P1_3, together with the results of logical operations on them: the complement P1_3', and binary operations (AND, OR) of P1_1 with P1_2 and of P1_1 with P1_3.
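A hedged sketch of these operations working directly on compressed pure1 trees is shown below. It uses a nested-pair representation (True = pure-1 part, False = pure-0 part, a pair = mixed part) as a stand-in for the real P-tree format, which is an assumption; the helper names are hypothetical. AND and OR short-circuit on pure nodes, so whole pure halves are combined without being expanded, and the join helper merges identical pure children so results stay compressed.

```python
from typing import List, Sequence, Tuple, Union

# True = pure-1 part, False = pure-0 part, (left, right) = mixed part.
Node = Union[bool, Tuple["Node", "Node"]]


def build(bits: Sequence[int]) -> Node:
    """Pure1 tree of a bit file (halve until each part is pure)."""
    if all(b == 1 for b in bits):
        return True
    if all(b == 0 for b in bits):
        return False
    mid = len(bits) // 2
    return (build(bits[:mid]), build(bits[mid:]))


def expand(node: Node, n: int) -> List[int]:
    """Decompress a tree back into its n-bit file (for checking only)."""
    if node is True:
        return [1] * n
    if node is False:
        return [0] * n
    mid = n // 2
    return expand(node[0], mid) + expand(node[1], n - mid)


def join(left: Node, right: Node) -> Node:
    """Re-compress: two identical pure children become one pure node."""
    if isinstance(left, bool) and left == right:
        return left
    return (left, right)


def p_not(a: Node) -> Node:
    """Complement P': swap pure-1 and pure-0 leaves."""
    if isinstance(a, bool):
        return not a
    return (p_not(a[0]), p_not(a[1]))


def p_and(a: Node, b: Node) -> Node:
    if a is False or b is False:      # anything AND pure-0 is pure-0
        return False
    if a is True:                     # pure-1 AND b is b
        return b
    if b is True:
        return a
    return join(p_and(a[0], b[0]), p_and(a[1], b[1]))


def p_or(a: Node, b: Node) -> Node:
    if a is True or b is True:        # anything OR pure-1 is pure-1
        return True
    if a is False:
        return b
    if b is False:
        return a
    return join(p_or(a[0], b[0]), p_or(a[1], b[1]))


x = build([0, 0, 0, 0, 1, 0, 1, 1])
y = build([0, 0, 1, 1, 1, 1, 1, 1])
print(expand(p_and(x, y), 8))   # [0, 0, 0, 0, 1, 0, 1, 1]
print(expand(p_or(x, y), 8))    # [0, 0, 1, 1, 1, 1, 1, 1]
print(expand(p_not(x), 8))      # [1, 1, 1, 1, 0, 1, 0, 0]
```

Both operands must describe bit files of the same length, so that their halves line up node by node.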

Predicate value P-tree: $P_{x=v}$. A value P-tree represents the subset X of all tuples that contain a specified value v of an attribute A. It is denoted by $P_{A,v}$. Let $v = b_m b_{m-1} \ldots b_1$, where $b_i$ is the i-th bit of v in binary. $P_{A,v}$ is calculated in two steps: 1) for each bit position i, get the bit-value P-tree $P_{A,v,i}$ according to the bit value: if $b_i = 1$, $P_{A,v,i} = P_i$; otherwise $P_{A,v,i} = P'_i$; 2) calculate $P_{A,v}$ by ANDing all the bit-value P-trees of v, i.e. $P_{A,v} = P_{A,v,m} \wedge P_{A,v,m-1} \wedge \ldots \wedge P_{A,v,1}$.
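The two steps can be sketched in Python as below. To keep the example short, each P-tree is replaced by an uncompressed bitmap held in a Python int (bit r stands for row r); this is an assumption for illustration, and a real P-tree would apply the same AND and complement operations to the compressed trees. The names bit_slices, value_ptree and root_count are hypothetical.

```python
def bit_slices(values, m):
    """Vertical decomposition: slices[i] is an int bitmap whose bit r is
    bit i (1 = least significant) of row r's value."""
    slices = {}
    for i in range(1, m + 1):
        bitmap = 0
        for r, v in enumerate(values):
            if (v >> (i - 1)) & 1:
                bitmap |= 1 << r
        slices[i] = bitmap
    return slices


def value_ptree(slices, v, m, n):
    """P_{A,v}: AND over i of P_i (if b_i = 1) or P_i' (if b_i = 0)."""
    mask = (1 << n) - 1              # one bit per row
    result = mask
    for i in range(1, m + 1):
        b_i = (v >> (i - 1)) & 1
        p_i = slices[i] if b_i == 1 else ~slices[i] & mask
        result &= p_i
    return result


def root_count(bitmap):
    return bin(bitmap).count("1")


values = [5, 3, 5, 7, 1, 5]          # hypothetical 3-bit values
slices = bit_slices(values, 3)
p5 = value_ptree(slices, 5, 3, len(values))
print(bin(p5), root_count(p5))       # 0b100101 3
```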

Predicate range tree: $P_{x \le v}$. Let $v = b_m \ldots b_i \ldots b_1$. Then $P_{x \le v} = P'_m \; op_m \ldots P'_i \; op_i \; P'_{i-1} \ldots op_{k+1} \; P'_k$, where 1) $op_i$ is $\wedge$ if $b_i = 0$ and $\vee$ otherwise; 2) k is the rightmost bit position with value "0"; 3) the operators are right binding. For example, for $v = 101 = b_3 b_2 b_1$: $P_{x \le 101} = (P'_3 \vee P'_2)$.

Predicate range tree: $P_{x \ge v}$. Let $v = b_m \ldots b_i \ldots b_1$. Then $P_{x \ge v} = P_m \; op_m \ldots P_i \; op_i \; P_{i-1} \ldots op_{k+1} \; P_k$, where 1) $op_i$ is $\wedge$ if $b_i = 1$ and $\vee$ otherwise; 2) k is the rightmost bit position with value "1"; 3) the operators are right binding. For example, for $v = 101 = b_3 b_2 b_1$: $P_{x \ge 101} = (P_3 \wedge (P_2 \vee P_1))$.
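Both range rules can be checked with the sketch below, which evaluates each right-binding chain from the innermost operand $P_k$ (or $P'_k$) outward. As in the value P-tree case, uncompressed int bitmaps stand in for compressed P-trees, and the function names are hypothetical; bits are numbered from 1 (least significant), matching $v = b_m \ldots b_1$.

```python
def bit_slices(values, m):
    """slices[i]: int bitmap of bit i (1 = least significant) over all rows."""
    slices = {i: 0 for i in range(1, m + 1)}
    for r, v in enumerate(values):
        for i in range(1, m + 1):
            if (v >> (i - 1)) & 1:
                slices[i] |= 1 << r
    return slices


def bit(v, i):
    return (v >> (i - 1)) & 1


def p_le(slices, v, m, n):
    """P_{x<=v} = P'_m op_m ... op_{k+1} P'_k, right binding,
    op_i = AND if b_i = 0 else OR, k = rightmost 0-bit of v."""
    mask = (1 << n) - 1
    zeros = [i for i in range(1, m + 1) if bit(v, i) == 0]
    if not zeros:                    # v = 11...1: every row qualifies
        return mask
    k = zeros[0]
    acc = ~slices[k] & mask          # innermost operand P'_k
    for i in range(k + 1, m + 1):    # wrap outward: P'_i op_i (acc)
        comp = ~slices[i] & mask
        acc = comp & acc if bit(v, i) == 0 else comp | acc
    return acc


def p_ge(slices, v, m, n):
    """P_{x>=v} = P_m op_m ... op_{k+1} P_k, right binding,
    op_i = AND if b_i = 1 else OR, k = rightmost 1-bit of v."""
    mask = (1 << n) - 1
    ones = [i for i in range(1, m + 1) if bit(v, i) == 1]
    if not ones:                     # v = 0: every row qualifies
        return mask
    k = ones[0]
    acc = slices[k]                  # innermost operand P_k
    for i in range(k + 1, m + 1):
        acc = slices[i] & acc if bit(v, i) == 1 else slices[i] | acc
    return acc


values = list(range(8))              # every 3-bit value once; row r holds r
s = bit_slices(values, 3)
print(bin(p_le(s, 0b101, 3, 8)))     # 0b111111   (rows 0..5)
print(bin(p_ge(s, 0b101, 3, 8)))     # 0b11100000 (rows 5..7)
```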

P-tree Quantitative Frequent Pattern Mining (PQM). The central idea of PQM is to exploit P-trees to obtain frequent pattern counts for any quantitative interval. Unlike other tree structures, P-trees are pre-generated, so there is no need to construct trees on the fly when intervals are generated or merged. Interval P-tree: $P_{l \le A \le u} = P_{A \ge l} \wedge P_{A \le u}$, where $P_{A \ge l}$ and $P_{A \le u}$ are the predicate range trees for the predicates $A \ge l$ and $A \le u$.
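An interval P-tree is then just the AND of two range trees. The Python sketch below folds the two range rules into one hypothetical helper and, as before, uses uncompressed int bitmaps in place of compressed P-trees, so it shows the logic rather than the storage format. The ages are made up for the example.

```python
def bit_slices(values, m):
    """slices[i]: int bitmap of bit i (1 = least significant) over all rows."""
    slices = {i: 0 for i in range(1, m + 1)}
    for r, v in enumerate(values):
        for i in range(1, m + 1):
            if (v >> (i - 1)) & 1:
                slices[i] |= 1 << r
    return slices


def bit(v, i):
    return (v >> (i - 1)) & 1


def range_ptree(slices, v, m, n, ge):
    """P_{x>=v} if ge else P_{x<=v}, using the right-binding range rules.
    For >= the operands are P_i and AND is used where b_i = 1;
    for <= the operands are P_i' and AND is used where b_i = 0."""
    mask = (1 << n) - 1
    t = 1 if ge else 0

    def operand(i):
        return slices[i] if ge else ~slices[i] & mask

    positions = [i for i in range(1, m + 1) if bit(v, i) == t]
    if not positions:                # no such bit: every row qualifies
        return mask
    k = positions[0]                 # rightmost bit equal to t
    acc = operand(k)
    for i in range(k + 1, m + 1):
        acc = operand(i) & acc if bit(v, i) == t else operand(i) | acc
    return acc


def interval_ptree(slices, lo, hi, m, n):
    """P_{lo<=A<=hi} = P_{A>=lo} AND P_{A<=hi}."""
    return (range_ptree(slices, lo, m, n, ge=True)
            & range_ptree(slices, hi, m, n, ge=False))


ages = [25, 30, 33, 45, 46, 52, 38, 29]   # hypothetical 6-bit ages
s = bit_slices(ages, 6)
p = interval_ptree(s, 30, 45, 6, len(ages))
print(bin(p).count("1"))                  # 4 (ages 30, 33, 45, 38)
```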

PQM algorithm: 1) Determine the number of partitions for each quantitative attribute. 2) Calculate the support of each 1-item pattern using predicate trees; for quantitative attributes, adjacent intervals are combined if their support is below the user-defined threshold. 3) Select the patterns with minimum support to get the frequent patterns. 4) Generate (k+1)-item frequent pattern candidates from the k-item frequent patterns. 5) Calculate the support of each (k+1)-item candidate.
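The five steps can be sketched as an Apriori-style loop over bitmaps. This is a hedged illustration under several assumptions the slide does not state: the initial partitions are supplied by the caller, adjacent intervals are merged greedily from left to right until the merged interval reaches the minimum support, candidates are formed by joining pairs of frequent k-item patterns, and int bitmaps filled by a scan stand in for the P-trees. All names and data are hypothetical.

```python
from itertools import combinations


def popcount(bitmap):
    return bin(bitmap).count("1")


def interval_bitmap(values, lo, hi):
    """Rows with lo <= value <= hi. Stand-in for the interval P-tree
    P_{A>=lo} AND P_{A<=hi}; computed by a scan to keep the sketch short."""
    bitmap = 0
    for r, v in enumerate(values):
        if lo <= v <= hi:
            bitmap |= 1 << r
    return bitmap


def one_item_patterns(name, values, cuts, minsup, n):
    """Steps 1-3 for one attribute. cuts=None marks a categorical attribute
    (one item per value); otherwise cuts lists the initial (lo, hi) partitions,
    and adjacent ones are merged while their support stays below minsup."""
    if cuts is None:
        candidates = [(v, v, interval_bitmap(values, v, v))
                      for v in sorted(set(values))]
        return [((name, lo, hi), bm) for lo, hi, bm in candidates
                if popcount(bm) / n >= minsup]
    items, cur = [], None
    for lo, hi in cuts:
        bm = interval_bitmap(values, lo, hi)
        cur = (cur[0], hi, cur[2] | bm) if cur else (lo, hi, bm)  # merge = OR
        if popcount(cur[2]) / n >= minsup:
            items.append(((name, cur[0], cur[1]), cur[2]))
            cur = None
    return items    # a trailing interval that never reaches minsup is dropped


def pqm(table, cuts, minsup):
    n = len(next(iter(table.values())))
    level = {}
    for name, values in table.items():
        for item, bm in one_item_patterns(name, values, cuts[name], minsup, n):
            level[frozenset([item])] = bm
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        nxt = {}
        for a, b in combinations(list(level), 2):   # step 4: candidates
            cand = a | b
            if len(cand) != k + 1 or cand in nxt:
                continue
            if len({item[0] for item in cand}) != k + 1:
                continue                            # one interval per attribute
            bm = level[a] & level[b]                # step 5: AND the P-trees
            if popcount(bm) / n >= minsup:
                nxt[cand] = bm
        level, k = nxt, k + 1
    return {pattern: popcount(bm) / n for pattern, bm in frequent.items()}


table = {"age": [32, 41, 38, 25, 44, 35, 29, 47],   # hypothetical data
         "sex": [1, 0, 1, 1, 0, 1, 1, 0]}
cuts = {"age": [(20, 29), (30, 39), (40, 49)], "sex": None}
for pattern, support in pqm(table, cuts, minsup=0.5).items():
    print(sorted(pattern), support)
```

With minimum support 0.5, the interval [20, 29] is too rare on its own and is merged with [30, 39]; the merged interval and sex = 1 then form a frequent 2-item pattern.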

Example: Frequent Pattern Mining

Example of PQM (Cont.). Interval: age ∈ [30, 45] in decimal, or age ∈ [011110, 101101] in binary. $P_{30 \le age \le 45} = P_{age \ge 30} \wedge P_{age \le 45} = P_{age \ge 011110} \wedge P_{age \le 101101}$. The root count of $P_{30 \le age \le 45}$, denoted $N_{age[30,45]}$, is the number of transactions that involve age ∈ [30, 45].
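As a worked expansion (our own, applying the two range-tree rules with the 6-bit boundaries numbered $b_6 \ldots b_1$), the two range trees become:

```latex
% 30 = 011110: k = 2 (rightmost 1-bit); op_i = \wedge where b_i = 1, \vee where b_i = 0
P_{age \ge 011110} = P_6 \vee \big(P_5 \wedge (P_4 \wedge (P_3 \wedge P_2))\big)
% 45 = 101101: k = 2 (rightmost 0-bit); op_i = \wedge where b_i = 0, \vee where b_i = 1
P_{age \le 101101} = P'_6 \vee \big(P'_5 \wedge (P'_4 \vee (P'_3 \vee P'_2))\big)
N_{age[30,45]} = \mathrm{rc}\big(P_{age \ge 011110} \wedge P_{age \le 101101}\big)
```

Here rc denotes the root count (number of 1-bits), a shorthand introduced only for this example. Read as predicates, the first tree selects ages of at least 32 (bit 6 set) or 30 to 31 (bits 5 to 2 all set). The second selects ages below 32, or from 32 to 47 (bit 6 set, bit 5 clear) excluding 46 and 47 (bits 4 to 2 all set).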

Calculation process of $N_{age[30,45]}$

When the interval changes: use the same P-trees and calculate only the range P-trees for the new boundary values. In particular, when two adjacent intervals are merged, there is no need to calculate the new range P-trees from scratch; we can simply OR the P-trees of the two adjacent intervals. Example: $P_{15 \le age \le 45} = P_{15 \le age \le 29} \vee P_{30 \le age \le 45}$.
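The merge can be sketched as below, again with int bitmaps filled by a scan standing in for the interval P-trees (an assumption) and with hypothetical helper names. The point is that the merged interval's tree is a single OR of two trees that already exist, not a rebuild; the adjacency check assumes an integer-valued attribute.

```python
def interval_bitmap(values, lo, hi):
    """Rows with lo <= value <= hi (a scan stand-in for an interval P-tree)."""
    bitmap = 0
    for r, v in enumerate(values):
        if lo <= v <= hi:
            bitmap |= 1 << r
    return bitmap


def merge_adjacent(first, second):
    """Merge (lo, hi, bitmap) intervals with first.hi + 1 == second.lo
    (integer-valued attribute assumed) by OR-ing their bitmaps."""
    lo1, hi1, bm1 = first
    lo2, hi2, bm2 = second
    if hi1 + 1 != lo2:
        raise ValueError("intervals are not adjacent")
    return (lo1, hi2, bm1 | bm2)


ages = [12, 15, 22, 29, 30, 41, 45, 50]   # hypothetical ages
young = (15, 29, interval_bitmap(ages, 15, 29))
middle = (30, 45, interval_bitmap(ages, 30, 45))
lo, hi, merged = merge_adjacent(young, middle)
print(lo, hi, bin(merged).count("1"))    # 15 45 6
```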

Multi-item pattern mining. AND the P-trees of the individual items to get the multi-item pattern P-tree. Each item can be categorical or quantitative. Example: to find the 2-item pattern age ∈ [30, 45] and sex = 1, the 2-item pattern P-tree is $P_{30 \le age \le 45,\, sex=1} = P_{30 \le age \le 45} \wedge P_{sex=1}$.
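A minimal sketch of the 2-item example follows. Int bitmaps stand in for the P-trees, and the data and names are hypothetical. A categorical item is treated as the degenerate interval [v, v], an assumption made so that one helper covers both kinds of item.

```python
def interval_bitmap(values, lo, hi):
    """Rows with lo <= value <= hi; a categorical item v is the interval [v, v].
    A scan stand-in for the item P-tree."""
    bitmap = 0
    for r, v in enumerate(values):
        if lo <= v <= hi:
            bitmap |= 1 << r
    return bitmap


ages = [32, 41, 38, 25, 44, 35]   # hypothetical data
sex = [1, 0, 1, 1, 1, 0]

p_age = interval_bitmap(ages, 30, 45)   # quantitative item
p_sex = interval_bitmap(sex, 1, 1)      # categorical item sex = 1
p_both = p_age & p_sex                  # 2-item pattern P-tree
support = bin(p_both).count("1") / len(ages)
print(bin(p_both), support)             # 0b10101 0.5
```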

PQM Process (min. sup = 0.5)

Performance Analysis. The experimental results show that the PQM algorithm is more scalable than Apriori in terms of both the support threshold and the number of transactions. Figure: run time (s) of PQM and Apriori; a) scalability with support threshold (0% to 100%); b) scalability with transaction size.

Summary. In this paper, we present a quantitative frequent pattern mining algorithm using P-trees (PQM). P-trees can be used for any interval, and there is no need to build them on the fly. P-trees are not only flexible but also efficient for interval optimization. Fast P-tree logic operations are used to achieve efficient frequent pattern mining. In our experiments, the approach scales better than Apriori, owing to the vertically decomposed data structure and the compression of P-trees.

Thank you.
