Finding Frequent Itemsets by Transaction Mapping

Mingjun Song, Sanguthevar Rajasekaran, Proceedings of the 2005 ACM Symposium on Applied Computing
Presenter: 林靜怡, 2006/01/13

Introduction
The Apriori algorithm needs many database scans. In each scan, frequent itemsets are searched for by pattern matching, which is time-consuming for large frequent itemsets with long patterns.
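To make this cost concrete, here is a minimal Python sketch of level-wise Apriori: it makes one full pass over the database per itemset size and counts each candidate by checking whether it is contained in every transaction. The toy database TRANSACTIONS and the helper names are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy database (not from the paper); each transaction is a set of item ids.
TRANSACTIONS = [
    {1, 2, 3}, {1, 2, 3, 5}, {1, 3, 6}, {1, 7},
    {1}, {2, 3}, {2, 4}, {2, 4, 8},
]


def candidates_from(level, k):
    """Join frequent (k-1)-itemsets; keep k-itemsets whose (k-1)-subsets are all frequent."""
    out = set()
    for a, b in combinations(list(level), 2):
        union = a | b
        if len(union) == k and all(frozenset(s) in level for s in combinations(union, k - 1)):
            out.add(union)
    return out


def apriori(transactions, min_sup):
    """Level-wise Apriori; returns the frequent itemsets and the number of database scans."""
    counts = Counter(item for t in transactions for item in t)  # scan 1: single items
    scans = 1
    level = {frozenset([i]): n for i, n in counts.items() if n >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        candidates = candidates_from(level, k)
        if not candidates:
            break
        scans += 1  # one more full pass over the database for the size-k candidates
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:  # pattern matching: is candidate c contained in transaction t?
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
    return frequent, scans


if __name__ == "__main__":
    frequent, scans = apriori(TRANSACTIONS, min_sup=2)
    print(scans)          # 3
    print(len(frequent))  # 9
```

Even on this tiny input the longest frequent itemset ({1,2,3}) costs a third scan; longer patterns mean proportionally more passes.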

TM Algorithm
TM uses a vertical database representation together with transaction mapping: the transaction ids of each itemset are mapped and compressed into continuous transaction intervals in a different space, which reduces the number of intersection operations.
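A small Python sketch of the two ideas on this slide, under stated assumptions: the vertical representation stores, for each item, the ids of the transactions that contain it, so the support of an itemset is the size of an intersection; and a set of ids can be compressed into runs of consecutive ids. The toy data and the names vertical and to_intervals are hypothetical. In TM the ids are first relabeled (through the transaction tree on the later slides) so that these runs become long; this sketch does not do that relabeling.

```python
from collections import defaultdict

# Hypothetical toy database (not from the paper); transaction ids are 1-based positions.
TRANSACTIONS = [
    {1, 2, 3}, {1, 2, 3, 5}, {1, 3, 6}, {1, 7},
    {1}, {2, 3}, {2, 4}, {2, 4, 8},
]


def vertical(transactions):
    """Vertical layout: map each item to the set of ids of the transactions containing it."""
    tidsets = defaultdict(set)
    for tid, t in enumerate(transactions, start=1):
        for item in t:
            tidsets[item].add(tid)
    return dict(tidsets)


def to_intervals(tids):
    """Compress a set of ids into maximal runs of consecutive ids, as (start, end) pairs."""
    runs = []
    for tid in sorted(tids):
        if runs and tid == runs[-1][1] + 1:
            runs[-1][1] = tid        # tid continues the current run
        else:
            runs.append([tid, tid])  # gap: start a new run
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    tidsets = vertical(TRANSACTIONS)
    print(len(tidsets[1] & tidsets[3]))  # support of {1,3}: 3
    print(to_intervals(tidsets[3]))      # [(1, 3), (6, 6)]
```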

Lexicographic Prefix Tree
The lexicographic prefix tree is used to generate candidate itemsets and test their frequency. Each node in the tree stores a collection of frequent itemsets.

Lexicographic Prefix Tree (cont.)
The tree is searched depth-first. If expanding a node cannot possibly lead to the discovery of itemsets with minimum support, the node is not expanded and the search backtracks. Whenever an itemset that meets the minimum support requirement is found, it is output.
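The depth-first traversal can be sketched in Python as below. This is an Eclat-style illustration that counts support with plain transaction-id sets rather than TM's intervals; the toy database and the name dfs_mine are hypothetical. A node is expanded only when its itemset meets min_sup, because no superset of an infrequent itemset can be frequent.

```python
# Hypothetical toy database (not from the paper).
TRANSACTIONS = [
    {1, 2, 3}, {1, 2, 3, 5}, {1, 3, 6}, {1, 7},
    {1}, {2, 3}, {2, 4}, {2, 4, 8},
]


def dfs_mine(transactions, min_sup):
    """Depth-first search of the lexicographic prefix tree, counting with tid sets.

    Each tree node is an itemset; its children extend it with later items only,
    so every itemset is generated exactly once."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_sup)
    output = []

    def expand(prefix, prefix_tids, tail):
        for k, item in enumerate(tail):
            tids = prefix_tids & tidsets[item]  # tids of prefix + (item,)
            if len(tids) >= min_sup:
                itemset = prefix + (item,)
                output.append((itemset, len(tids)))  # output as soon as it is found
                expand(itemset, tids, tail[k + 1:])
            # Otherwise the node is not expanded: the search backtracks and
            # continues with the next sibling.

    expand((), set(range(len(transactions))), items)
    return output


if __name__ == "__main__":
    for itemset, sup in dfs_mine(TRANSACTIONS, min_sup=2):
        print(itemset, sup)
```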

Transaction Mapping
Scan through the database once, identify all frequent 1-itemsets, and sort them in descending order of frequency.

Transaction Mapping (example)
min_sup = 2
sup{1} = 5, sup{2} = 5, sup{3} = 4, sup{4} = 2, sup{5} = 1, sup{6} = 1, ..., sup{20} = 1
Frequent 1-itemsets: 1, 2, 3, 4
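The first scan can be sketched as follows. The toy database is hypothetical, chosen only so that its counts match the example above (sup{1} = 5, sup{2} = 5, sup{3} = 4, sup{4} = 2), with a few infrequent items of support 1. Breaking the tie between items 1 and 2 by item id is an assumption; the name frequent_items is also hypothetical.

```python
from collections import Counter

# Hypothetical toy database, chosen only to reproduce the supports of items 1 to 4 above.
TRANSACTIONS = [
    [1, 2, 3], [3, 2, 1, 5], [1, 3, 6], [1, 7],
    [1], [2, 3], [4, 2], [2, 4, 8],
]


def frequent_items(transactions, min_sup):
    """One scan: count item supports, keep items with support >= min_sup,
    and order them by descending support (ties broken by item id, an assumption)."""
    support = Counter()
    for t in transactions:
        support.update(set(t))  # count each item at most once per transaction
    frequent = [item for item, n in support.items() if n >= min_sup]
    order = sorted(frequent, key=lambda item: (-support[item], item))
    return order, support


if __name__ == "__main__":
    order, support = frequent_items(TRANSACTIONS, min_sup=2)
    print(order)                           # [1, 2, 3, 4]
    print({i: support[i] for i in order})  # {1: 5, 2: 5, 3: 4, 4: 2}
```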

Transaction Mapping (cont.)
Scan through the database again. For each transaction, select the items that are frequent 1-itemsets, sort them in the order of the frequent 1-itemsets, and insert them into the transaction tree.

Transaction Tree
Insertion of a transaction starts with the root as the current node. For each item in turn: if the current node has a child whose id equals this item, increment that child's count by 1; otherwise create a new child node with its counter set to 1. That child then becomes the current node.
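A minimal Python sketch of the second scan and the insertion rule; the Node class, helper names, and toy database are hypothetical. Each transaction is reduced to its frequent items, sorted in the frequent-1-itemset order, and inserted path by path from the root. Children are kept in creation order because the interval assignment on the following slides depends on which child is first.

```python
from collections import Counter

# Hypothetical toy database (not from the paper).
TRANSACTIONS = [
    [1, 2, 3], [3, 2, 1, 5], [1, 3, 6], [1, 7],
    [1], [2, 3], [4, 2], [2, 4, 8],
]


class Node:
    """Transaction-tree node: item id, counter, and children in creation order."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = []


def insert(root, items):
    """Insert one filtered, sorted transaction, starting with the root as the current node."""
    current = root
    for item in items:
        child = next((c for c in current.children if c.item == item), None)
        if child is None:            # no child with this id: create one
            child = Node(item)
            current.children.append(child)
        child.count += 1             # a new child ends at 1, an existing one is incremented
        current = child              # the child becomes the current node


def build_tree(transactions, min_sup):
    support = Counter(i for t in transactions for i in set(t))       # first scan
    order = sorted((i for i, n in support.items() if n >= min_sup),
                   key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node()
    for t in transactions:                                           # second scan
        kept = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        if kept:
            insert(root, kept)
    return root


def show(node, depth=0):
    """Print the tree in preorder as item:count, indented by depth."""
    for child in node.children:
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


if __name__ == "__main__":
    show(build_tree(TRANSACTIONS, min_sup=2))
```

For this toy input the root has children 1:5 (with children 2:2, which has child 3:2, and 3:1) and 2:3 (with children 3:1 and 4:2), a tree consistent with the interval example on the next slides.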

Transaction Tree (example, nodes shown as item:counter): root, 1:1, 2:1, 2:1, 3:1, 3:1, 4:1, 3:1

Node Interval
Each node u has an associated interval [s, e], where s is the relabeled start id and e is the relabeled end id:
If u is the first child of its parent, s = the start id of u's parent.
Otherwise, s = the end id of u's previous sibling + 1.
In both cases, e = s + counter(u) - 1.

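Node Interval (example)
Node intervals: [1,5] [6,8] [1,2] [3,3] [6,6] [7,8] [1,2]
First child with counter 5: s = 1, e = 1 + 5 - 1 = 5, giving [1,5]
First child with counter 2: s = 1, e = 1 + 2 - 1 = 2, giving [1,2] (two such nodes)
Not a first child, previous sibling ending at 2, counter 1: s = 2 + 1 = 3, e = 3 + 1 - 1 = 3, giving [3,3]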
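Interval assignment can be sketched in Python as a preorder walk, with the root given start id 1 as in the example. Tree construction is repeated so the block runs on its own; the toy database and all names are hypothetical. Collecting each node's [s, e] under its item yields the per-item interval lists that are later intersected.

```python
from collections import Counter, defaultdict

# Hypothetical toy database (not from the paper).
TRANSACTIONS = [
    [1, 2, 3], [3, 2, 1, 5], [1, 3, 6], [1, 7],
    [1], [2, 3], [4, 2], [2, 4, 8],
]


class Node:
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, []
        self.start = self.end = None


def build_tree(transactions, min_sup):
    """Two scans: find and order frequent items, then insert each filtered transaction."""
    support = Counter(i for t in transactions for i in set(t))
    order = sorted((i for i, n in support.items() if n >= min_sup),
                   key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node()
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = next((c for c in node.children if c.item == item), None)
            if child is None:
                child = Node(item)
                node.children.append(child)
            child.count += 1
            node = child
    return root


def assign_intervals(node, out):
    """Set [s, e] on every descendant of node and collect the intervals per item.
    First child: s = parent's start; later children: s = previous sibling's end + 1;
    in both cases e = s + counter - 1."""
    prev_end = None
    for child in node.children:
        child.start = node.start if prev_end is None else prev_end + 1
        child.end = child.start + child.count - 1
        out[child.item].append((child.start, child.end))
        assign_intervals(child, out)
        prev_end = child.end
    return out


def item_intervals(transactions, min_sup):
    root = build_tree(transactions, min_sup)
    root.start = 1  # relabeled transaction ids start at 1, as in the example
    return dict(assign_intervals(root, defaultdict(list)))


if __name__ == "__main__":
    print(item_intervals(TRANSACTIONS, min_sup=2))
```

For the toy input the seven node intervals are exactly those listed above, grouped per item as 1: [(1, 5)], 2: [(1, 2), (6, 8)], 3: [(1, 2), (3, 3), (6, 6)], 4: [(7, 8)].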

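Output (min_sup = 2)
The lexicographic tree over the frequent items 1, 2, 3, 4 is searched depth-first, and the support of each candidate is obtained by intersecting interval lists:
{1,2}: intersection [1,2], support 2 >= 2, frequent
{1,2,3}: intersection [1,2], support 2 >= 2, frequent
{1,2,3,4}: support < 2, not frequent
{1,2,4}: support < 2, not frequent
The remaining candidates, such as {1,3}, {2,3} and {2,4}, are handled the same way.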
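Support counting by interval intersection inside the depth-first search can be sketched as follows. The interval lists in ITEM_INTERVALS are an assumption: they are what a hypothetical toy database yields when its transaction tree has the node intervals of the example. The function names are hypothetical too. Support is the total length of an itemset's intervals.

```python
# Hypothetical per-item interval lists (relabeled ids), in the frequent-item order 1, 2, 3, 4.
ITEM_INTERVALS = {
    1: [(1, 5)],
    2: [(1, 2), (6, 8)],
    3: [(1, 2), (3, 3), (6, 6)],
    4: [(7, 8)],
}


def intersect(a, b):
    """Intersect two sorted lists of disjoint closed intervals (two-pointer merge)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        s = max(a[i][0], b[j][0])
        e = min(a[i][1], b[j][1])
        if s <= e:
            out.append((s, e))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def support(intervals):
    """Number of relabeled transaction ids covered by the intervals."""
    return sum(e - s + 1 for s, e in intervals)


def mine(item_intervals, order, min_sup):
    """Depth-first search of the lexicographic tree, counting by interval intersection."""
    found = {}

    def expand(prefix, intervals, tail):
        for k, item in enumerate(tail):
            cand = item_intervals[item] if not prefix else intersect(intervals, item_intervals[item])
            sup = support(cand)
            if sup >= min_sup:  # frequent: output it, then extend with later items
                itemset = prefix + (item,)
                found[itemset] = sup
                expand(itemset, cand, tail[k + 1:])
            # infrequent: no superset can be frequent, so the branch is not expanded

    expand((), [], order)
    return found


if __name__ == "__main__":
    for itemset, sup in mine(ITEM_INTERVALS, [1, 2, 3, 4], min_sup=2).items():
        print(itemset, sup)
```

With min_sup = 2 this reproduces the trace above: {1,2} and {1,2,3} both intersect to [(1, 2)] with support 2, while {1,2,4} and {1,2,3,4} intersect to an empty list.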

Experiments
OS: Windows 2000; CPU: 2.4 GHz Pentium (Dell PC); RAM: 1 GB; Compiler: Visual C++

Experiments: synthetic data and real data

