

Presentation on theme: "Review of Vertical Data and 1-D Ptrees" — Presentation transcript:

Slide 1 (title slide)

Slide 2: Review of Vertical Data and 1-D Ptrees

Traditional way, for horizontally structured records: Vertical Processing of Horizontal Data (VPHD) — scan the table vertically, record by record. E.g., VPHD finds the number of occurrences of 7 0 1 4 by scanning: the answer is 2.

Given a table R(A1, A2, A3, A4) structured into horizontal records (base 10 and base 2):

  R(A1 A2 A3 A4)      binary
  2  7  6  1          010 111 110 001
  3  7  6  0          011 111 110 000
  2  7  5  1          010 111 101 001
  2  7  5  7          010 111 101 111
  3  2  1  4          011 010 001 100
  2  2  1  5          010 010 001 101
  7  0  1  4          111 000 001 100
  7  0  1  4          111 000 001 100

Now, Horizontal Processing of Vertical Data (HPVD) with Predicate trees (Ptrees): vertically project each attribute, then vertically project each bit position of each attribute, and compress each bit slice into a basic 1-dimensional Ptree. This gives the 12 basic Ptrees P11, P12, P13, P21, ..., P43 (one per attribute bit position).

Top-down construction of the 1-dimensional Ptree of bit slice R11 (the high-order bits of A1), denoted P11: record the truth of the universal predicate "pure1" in a tree, recursing on halves (1/2^1 subsets) until purity is reached. E.g., compression of R11 = 0 0 0 0 0 0 1 1 into P11 goes as follows:

  1. Whole is pure1? false -> 0
  2. Left half pure1? false -> 0. But it is pure (pure0), so this branch ends.
  3. Right half pure1? false -> 0
  4. Left half of right half pure1? false -> 0
  5. Right half of right half pure1? true -> 1

To find the number of occurrences of 7 0 1 4, AND these basic Ptrees (next slide).
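The top-down pure1 construction described above can be sketched in Python. This is a minimal illustration, not code from the lecture: it represents a pure node as a bare 0 or 1 and an impure node as a tuple `(0, left, right)`, and the 8-bit slice `R11 = 00000011` is read off the tree shown on the slide.

```python
def build_ptree(bits):
    """Top-down 1-D Ptree: record the truth of the 'pure1' predicate,
    recursing on halves until a half is pure (all 1s or all 0s)."""
    if all(b == 1 for b in bits):
        return 1                      # pure-1: node is 1, branch ends
    if all(b == 0 for b in bits):
        return 0                      # pure-0: node is 0, branch ends
    half = len(bits) // 2             # split into 1/2^1 subsets
    return (0, build_ptree(bits[:half]), build_ptree(bits[half:]))

# Bit slice R11 (high-order bits of A1), as read off the slide's tree
P11 = build_ptree([0, 0, 0, 0, 0, 0, 1, 1])
# Root 0; left half is pure0; right half expands to (0, 0, 1)
```

Note how the pure left half collapses to a single 0 leaf — that is the compression the slide's "this branch ends" refers to.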

Slide 3: Shortcuts in the processing of 1-dimensional Ptrees

To count occurrences of (7, 0, 1, 4), use its bit pattern 111 000 001 100: AND the twelve basic Ptrees, complementing (') each Ptree whose corresponding pattern bit is 0:

  P11 ^ P12 ^ P13 ^ P'21 ^ P'22 ^ P'23 ^ P'31 ^ P'32 ^ P33 ^ P41 ^ P'42 ^ P'43

Shortcuts during the AND:
- A 0 at any node makes the entire branch below it 0; e.g., a single pure-0 operand node zeroes the whole left branch of the result.
- A node of the result is 1 exactly where every operand contributes a 1 (0s in complemented operands become 1s).

In the result, the only 1-bit sits at the 2^1 level (2nd level), so the 1-count of the Ptree is 1 * 2^1 = 2: the pattern 7 0 1 4 occurs twice.
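The counting logic can be checked on uncompressed bit columns. This is a sketch, not the lecture's compressed-tree AND: `count_pattern` is an illustrative name, and the table is reconstructed from the slides with the last row assumed to be duplicated so the 1-count is 1 * 2^1 = 2, as the slide states.

```python
def count_pattern(rows, pattern, bits_per_attr=3):
    """Count occurrences of `pattern` by ANDing the vertical bit slices,
    complementing each slice whose pattern bit is 0 (the P' trees)."""
    result = [1] * len(rows)
    for a, value in enumerate(pattern):
        for shift in range(bits_per_attr - 1, -1, -1):
            want = (value >> shift) & 1                # pattern bit for this slice
            col = [(r[a] >> shift) & 1 for r in rows]  # basic bit slice, e.g. P11
            result = [acc & (c if want else 1 - c)     # complement where bit is 0
                      for acc, c in zip(result, col)]
    return sum(result)                                 # root 1-count of the AND

# Table reconstructed from slides 2-3 (last row assumed duplicated)
rows = [(2, 7, 6, 1), (3, 7, 6, 0), (2, 7, 5, 1), (2, 7, 5, 7),
        (3, 2, 1, 4), (2, 2, 1, 5), (7, 0, 1, 4), (7, 0, 1, 4)]
```

Since each attribute fits in 3 bits, a row survives all twelve ANDs exactly when it equals the pattern, so the root 1-count is the occurrence count.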

Slide 4: Example ARM using uncompressed Ptrees
(Note: the 1-count is placed at the root of each Ptree.)

Transaction database D (TID : items):
  1 : {1, 3, 4}
  2 : {2, 3, 5}
  3 : {1, 2, 3, 5}
  4 : {2, 5}

Build the Ptrees with one scan of D — one bit vector per item, 1-count at the root:
  P1 = 1010 (2),  P2 = 0111 (3),  P3 = 1110 (3),  P4 = 1000 (1),  P5 = 0111 (3)

C1 -> F1 = L1 = {1} {2} {3} {5}   ({4} is infrequent)

C2: AND pairs of Ptrees:
  P1^P2 = 0010 (1),  P1^P3 = 1010 (2),  P1^P5 = 0010 (1),
  P2^P3 = 0110 (2),  P2^P5 = 0111 (3),  P3^P5 = 0110 (2)
F2 = L2 = {1,3} {2,3} {2,5} {3,5}

C3: {1,2,3} is pruned since {1,2} is not frequent; {1,3,5} is pruned since {1,5} is not frequent; that leaves {2,3,5}:
  P2^P3^P5 = 0110 (2)
F3 = L3 = {2,3,5}

Data_Lecture_4.1_ARM
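The level-wise pass above can be sketched with uncompressed bit vectors standing in for the Ptrees. This is a minimal sketch, not the lecture's implementation: `apriori_bitvectors` is an illustrative name, and minsup = 2 is inferred from {4} being dropped from L1.

```python
from itertools import combinations

def apriori_bitvectors(transactions, items, minsup):
    """Apriori where each itemset's support is a bit-vector AND
    (uncompressed Ptrees with the 1-count at the root)."""
    n = len(transactions)
    # Basic 'Ptrees': one bit per transaction per item
    pv = {frozenset([i]): [1 if i in t else 0 for t in transactions] for i in items}
    freq = {s: v for s, v in pv.items() if sum(v) >= minsup}   # L1
    level = set(freq)
    while level:
        k = len(next(iter(level))) + 1
        # Candidate generation: unions of frequent (k-1)-itemsets ...
        cand = {a | b for a in level for b in level if len(a | b) == k}
        # ... pruned if any (k-1)-subset is infrequent (Apriori property)
        cand = {c for c in cand
                if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        level = set()
        for c in cand:
            vec = [1] * n
            for i in c:                        # AND the item bit vectors
                vec = [x & y for x, y in zip(vec, pv[frozenset([i])])]
            if sum(vec) >= minsup:             # root 1-count = support
                freq[c] = vec
                level.add(c)
    return freq

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori_bitvectors(transactions, [1, 2, 3, 4, 5], minsup=2)
```

On this database the sketch reproduces the slide: {1,2,3} and {1,3,5} are pruned by the subset test before any counting, and only {2,3,5} survives to level 3.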

Slide 5: From frequent itemsets to strong rules (minconf = .75)

1-itemsets don't support association rules (they would have no antecedent or no consequent).

Are there any strong rules supported by the frequent (large) 2-itemsets at minconf = .75?
  {1,3}: conf({1} => {3}) = supp{1,3}/supp{1} = 2/2 = 1 ≥ .75  STRONG
         conf({3} => {1}) = supp{1,3}/supp{3} = 2/3 = .67 < .75
  {2,3}: conf({2} => {3}) = supp{2,3}/supp{2} = 2/3 = .67 < .75
         conf({3} => {2}) = supp{2,3}/supp{3} = 2/3 = .67 < .75
  {2,5}: conf({2} => {5}) = supp{2,5}/supp{2} = 3/3 = 1 ≥ .75  STRONG
         conf({5} => {2}) = supp{2,5}/supp{5} = 3/3 = 1 ≥ .75  STRONG
  {3,5}: conf({3} => {5}) = supp{3,5}/supp{3} = 2/3 = .67 < .75
         conf({5} => {3}) = supp{3,5}/supp{5} = 2/3 = .67 < .75
So the 2-itemsets do support association rules.

Are there any strong rules supported by the frequent (large) 3-itemset?
  {2,3,5}: conf({2,3} => {5}) = supp{2,3,5}/supp{2,3} = 2/2 = 1 ≥ .75  STRONG
           conf({2,5} => {3}) = supp{2,3,5}/supp{2,5} = 2/3 = .67 < .75
           conf({3,5} => {2}) = supp{2,3,5}/supp{3,5} = 2/3 = .67 < .75

No smaller-antecedent rule can be strong either: there is no need to check conf({2} => {3,5}), conf({3} => {2,5}), or conf({5} => {2,3}), since each of these denominators (supp of the single-item antecedent) is at least as large as that of the corresponding 2-item antecedent, so each confidence is at least as low. DONE!

Slide 6: Ptree-ARM versus Apriori on aerial photo (RGB) data together with yield data

- 1320 x 1320 pixel TIFF-Yield dataset (total number of transactions is ~1,700,000).
- P-ARM is compared to horizontal Apriori (classical) and to FP-growth (an improvement on Apriori).
- For fairness, P-ARM finds all frequent itemsets, not just those containing Yield.
- Aerial TIFF images (R, G, B) with synchronized yield (Y).
- The methods produce identical results; P-ARM is more scalable for lower support thresholds.
- The P-ARM algorithm is more scalable to large spatial datasets.

[Charts: scalability with support threshold; scalability with number of transactions]

Slide 7: P-ARM versus FP-growth (see the literature for its definition)

- FP-growth is an efficient, tree-based frequent-pattern mining method (details later).
- Dataset: 17,424,000 pixels (transactions).
- For a dataset of 100K bytes, FP-growth runs very fast, but for images of large size P-ARM achieves better performance.
- P-ARM also achieves better performance in the case of low support thresholds.

[Charts: scalability with support threshold; scalability with number of transactions]

