Association Rule Mining on Remotely Sensed Imagery Using Peano-trees (P-trees) Qin Ding, Qiang Ding, and William Perrizo Computer Science Department North.

1 Association Rule Mining on Remotely Sensed Imagery Using Peano-trees (P-trees) Qin Ding, Qiang Ding, and William Perrizo Computer Science Department North Dakota State University, USA May 2002 (P-tree technology is patent pending by NDSU)

2 Outline Concepts – Association Rule Mining – Market Basket Data – Remotely Sensed Imagery (RSI) data – Peano Count Trees (P-trees) Association rule mining on RSI data using P-trees Performance analysis Conclusion

3 Association Rule Mining
Originally proposed for market basket data.
Given
– A set of items I = {i_1, i_2, …, i_m} (e.g., items purchasable in a market)
– A set of transactions D (e.g., customers checking out; each transaction = id + itemset)
An association rule is X => Y, where X and Y are disjoint itemsets.
– X and Y are considered as events. E.g., X is the event that a transaction contains X.
– X => Y is the event: "if t contains X, then it contains Y".
– X is called the antecedent, Y is called the consequent.
Two measures: support (% of transactions containing X ∪ Y) and confidence (% of those transactions containing X which also contain Y).
Given minimum thresholds, minsup and minconf,
– Find the frequent itemsets, which have support above minsup.
– Derive all rules supported by frequent sets, with confidence above minconf.
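
The two measures can be illustrated with a minimal Python sketch. The transactions and item names are hypothetical, chosen only to keep the arithmetic easy to follow; they are not from the slides.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(x | y, transactions) / support(x, transactions)

X, Y = {"bread"}, {"milk"}
rule_support = support(X | Y, transactions)       # 2 of the 4 transactions
rule_confidence = confidence(X, Y, transactions)  # 2 of the 3 containing bread
```

With minsup = 40% and minconf = 60%, the rule bread => milk would pass both thresholds in this toy data.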

4 Association rule mining on RSI data
RSI data can be viewed as a relational table
– Each band (column) is an attribute (for simplicity we assume all values are bytes).
– Each pixel (row) is a transaction.
– Each interval in each band is an item.
– Row/column or longitude/latitude is the primary key.
ARM task on RSI data
– To mine implicit relations among different bands, for example, relations among spectral bands and yield.
Example Rule (NDVI): NIR[192,255] ^ RED[0,63] => Yield[128,255]
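
As a rough sketch of the "each interval in each band is an item" view, the Python below maps byte values to interval items and checks one pixel against the example rule. The helper name `interval_item` and the pixel readings are assumptions made for illustration.

```python
def interval_item(band, value, bits):
    """Name the interval item a byte value falls in when only its top `bits` bits are kept."""
    width = 256 >> bits
    low = (value // width) * width
    return f"{band}[{low},{low + width - 1}]"

# Hypothetical pixel: the band readings are made up for illustration.
pixel = {"NIR": 200, "RED": 30, "Yield": 150}

# The pixel (transaction) becomes a set of interval items.
items = {
    interval_item("NIR", pixel["NIR"], bits=2),
    interval_item("RED", pixel["RED"], bits=2),
    interval_item("Yield", pixel["Yield"], bits=1),
}

antecedent = {"NIR[192,255]", "RED[0,63]"}
consequent = {"Yield[128,255]"}
pixel_matches_rule = antecedent <= items and consequent <= items
```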

5 Important ARM Algorithms
Apriori – stepwise (level-wise) algorithm
DHP (Direct Hashing and Pruning) – hash itemset counts and prune transactions
Partition – divide the database into small partitions such that each can be processed independently and efficiently in memory
DIC (Dynamic Itemset Counting) – overlap the counting of candidate itemsets at different points during a scan
FP-growth – uses a Frequent Pattern tree (FP-tree) to mine frequent itemsets without explicit candidate generation
Others…

6 Remotely Sensed Imagery (RSI) Data
Satellite image
– TM (Thematic Mapper) imagery (6, 7 or 8 bands). TM is Landsat satellite imagery covering the earth every 18 days since 1972.
– ETM+ (Landsat-7) contains 8 bands: 7 VIR bands (Blue, Green, Red, NIR, MIR, TIR, MIR2) and 1 Panchromatic band (PC).
Aerial photography
– TIFF (3 bands: Blue, Green, Red)
Ground data
– Yield, Moisture, Nitrate, Temperature, Elevation, etc.

7 Precision Agriculture Dataset: TIFF Image and related Bands (1320×1320) RGB Moisture Yield Nitrate

8 As a relation
x: Row, y: Column, R: Red, G: Green, B: Blue, Y: Yield, M: Moisture, N: Nitrate

  x    y   R   G   B    Y   M    N
812  445  43  60  59  146  83  188
812  446  43  58  50  146  83  188
812  447  44  60  52  146  83  187
812  448  43  63  54  146  83  186
812  449  43  69  52  146  83  186
812  450  47  73  54  146  83  185
812  451  50  68  58  146  83  184
812  452  51  65  54  146  83  183
812  453  46  63  54  146  83  182
812  454  33  53  50  146  83  182
812  455  30  49  47  146  83  181
812  456  41  55  54  146  83  180
812  457  40  55  57  146  83  179
812  458  43  56  52  146  83  178
812  459  42  52  52  146  83  177
812  460  40  58  45  146  83  176
812  461  40  66  47  146  83  176
812  462  38  59  47  145  83  175
812  463  34  51  55  145  82  175
812  464  39  53  63  145  82  174
812  465  36  54  57  145  82  173
812  466  42  57  48  145  82  173
812  467  40  59  43  145  82  172
812  468  39  68  50  145  82  172
812  469  40  56  57  145  82  172
812  470  30  45  43  145  82  172
812  471  33  57  45  145  82  172
812  472  35  58  62  145  82  173
812  473  30  54  63  145  82  173
812  474  30  57  52  145  82  173

12 Spatial Data Formats
BAND-1
254 (1111 1110)   127 (0111 1111)
 14 (0000 1110)   193 (1100 0001)
BAND-2
 37 (0010 0101)   240 (1111 0000)
200 (1100 1000)    19 (0001 0011)
BSQ format (2 files)
Band 1: 254 127 14 193
Band 2: 37 240 200 19
BIL format (1 file)
254 127 37 240 14 193 200 19
BIP format (1 file)
254 37 127 240 14 200 193 19
bSQ format (16 files, one per column below; rows are the four pixels in raster order)
B11 B12 B13 B14 B15 B16 B17 B18 B21 B22 B23 B24 B25 B26 B27 B28
 1   1   1   1   1   1   1   0   0   0   1   0   0   1   0   1
 0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0
 0   0   0   0   1   1   1   0   1   1   0   0   1   0   0   0
 1   1   0   0   0   0   0   1   0   0   0   1   0   0   1   1
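
The four layouts can be reproduced from the two 2×2 bands on this slide with a short Python sketch. The variable names are illustrative, and the "files" are modeled simply as in-memory lists.

```python
# The two 2x2 bands from the slide, stored as rows of pixel values.
band1 = [[254, 127], [14, 193]]
band2 = [[37, 240], [200, 19]]
bands = [band1, band2]
rows, cols = 2, 2

# BSQ (band sequential): one flat file per band, pixels in raster order.
bsq = [[v for row in band for v in row] for band in bands]

# BIL (band interleaved by line): row 1 of every band, then row 2 of every band.
bil = [v for r in range(rows) for band in bands for v in band[r]]

# BIP (band interleaved by pixel): all band values of pixel 1, then pixel 2, ...
bip = [band[r][c] for r in range(rows) for c in range(cols) for band in bands]

# bSQ (bit sequential): one file per bit position of each band.
# "B11" holds the most significant bit of band 1, "B18" its least significant bit.
bsq_bits = {
    f"B{b + 1}{j + 1}": [(v >> (7 - j)) & 1 for v in flat]
    for b, flat in enumerate(bsq)
    for j in range(8)
}
```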

13 Peano Count Tree (P-tree) P-tree represents RSI data bit-by-bit in a recursive quadrant-by-quadrant arrangement. P-trees are a lossless compressed representation of the original data.

14 An example of a P-tree
[Figure: a 64-bit bSQ file, shown as a bit string and arranged as a spatial dataset (2-D raster order), together with its P-tree. Labels: quadrant-based; pure (Pure-1/Pure-0) quadrant; Peano or Z-ordering; root count.]
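
A minimal Python sketch of how such a count tree could be built by recursive quadrant splitting. The 8×8 bit image below is an assumption: it was reconstructed so that its counts agree with the P-tree shown on the P-tree Operations slide (root count 55; quadrant counts 16, 8, 15, 16), and the function name `build_ptree` is illustrative.

```python
# Assumed 8x8 bit image (one bSQ bit plane), reconstructed for illustration.
IMAGE = [
    "11111100",
    "11111000",
    "11111100",
    "11111110",
    "11111111",
    "11111111",
    "11111111",
    "01111111",
]
BITS = [[int(ch) for ch in row] for row in IMAGE]

def build_ptree(bits, r0=0, c0=0, size=None):
    """Return (count, children); children is empty for pure quadrants and single bits."""
    if size is None:
        size = len(bits)
    count = sum(bits[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size))
    if size == 1 or count in (0, size * size):
        return (count, [])  # pure-0, pure-1, or a single bit: stop splitting
    half = size // 2
    # Peano (Z) order: upper-left, upper-right, lower-left, lower-right.
    offsets = ((0, 0), (0, half), (half, 0), (half, half))
    return (count, [build_ptree(bits, r0 + dr, c0 + dc, half) for dr, dc in offsets])

root_count, quadrants = build_ptree(BITS)
level1 = [count for count, _ in quadrants]
upper_right = [count for count, _ in quadrants[1][1]]
lower_left = [count for count, _ in quadrants[2][1]]
```

Pure quadrants are never expanded, which is where the compression comes from.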

15 Peano Mask Tree (PM-tree)
Truth-Trees (1 if the condition is true of the quadrant, else 0)
– E.g., Pure-1 and Pure-0 Trees
– All are lossless compressed representations of the dataset

16 [Figure: P-tree of the example image, annotated with Peano or Z-ordering, Pure-1/Pure-0 quadrant, root count, level, fan-out, and QID (Quadrant ID).]
Tree counts: root 55; level 1: 16, 8, 15, 16; level 2: 3, 0, 4, 1 (under 8) and 4, 4, 3, 4 (under 15); leaf bits: 1110, 0010, 1101.
QID example: pixel (7, 1) = (111, 001) in binary has QID 10.10.11 = 2.2.3.
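
The QID in the example, (7, 1) = (111, 001) giving 10.10.11 = 2.2.3, is obtained by interleaving the row and column bits from most to least significant. A small Python sketch, with `qid` as an illustrative helper name:

```python
def qid(row, col, levels=3):
    """Quadrant ID of a pixel: one digit (0-3) per level, built from one row bit and one column bit."""
    digits = []
    for level in range(levels - 1, -1, -1):
        row_bit = (row >> level) & 1
        col_bit = (col >> level) & 1
        digits.append(2 * row_bit + col_bit)  # the bit pair read as a base-4 digit
    return ".".join(str(d) for d in digits)

example = qid(7, 1)  # row 111, column 001
```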

17 P-tree Operations
Trees are listed level by level, with quadrants in Peano order. In PM-trees, 1 = pure-1, 0 = pure-0, m = mixed.
P-tree (counts)
– root: 55
– level 1: 16, 8, 15, 16
– level 2: 3, 0, 4, 1 (under 8); 4, 4, 3, 4 (under 15)
– leaves: 1110, 0010 (under 8); 1101 (under 15)
PM-tree
– root: m
– level 1: 1, m, m, 1
– level 2: m, 0, 1, m (under the first m); 1, 1, m, 1 (under the second m)
– leaves: 1110, 0010; 1101
P-tree-1
– root: m
– level 1: 1, m, m, 1
– level 2: m, 0, 1, m; 1, 1, m, 1
– leaves: 1110, 0010; 1101
P-tree-2
– root: m
– level 1: 1, 0, m, 0
– level 2: 1, 1, 1, m
– leaves: 0100
AND-Result
– root: m
– level 1: 1, 0, m, 0
– level 2: 1, 1, m, m
– leaves: 1101, 0100
OR-Result
– root: m
– level 1: 1, m, 1, 1
– level 2: m, 0, 1, m
– leaves: 1110, 0010
Complement (counts)
– root: 9
– level 1: 0, 8, 1, 0
– level 2: 1, 4, 0, 3 (under 8); 0, 0, 1, 0 (under 1)
– leaves: 0001, 1101 (under 8); 0010 (under 1)
Complement (PM-tree)
– root: m
– level 1: 0, m, m, 0
– level 2: m, 1, 0, m; 0, 0, m, 0
– leaves: 0001, 1101; 0010

18 P-tree ANDing Operation
PM-tree1
– root: m
– level 1: 1, m, m, 1
– level 2: m, 0, 1, m; 1, 1, m, 1
– leaves: 1110, 0010; 1101
PM-tree2
– root: m
– level 1: 1, 0, m, 0
– level 2: 1, 1, 1, m
– leaves: 0100
Result
– root: m
– level 1: 1, 0, m, 0
– level 2: 1, 1, m, m
– leaves: 1101, 0100
Depth-first Pure-1 path code
– PM-tree1: 0 100 101 102 12 132 20 21 220 221 223 23 3
– PM-tree2: 0 20 21 22 231
– RESULT:
  0 & 0 → 0
  20 & 20 → 20
  21 & 21 → 21
  220 221 223 & 22 → 220 221 223
  23 & 231 → 231
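
One way to read the path-code table: two Pure-1 paths intersect exactly when one is a prefix of the other, and the intersection is the longer (deeper) path. The Python below is a simplified sketch under that reading, using plain strings for paths. It is not the authors' implementation, and the function names are illustrative.

```python
def and_pure1_paths(paths_a, paths_b):
    """AND two trees given as Pure-1 path codes; keep the deeper path of each prefix match."""
    result = []
    for a in paths_a:
        for b in paths_b:
            if a.startswith(b):      # b's quadrant contains a's quadrant
                result.append(a)
            elif b.startswith(a):    # a's quadrant contains b's quadrant
                result.append(b)
    return result

def root_count(paths, levels=3):
    """A Pure-1 path of length L covers 4**(levels - L) pixels."""
    return sum(4 ** (levels - len(p)) for p in paths)

tree1 = ["0", "100", "101", "102", "12", "132", "20", "21", "220", "221", "223", "23", "3"]
tree2 = ["0", "20", "21", "22", "231"]
result = and_pure1_paths(tree1, tree2)
result_count = root_count(result)
```

Applied to the first tree alone, `root_count` gives 55, which agrees with the root count of the count tree on the previous slide.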

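19 Various P-trees
[Figure: derivation of the P-tree types, with edges labeled by the operations used.]
Basic P-trees P_{i,j}
Value P-trees P_i(v): from basic P-trees by AND, COMPLEMENT
Tuple P-trees P(v_1, v_2, …, v_n): from value P-trees by AND
Interval P-trees P_i(v_1, v_2): from value P-trees by OR
Cube P-trees P([v_11, v_12], …, [v_N1, v_N2]): from interval P-trees by AND
Predicate P-trees P(p): from the other P-trees by AND, OR, COMPLEMENT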
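
How value and interval P-trees follow from the basic ones can be sketched in Python. As a simplifying assumption, each P-tree is represented here by its uncompressed bit vector (one bit per pixel) instead of a quadrant tree, so AND, OR and COMPLEMENT become element-wise operations. The pixel values are Band 1 from the Spatial Data Formats slide, and the function names are illustrative.

```python
band1 = [254, 127, 14, 193]  # Band 1 pixels from the Spatial Data Formats slide

def bit_plane(values, j):
    """Basic P-tree P_{i,j} as a flat vector: bit j (1 = most significant) of every pixel."""
    return [(v >> (8 - j)) & 1 for v in values]

def complement(p):
    return [1 - b for b in p]

def and_(p, q):
    return [a & b for a, b in zip(p, q)]

def or_(p, q):
    return [a | b for a, b in zip(p, q)]

def value_ptree(values, v, bits):
    """P_i(v): AND of the first `bits` bit planes, complemented where v has a 0 bit."""
    result = [1] * len(values)
    for j in range(1, bits + 1):
        plane = bit_plane(values, j)
        wanted = (v >> (bits - j)) & 1
        result = and_(result, plane if wanted else complement(plane))
    return result

def interval_ptree(values, lo, hi, bits):
    """P_i(lo, hi): OR of the value P-trees for every value in [lo, hi]."""
    result = [0] * len(values)
    for v in range(lo, hi + 1):
        result = or_(result, value_ptree(values, v, bits))
    return result

p_11 = value_ptree(band1, 0b11, bits=2)              # pixels whose top two bits are 11
p_01 = value_ptree(band1, 0b01, bits=2)              # pixels whose top two bits are 01
p_01_to_11 = interval_ptree(band1, 0b01, 0b11, bits=2)
```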

20 Association Rule Mining on RSI Data using P-trees
Admissible Itemsets (Asets)
– Asets are itemsets of the form Int_1 × Int_2 × ... × Int_n = Π_{i=1...n} Int_i, where Int_i is an interval of values in Band i (some of which may be the full value range).
– Example: Aset {[01,01]_1, [11,11]_2}
P-ARM algorithm
Pruning techniques

21 P-ARM algorithm
Procedure P-ARM {
  Data_Discretization;
  F_1 = {frequent 1-Asets};
  For (k = 2; F_{k-1} ≠ ∅; k++) do begin
    C_k = p-gen(F_{k-1});
    Forall candidate Asets c ∈ C_k do
      c.count = AND_rootcount(c);
    F_k = {c ∈ C_k | c.count >= minsup}
  end
  Answer = ∪_k F_k
}
F_1 is determined directly from P-tree root counts and pruning techniques rather than from a scan of the transaction database.
The p-gen function differs from the apriori-gen function in Apriori by using some pruning techniques.
The AND_rootcount function calculates Aset counts directly by ANDing the appropriate basic P-trees instead of scanning the transaction database.
The support count for Aset {B1[0,64), B2[64,128)} (or {[00,00]_1, [01,01]_2}) is the root count of P_1(00) AND P_2(01).
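
A simplified Python sketch of the P-ARM loop, under stated assumptions: each value P-tree is stood in for by an integer bitmask over the pixels (bit p set means pixel p has that value), so AND_rootcount becomes a bitwise AND followed by a bit count; p-gen is reduced to a join with band-based pruning only; and minsup is given as an absolute count. The function names and the tiny two-band input are illustrative, not the authors' code.

```python
from itertools import combinations

def value_masks(bands, bits=2):
    """Data discretization: map (band index, top-`bits` value) to a bitmask over pixels."""
    masks = {}
    for b, values in enumerate(bands):
        for pixel, v in enumerate(values):
            item = (b, v >> (8 - bits))
            masks[item] = masks.get(item, 0) | (1 << pixel)
    return masks

def root_count(mask):
    """Number of pixels selected by a mask (stand-in for a P-tree root count)."""
    return bin(mask).count("1")

def p_arm(bands, minsup_count, bits=2):
    masks = value_masks(bands, bits)
    # F_1 comes straight from the root counts, with no scan over transactions.
    frequent = {frozenset([i]): m for i, m in masks.items() if root_count(m) >= minsup_count}
    answer = dict(frequent)
    while frequent:
        candidates = {}
        for (a, mask_a), (b, mask_b) in combinations(frequent.items(), 2):
            c = a | b
            if len(c) != len(a) + 1:
                continue  # join only sets that differ in a single item
            if len({band for band, _ in c}) != len(c):
                continue  # band-based pruning: two items from one band have support zero
            candidates[c] = mask_a & mask_b  # AND in place of a database scan
        frequent = {c: m for c, m in candidates.items() if root_count(m) >= minsup_count}
        answer.update(frequent)
    return {c: root_count(m) for c, m in answer.items()}

# Two bands of four pixels each (values from the Spatial Data Formats slide).
bands = [[254, 127, 14, 193], [37, 240, 200, 19]]
frequent_asets = p_arm(bands, minsup_count=2)
```

On this toy input the only frequent 2-Aset pairs the top interval of band 1 with the bottom interval of band 2, with a count of 2.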

22 Pruning Techniques
Band-based pruning
– An itemset with two items from the same band will have support zero.
Constraint-based pruning
– E.g., specify yield as the only consequent band of interest.
– Note: in the performance comparisons we did not use this pruning technique (to maintain fairness, since it is hard to implement in other algorithms).
Bit-based pruning for multi-level rules
– If Aset [128,255] (or [1,1]_2) is not frequent, then the Asets [128,191] (or [10,10]_2) and [192,255] (or [11,11]_2) cannot be frequent either.
Others
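
Bit-based pruning rests on the fact that a finer interval selects a subset of the pixels of the coarser interval that contains it. A small Python check of that relationship, using four illustrative byte values and a hypothetical helper `prefix_count`:

```python
def prefix_count(values, prefix, bits):
    """Number of byte values whose top `bits` bits equal `prefix`."""
    return sum(1 for v in values if v >> (8 - bits) == prefix)

values = [254, 127, 14, 193]  # illustrative pixel values for one band

coarse = prefix_count(values, 0b1, bits=1)      # interval [128,255]
fine_low = prefix_count(values, 0b10, bits=2)   # interval [128,191]
fine_high = prefix_count(values, 0b11, bits=2)  # interval [192,255]

# The two finer intervals partition the coarse one, so neither can exceed its count.
partition_holds = fine_low + fine_high == coarse
```

If `coarse` falls below minsup, both finer counts do as well, so those Asets need not be generated.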

23 P-ARM versus Apriori Scalability with support threshold 1,742,400 pixels (transactions)

24 P-ARM versus Apriori (cont.) Scalability with number of transactions Support threshold =10%

25 P-ARM versus FP-growth Scalability with support threshold
[Chart: run time (sec., axis 0 to 800) versus support threshold (10% to 90%) for P-ARM and FP-growth. Dataset labels: 17,424,000 pixels (transactions); 1,742,400 pixels (transactions).]

26 P-ARM versus FP-growth (cont.) Scalability with the number of transactions Support threshold =10%

27 Conclusion
A model for association rule mining on RSI data
– P-trees facilitate fast calculation of support
– P-trees facilitate significant pruning techniques
Applications other than precision agriculture
– Flood prediction and monitoring
– Community and regional planning
– Virtual archeology
– Mineral exploration
– Bioinformatics/Genomics
– VLSI design
