Semantic Aspects in Spatial Data Mining Vania Bogorny.


1 Semantic Aspects in Spatial Data Mining Vania Bogorny

2 Introduction
Existing approaches for spatial data mining generally do not make use of prior knowledge.
Bogorny (2006) and Bogorny (2007) introduced the idea of using background knowledge:
– in data preprocessing, to reduce spatial joins
– in spatial association rule mining, to eliminate well-known patterns

3 Main Problems
Unnecessary spatial relationship computation
Large numbers of association rules
Many associations are well-known natural geographic dependences
Existing approaches for mining spatial association rules (SAR) are Apriori-like:
– Most approaches do not make use of background knowledge
– They use syntactic constraints for frequent set and rule pruning
– Only the data is considered, not the schema
Result:
– The same associations that the database designer represented explicitly in the schema are extracted by SAR mining algorithms

4 Spatial Relationships (Güting, 1994)
[Figure: three kinds of spatial relationships. Order: B north of A, C southeast of A. Distance: distance d between B and C. Topological: touches, overlaps, inside, contains, crosses, equals, disjoint.]

5 Topological Relationships – GEOMETRICALLY POSSIBLE
Order and distance relationships may exist among any pair of spatial features.
Topological relationships depend on the geometry.
OGC standard

6 Topological Relationships – SEMANTICALLY CONSISTENT

7 Spatial Relationships
– Mandatory (spatial constraints): dependences
– Prohibited
– Possible: normally undefined, e.g. Road crosses River
For data mining and knowledge discovery, POSSIBLE and PROHIBITED relationships are the interesting ones. All others are well known.

8 Well Known Geographic Dependences
Non-obvious spatial relationships
Well-known dependences:
Is_a(gasStation) → intersects(street) (100%)
Is_a(island) → within(waterResource) (100%)

9 Well Known Relationships X Association Rules
[Figure labels: Bridge & Viaducts, Roads, Vegetation; Bus Stop, Street]
contains(gasStation) → contains(Street) (100%)
intersects(busStop) → intersects(Street) (100%)
contains(viaduct) → contains(road) (100%)

10 Well Known Associations – Conceptual Schemas
{State, Country} {Factory, County} {Island, WaterBody} …

11 Well Known Associations – Conceptual Schemas
Source: 1ª Divisão de Levantamento do Exército Brasileiro

12 Well Known Associations – Conceptual Schemas

13 Well Known Associations – Geo-Ontologies
Geographic dependences are explicit in geo-ontologies (Bogorny, 2005b)

14 Well Known Associations – Geo-Ontologies

15 Well Known Dependences X Spatial Association Rules (SAR)
Well-known dependences affect the 3 main steps in the process of mining SAR:
– Spatial predicate computation: unnecessary relationships are computed
– Frequent set generation: frequent itemsets containing well-known patterns are generated
– Association rule extraction: a high number of rules with well-known dependences is produced

16 Well Known Dependences in SAR

17 Example of preprocessed spatial dataset

18 Problem 1 – Geographic Dependences between the Target Feature Type and Relevant Feature Types
Minconf=70%
Dependence = City and TreatedWaterNet (100% support)
contains(Hospital) → contains(TreatedWaterNet)

19 Problem 2 – Dependences among Relevant Feature Types
Minsup=50%
25 frequent sets (6 contain the dependence)
9 closed frequent sets (3 contain the dependence)
Dependence = {Port, WaterBody}
contains(Port) → crosses(WaterBody)

20 Pruning Methods using Background Knowledge

21 Frequent Set Pruning (Apriori-KC) (Bogorny, 2006a)
Given: φ,       // set of knowledge constraints
       Ψ,       // dataset generated with spatial_predicate_extraction
       minsup   // minimum support
L_1 = {large 1-predicate sets};
For (k = 2; L_(k-1) != ∅; k++) do begin
    C_k = apriori_gen(L_(k-1));    // generates new candidates
    If (k = 2)                     // remove pairs with dependences (step 1)
        Delete from C_2 all pairs with a dependence in φ;
    Forall rows w ∈ Ψ do begin
        C_w = subset(C_k, w);      // candidates contained in w
        Forall candidates c ∈ C_w do
            c.count++;
    End;
    L_k = {c ∈ C_k | c.count ≥ minsup};
End;
Answer = ∪_k L_k
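The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of `frozenset` for predicate sets, and the absolute `minsup` count are assumptions made here. The rows are the six-city example dataset shown on slide 26, with the dependence {A,W}.

```python
from itertools import combinations

def apriori_gen(prev_level):
    """Join frequent (k-1)-sets into k-candidates and keep only those whose
    (k-1)-subsets are all frequent (the usual Apriori candidate step)."""
    prev = set(prev_level)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in prev for s in combinations(union, len(a))
            ):
                candidates.add(union)
    return candidates

def apriori_kc(rows, dependences, minsup):
    """Return {frequent set: support count}, never generating sets that
    contain a pair listed in `dependences` (hypothetical helper name)."""
    items = {item for row in rows for item in row}
    level = {frozenset([i]) for i in items
             if sum(i in row for row in rows) >= minsup}
    answer = {s: sum(s <= row for row in rows) for s in level}
    k = 2
    while level:
        candidates = apriori_gen(level)
        if k == 2:
            # Step 1 of the slide: drop pairs with a known dependence.
            candidates -= dependences
        counts = {c: sum(c <= row for row in rows) for c in candidates}
        level = {c for c, n in counts.items() if n >= minsup}
        answer.update({c: counts[c] for c in level})
        k += 1
    return answer

# Example dataset from slide 26; minsup = 50% of 6 rows = 3 rows.
rows = [frozenset(r) for r in ("ACDTW", "CDW", "ADTW", "ACDW", "ACDTW", "CDT")]
frequent = apriori_kc(rows, {frozenset("AW")}, minsup=3)
print(len(frequent))  # 19: the 25 frequent sets minus the 6 containing {A,W}
```

Because the pair {A,W} is removed at k = 2, none of its supersets can pass the subset check in later iterations, so no extra filtering is needed for larger sets.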

22 Understanding the Pruning Methods
Dependences: {D} and {A,W}
D = contains(TreatedWaterNet)

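23 Understanding the Pruning Methods (Input Pruning)
[Figure: lattice of the 25 frequent sets. Input pruning removes predicate {D}, and with it every set that contains D.]
Pruned: {D} {A,D} {C,D} {D,T} {D,W} {A,C,D} {A,D,T} {A,D,W} {C,D,T} {C,D,W} {D,T,W} {A,C,D,W} {A,D,T,W}
Remaining: {} {A} {C} {T} {W} {A,C} {A,T} {A,W} {C,T} {C,W} {T,W} {A,C,W} {A,T,W}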
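Input pruning can be illustrated with a short Python sketch. It assumes the six-row example dataset from slide 26 and simply drops predicate D from every row before mining; `prune_input` and `frequent_sets` are hypothetical helper names, and the brute-force enumeration is only practical for a toy dataset like this one.

```python
from itertools import combinations

def prune_input(rows, dependent_predicates):
    """Remove predicates with a known dependence on the target feature type."""
    return [row - dependent_predicates for row in rows]

def frequent_sets(rows, minsup):
    """Brute-force enumeration of all non-empty frequent predicate sets."""
    items = sorted({item for row in rows for item in row})
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if sum(frozenset(c) <= row for row in rows) >= minsup
    ]

rows = [frozenset(r) for r in ("ACDTW", "CDW", "ADTW", "ACDW", "ACDTW", "CDT")]
before = frequent_sets(rows, minsup=3)
after = frequent_sets(prune_input(rows, frozenset("D")), minsup=3)
print(len(before), len(after))  # 25 12
```

Since D never reaches the mining step, the saving applies to any spatial data mining method, not only to Apriori-like algorithms.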

24 Understanding the Pruning Methods (Frequent set pruning)
[Figure: lattice of the 25 frequent sets. Frequent set pruning removes the pair {A,W}, so no set containing it is generated.]
Pruned: {A,W} {A,C,W} {A,D,W} {A,T,W} {A,C,D,W} {A,D,T,W}
Remaining: {} {A} {C} {D} {T} {W} {A,C} {A,D} {A,T} {C,D} {C,T} {C,W} {D,T} {D,W} {T,W} {A,C,D} {A,D,T} {C,D,T} {C,D,W} {D,T,W}

25 Percentage reduction of association rules considering zero (reference), one, and two pairs of dependences with an increasing number of elements (predicates); minconf=0

26 Problem 3 – Redundant Frequent Itemsets
Considering the 25 frequent sets in the example dataset:
– 9 are closed frequent itemsets (3 contain the dependence)
– 16 are redundant (3 contain the dependence)
Dependence: {A,W}
Dataset:
Tid (city)   Predicate Set
1            A, C, D, T, W
2            C, D, W
3            A, D, T, W
4            A, C, D, W
5            A, C, D, T, W
6            C, D, T
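The counts on this slide can be checked with a small brute-force Python sketch. It assumes minsup = 50% (3 of the 6 rows) and the standard definition of a closed itemset: a frequent set with no proper superset that occurs in exactly the same rows. Variable names are illustrative only.

```python
from itertools import combinations

rows = [frozenset(r) for r in ("ACDTW", "CDW", "ADTW", "ACDW", "ACDTW", "CDT")]
MINSUP = 3                  # 50% of 6 rows
DEP = frozenset("AW")       # the dependence {A,W}

def tidset(itemset):
    """Tids (1-based) of the rows that contain every predicate in itemset."""
    return frozenset(tid for tid, row in enumerate(rows, start=1) if itemset <= row)

items = sorted(set().union(*rows))
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if len(tidset(frozenset(c))) >= MINSUP
]

# A set is closed when no proper superset covers exactly the same rows.
closed = [s for s in frequent
          if not any(s < t and tidset(s) == tidset(t) for t in frequent)]
redundant = [s for s in frequent if s not in closed]

print(len(frequent), len(closed), len(redundant))   # 25 9 16
print(sum(DEP <= s for s in closed),
      sum(DEP <= s for s in redundant))             # 3 3
```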

27 Problem 3 – Redundant Frequent Itemsets
Remove dependences and then generate closed frequent itemsets
Problem → the resulting frequent sets are not closed
Tid (city)   Predicate Set
1            A, C, D, T, W
2            C, D, W
3            A, D, T, W
4            A, C, D, W
5            A, C, D, T, W
6            C, D, T
{A,D,T,W}

28 Problem 3 – Redundant Frequent Itemsets
Generate closed frequent itemsets (9 in the example) and then eliminate dependences
Problem → information is lost
Dependence: {A,W}

29 Max-FGP (Bogorny, 2006c)
– Remove dependences in a first step

30 Max-FGP
– Remove redundant frequent sets in a second step → generating maximal frequent sets
[Figure: lattice of the 19 frequent sets left after removing the dependence {A,W}, each labelled with the Tids of the rows that contain it.]
{A} (1345)   {C} (12456)   {D} (123456)   {T} (1356)   {W} (12345)
{A,C} (145)   {A,D} (1345)   {A,T} (135)   {C,D} (12456)   {C,T} (156)   {C,W} (1245)   {D,T} (1356)   {D,W} (12345)   {T,W} (135)
{A,C,D} (145)   {A,D,T} (135)   {C,D,T} (156)   {C,D,W} (1245)   {D,T,W} (135)

31 Max-FGP (Bogorny, 2006c)
Given: L;   // frequent sets without dependences (Apriori-KC)
       Ψ;   // dataset generated with spatial_predicate_extraction
Find: Maximal M   // find maximal generalized predicate sets
M = L;
For (k = 2; M_k != ∅; k++) do begin
    For (j = k+1; M_j != ∅; j++) do begin
        If (tidSet(M_k) = tidSet(M_j))
            If (M_k ⊂ M_j)    // M_j is more general than M_k
                Delete M_k from M;
    End;
End;
Answer = M;
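The second step of Max-FGP can be sketched in Python as below. This is an illustrative reading of the pseudocode, not the published implementation: it compares every pair of sets instead of reproducing the slide's loop indices, and a brute-force enumeration stands in for the Apriori-KC output. The names `max_fgp` and `frequent_without` are assumptions introduced here.

```python
from itertools import combinations

def tidset(itemset, rows):
    """Tids (1-based) of the rows that contain every predicate in itemset."""
    return frozenset(tid for tid, row in enumerate(rows, start=1) if itemset <= row)

def frequent_without(rows, dependences, minsup):
    """Stand-in for Apriori-KC: frequent sets containing no dependence pair."""
    items = sorted(set().union(*rows))
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if len(tidset(frozenset(c), rows)) >= minsup
        and not any(d <= frozenset(c) for d in dependences)
    ]

def max_fgp(frequent, rows):
    """Delete every set that has a proper superset with the same tidset."""
    tids = {s: tidset(s, rows) for s in frequent}
    return [s for s in frequent
            if not any(s < t and tids[s] == tids[t] for t in frequent)]

rows = [frozenset(r) for r in ("ACDTW", "CDW", "ADTW", "ACDW", "ACDTW", "CDT")]
L = frequent_without(rows, [frozenset("AW")], minsup=3)
M = max_fgp(L, rows)
print(len(L), len(M))                      # 19 10
print(sorted("".join(sorted(s)) for s in M))
```

On this dataset the tidset (135) keeps two sets, {A,D,T} and {D,T,W}, because neither is a subset of the other.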

32 Some results on real databases

33 Input Space Pruning
[Charts: 20 predicates. Labels: 1 dependence 50%, 2 dependences 70%; 1 dependence 70%, 2 dependences 90%]

34 Frequent Set Pruning
[Chart: 15 predicates. Labels: 17, 7, 11; 77%, 68%, 58%]

35 Summary
Well-known dependences also exist in several non-spatial application domains:
– Biology/Bioinformatics
– Pregnant → Female (confidence = 100%)
– Breast_cancer → Female (confidence = 100%)
– ...
Almost no data mining approaches consider background knowledge or domain knowledge

36 Future Trends
Data mining methods will consider semantics
3 workshops (KDD and ICDM) on domain-driven data mining
ICDM 2008, 2009 Workshop: Semantic Aspects in Data Mining
Book 2008: Domain-Driven Data Mining

37 Summary: Mining SAR using Background Knowledge
Using background knowledge:
– To prune the input space as much as possible (applicable to any SDM method)
– Apriori-KC → generate frequent itemsets without well-known dependences
– Max-FGP (Maximal Frequent Geographic Patterns) → generate closed frequent itemsets without well-known dependences

38 References
Bogorny, V.; Valiati, J.; Camargo, S.; Engel, P.; Alvares, L. O. Mining Maximal Generalized Frequent Geographic Patterns with Knowledge Constraints. In: IEEE International Conference on Data Mining, IEEE-ICDM, 6., 2006, Hong-Kong. 2006c.
Bogorny, V.; Camargo, S.; Engel, P. M.; Alvares, L. O. Towards elimination of well known geographic domain patterns in spatial association rule mining. In: IEEE International Conference on Intelligent Systems, IEEE-IS, 3., 2006, London. IEEE Computer Society, 2006b. p. 532-537.
Bogorny, V.; Camargo, S.; Engel, P.; Alvares, L. O. Mining Frequent Geographic Patterns with Knowledge Constraints. In: ACM International Symposium on Advances in Geographic Information Systems, ACM-GIS, 14., 2006, Arlington. 2006a. p. 139-146.
Bogorny, V.; Palma, A.; Engel, P.; Alvares, L. O. Weka-GDPM: Integrating Classical Data Mining Toolkit to Geographic Information Systems. In: SBBD Workshop on Data Mining Algorithms and Applications, WAAMD, Florianopolis, 2006d. p. 9-16.

39 References
Bogorny, V.; Engel, P. M.; Alvares, L. O. Enhancing the Process of Knowledge Discovery in Geographic Databases using Geo-Ontologies. In: NIGRO, H. O.; CISARO, S. G.; XODO, D. (Ed.). Data Mining with Ontologies: Implementations, Findings, and Frameworks. Idea Group, 2007.
CLEMENTINI, E.; DI FELICE, P.; KOPERSKI, K. Mining multiple-level spatial association rules for objects with a broad boundary. Data & Knowledge Engineering, [S.l.], v.34, n.3, p. 251-270, Sept. 2000.
GUTING, R. H. An Introduction to Spatial Database Systems. The International Journal on Very Large Data Bases, [S.l.], v.3, n.4, p. 357-399, Oct. 1994.
KOPERSKI, K.; HAN, J. Discovery of Spatial Association Rules In Geographic Information Databases. In: INTERNATIONAL SYMPOSIUM ON LARGE GEOGRAPHICAL DATABASES, SSD, 4., 1995, Portland. Proceedings… [S.l.]: Springer, 1995. p. 47-66.
MENNIS, J.; LIU, J. W. Mining Association Rules in Spatio-Temporal Data: An Analysis of Urban Socioeconomic and Land Cover Change. Transactions in GIS, [S.l.], v.9, n.1, p. 5-17, Jan. 2005.
OPEN GIS CONSORTIUM. OpenGIS simple features specification for SQL. 1999. Available at http://www.opengeospatial.org/docs/99-054.pdf. Visited on Aug. 2005.

