Geospatial Data Mining at University of Texas at Dallas Dr. Bhavani Thuraisingham (Computer Science) Dr. Latifur Khan (Computer Science) Dr. Fang Qiu (GIS)


1 Geospatial Data Mining at University of Texas at Dallas
Dr. Bhavani Thuraisingham (Computer Science), Dr. Latifur Khan (Computer Science), Dr. Fang Qiu (GIS)
Students: Shaofei Chen (GIS), Mohammad Farhan (CS), Shantnu Jain (GIS), Lei Wang (CS)
Post Doc: Dr. Chuanjun Li
This research is partly funded by Raytheon.

2 Outline
• Ontology-driven Modeling and Mining of Geospatial Data
  - Ontology
• Case Study: Dataset
• ASTER Dataset
• Process of Our Approach
  - SVM Classifiers
  - Region Growing
  - Graph of Regions: Near Neighboring Regions
  - Ontology Driven Rule Mining
• High Level Concept Detection
• Output
• Related Work
• Future Work

3 Ontology-Driven Modeling and Mining of Geospatial Data
  - The ontology will be represented as a directed acyclic graph (DAG). Each node in the DAG represents a concept.
  - Interrelationships are represented by labeled arcs/links. Various kinds of interrelationships are used to create an ontology, such as specialization (Is-a), instantiation (Instance-of), and component membership (Part-of).
[Figure: example ontology with the concepts Urban, Residential, Apartment, Single Family Home, and Multi-family Home, linked by IS-A and Part-of arcs]
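A minimal sketch of how such an ontology could be held in memory, assuming a plain adjacency list of labeled arcs. The `Ontology` class and its method names are hypothetical, and the specific arcs (the three home types IS-A Residential, Residential Part-of Urban) are one reading of the slide's figure rather than something the slides state outright.

```python
from collections import defaultdict


class Ontology:
    """Directed graph of concepts joined by labeled arcs (hypothetical helper)."""

    def __init__(self):
        # child concept -> list of (relation label, parent concept)
        self.arcs = defaultdict(list)

    def add(self, child, relation, parent):
        self.arcs[child].append((relation, parent))

    def ancestors(self, concept, relation=None):
        """Concepts reachable from `concept`, optionally along one relation only."""
        found, stack = [], [concept]
        while stack:
            node = stack.pop()
            for label, parent in self.arcs[node]:
                if relation is not None and label != relation:
                    continue
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
        return found


onto = Ontology()
# Assumed arcs, read off the slide's figure.
for home in ("Apartment", "Single Family Home", "Multi-family Home"):
    onto.add(home, "IS-A", "Residential")
onto.add("Residential", "Part-of", "Urban")

print(onto.ancestors("Apartment"))          # follows every relation
print(onto.ancestors("Apartment", "IS-A"))  # specialization arcs only
```

Keeping the relation label on each arc lets a query follow only Is-a links, only Part-of links, or both.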

4 Ontology-Driven Modeling and Mining of Geospatial Data
• We will develop domain-dependent ontologies
  - They provide for the specification of fine-grained concepts
  - The USGS taxonomy can be extended by adding concepts to facilitate finer-grained classification
  - The concept “Residential” can be further categorized into the concepts “Apartment”, “Single Family House” and “Multi-family House”
• Generic ontologies provide concepts at a coarser grain

5 Case Study: Dataset
• ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer)
  - Used to obtain detailed maps of land surface temperature, reflectivity, and elevation.
• ASTER obtains high-resolution images of the Earth (15 to 90 meters per pixel) in 14 different wavelengths of the electromagnetic spectrum, ranging from visible to thermal infrared light.
• ASTER data is used to create detailed maps of land surface temperature, emissivity, reflectivity, and elevation.

6 Case Study: Dataset & Features
• The remote sensing data used in this study is an ASTER image acquired on 31 December 2005.
  - It covers the northern part of Dallas, with Dallas-Fort Worth International Airport in the southwest of the image.
• ASTER data has 14 channels from the visible through the thermal infrared regions of the electromagnetic spectrum, providing detailed information on surface temperature, emissivity, reflectance, and elevation.
• ASTER comprises the following three radiometers:
  - Visible and Near Infrared Radiometer (VNIR, band 1 through band 3), with a wavelength range of 0.52 to 0.86 μm.

7 Case Study: Dataset & Features
• Short Wavelength Infrared Radiometer (SWIR, band 4 through band 9), with a wavelength range of 1.60 to 2.43 μm.
  - Mid-infrared region, used to extract surface features.
• Thermal Infrared Radiometer (TIR, band 10 through band 14), covering 8.125 to 11.65 μm.
  - Important when research focuses on heat, such as identifying mineral resources and observing atmospheric conditions from their thermal infrared characteristics.

8 ASTER Dataset: Technical Challenges
• Testing will be done on individual pixels
• Goal: region-based classification and identification of high-level concepts
• Solution
  - Group adjacent pixels that belong to the same class
  - Identify high-level concepts using ontology-based rule mining
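The first solution step, grouping adjacent pixels of the same class, can be sketched as a flood fill over the classified image. This is an illustration under stated assumptions, not the presenters' implementation: the slides do not say which adjacency is used, so 4-connectivity is assumed here, and `grow_regions` is a hypothetical name.

```python
from collections import deque


def grow_regions(labels):
    """Group 4-connected pixels with the same class label into regions.

    `labels` is a list of rows of class names. Returns a list of
    (class_name, [(row, col), ...]) tuples, one per region.
    """
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            cls = labels[r][c]
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Assumption: only the four edge-sharing neighbors count as adjacent.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((cls, pixels))
    return regions


# Toy 3x3 grid of per-pixel classifier output, invented for illustration.
classified = [
    ["Water", "Water", "Road"],
    ["Water", "Grass", "Road"],
    ["Grass", "Grass", "Road"],
]
regions = grow_regions(classified)
for cls, pixels in regions:
    print(cls, len(pixels))
```

Each resulting region carries a single class label, which is what the later graph and rule-mining steps operate on.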

9 Process of Our Approach
[Figure: processing pipeline]
Training Image Pixels + Testing Image Pixels -> SVM Classifier -> Classified Pixels -> Region Growing -> Graph of Regions -> Shortest Path Tree -> Graph of Near Neighboring Regions -> Ontology Driven Rule Mining -> High Level Concept

11 SVM Classifiers: Atomic Concepts

12 Class Distribution of the Training and Test Sets

Classes          Train set   Test set-1   Test set-2
Water               85,463        7,520          210
Bare Lands          11,767        1,746        1,043
Grass                  452          830          568
Forests              3,153          438          652
Buildings              668          193           50
Open Places          1,503          297          888
Roads                   70          166           89
# of instances     114,392       11,190        3,500

13 SVM Classifiers: Atomic Concepts
Accuracy of Various Classifiers

Classifier       Test set-1   Test set-2
ML                      90%       40.09%
SVM-Linear              91%        67.5%
SVM-Polynomial        89.3%        50.7%
SVM-RBF               89.6%        54.7%

15 Region Growing

20 Graph of Regions: Near Neighboring Regions
• After region growing
  - We generate a graph by treating each region as a node
  - The distance between two regions is the weight of the edge between their nodes
• Generate the Shortest Path Tree (SPT) of this graph for each source
  - The near neighboring regions are determined from it
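A sketch of the shortest-path-tree step using Dijkstra's algorithm on a small region graph. The slides do not say how near neighbors are read off the tree, so this example assumes a region counts as a near neighbor when its shortest-path distance from the source is at most a threshold `max_dist`. The function names, the threshold, and the toy distances are all hypothetical.

```python
import heapq


def shortest_path_tree(graph, source):
    """Dijkstra's algorithm. `graph` maps node -> {neighbor: distance}.

    Returns (dist, parent): the shortest distance from `source` to each
    reachable node, and each node's predecessor in the shortest path tree.
    """
    dist, parent = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], parent[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    return dist, parent


def near_neighbors(graph, source, max_dist):
    """Assumed definition: regions within `max_dist` of `source` along the SPT."""
    dist, _ = shortest_path_tree(graph, source)
    return sorted(n for n, d in dist.items() if n != source and d <= max_dist)


# Toy region graph with invented distances.
region_graph = {
    "A1": {"A2": 2.0, "A3": 9.0},
    "A2": {"A1": 2.0, "A3": 3.0},
    "A3": {"A1": 9.0, "A2": 3.0},
}
dist, parent = shortest_path_tree(region_graph, "A1")
print(dist, parent)
print(near_neighbors(region_graph, "A1", 4.0))
```

Running this once per source region, as the slide describes, yields a near-neighbor list for every region.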

21 Shortest Path Tree

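23 Ontology Driven Rule Mining
[Figure: ontology tree]
Root Node
High-level concepts: Country Side, City, Deep Forest
Atomic concepts: Grass, Forest, Bare Land, Road, Building, Water, Open Places
Finer-grained concepts: Athletic Field, Garden, Park, Water Cross, Lake, Reservoir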

24 Ontology-Driven Modeling and Mining of Geospatial Data
• Ontology-based Pruning and Retrieval:
  - The ontology will facilitate mining of information at various levels of abstraction.
  - Using the ontology and a set of atomic concepts, we will infer a set of high-level concepts (e.g., apartment, single family house, multi-family house).
• We will exploit the possible influence relations between concepts based on the given ontology hierarchy.

25 Ontology-Driven Modeling and Mining of Geospatial Data
  - To improve the accuracy of high-level concept classifier learning, two forms of influence are taken into consideration: boosting and confusion.
    • The boosting factor is the co-occurrence of regions based on topology (spatial relationships) such as adjacency, connectivity, orientation, hierarchy, or combinations thereof embedded in the ontology. For example, for the concept “City”, the specific concepts “Building”, “Road” and “Open Place” will co-exist.
    • The confusion factor is the influence between concepts that cannot coexist.

26 Rules: From Ontology
• Class(A1) = Building ^ Class(A2) = Road ^ Class(A3) = Open Place ^ NextTo(A1, A2, D) ^ NextTo(A2, A3, D) => City(A1 U A2 U A3)
• Class(A1) = Forest ^ Class(A2) = Water ^ Class(A3) = Bare Land ^ NextTo(A1, A2, D) ^ NextTo(A2, A3, D) => Deep Forest(A1 U A2 U A3)
• Class(A1) = Forest ^ Class(A2) = Water ^ NextTo(A1, A2, D) => Deep Forest(A1 U A2)
• Class(A1) = Forest ^ Class(A2) = Bare Land ^ NextTo(A1, A2, D) => Deep Forest(A1 U A2)
• Class(A1) = Building ^ Class(A2) = Open Place ^ NextTo(A1, A2, D) => City(A1 U A2)
Note: D is the distance; Ai is a region, and Class(Ai) is the concept of that region.
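The rules above can be sketched as data plus a small matcher. This is an illustrative reading, not the presenters' implementation: `NextTo(Ai, Aj, D)` is assumed to mean that the distance between the two regions is at most a threshold D, and `RULES`, `apply_rules`, and the toy distances are hypothetical.

```python
from itertools import permutations

# (atomic classes that must be chained by NextTo, resulting high-level concept)
RULES = [
    (("Building", "Road", "Open Place"), "City"),
    (("Forest", "Water", "Bare Land"), "Deep Forest"),
    (("Forest", "Water"), "Deep Forest"),
    (("Forest", "Bare Land"), "Deep Forest"),
    (("Building", "Open Place"), "City"),
]


def apply_rules(region_class, distance, max_d):
    """Return (concept, regions) pairs for every rule match.

    region_class: region id -> atomic concept
    distance: frozenset({a, b}) -> distance between regions a and b
    max_d: assumed threshold D under which NextTo(a, b, D) holds
    """
    def next_to(a, b):
        return distance.get(frozenset((a, b)), float("inf")) <= max_d

    matches = []
    for classes, concept in RULES:
        for combo in permutations(sorted(region_class), len(classes)):
            classes_ok = all(region_class[r] == c for r, c in zip(combo, classes))
            chain_ok = all(next_to(a, b) for a, b in zip(combo, combo[1:]))
            if classes_ok and chain_ok:
                matches.append((concept, combo))
    return matches


# Toy regions and distances, invented for illustration.
region_class = {"A1": "Building", "A2": "Road", "A3": "Open Place"}
distance = {
    frozenset(("A1", "A2")): 1.0,
    frozenset(("A2", "A3")): 2.0,
    frozenset(("A1", "A3")): 8.0,
}
matches = apply_rules(region_class, distance, max_d=3.0)
print(matches)
```

Here the three-region City rule fires, while the two-region Building/Open Place rule does not, because A1 and A3 are farther apart than the assumed threshold.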

27 Ontology Driven Rule Mining: Pseudocode

28 Implementation
• Software:
  - ArcGIS 9.1
  - For programming, we use Visual Basic 6.0 embedded in the software
• As of today
  - 8 rules
  - Two-level taxonomy

29 Output: Training set

30 Output: Test set

31 Output: City Concept

32 Output: Deep Forest Concept

33 Related Work
• Classification
  - ML
    • Wilson, Gina M. 2004. Landcover classification of the City of Rocks National Reserve using ASTER satellite imagery. Upper Columbia Basin Network, Inventory and Monitoring Program. Project Number UCBN-000001, National Park Service. Moscow, ID. 19 pages.
  - SVM
    • Melgani, Farid and Lorenzo Bruzzone. Classification of hyperspectral remote-sensing images with support vector machines.
    • Zhu, G. and D. G. Blumberg. 2002. Classification using ASTER data and SVM algorithms: The case study of Beer Sheva, Israel.
    • Huang, C., L. S. Davis, and J. R. G. Townshend. 2002. An assessment of support vector machines for land cover classification.

34 Rules: From Ontology
• Technical Challenges
  - Sparse test dataset
    • Difficult to determine adjacency
  - Size of area should be included in the rules
  - Finer-grained classification is required
    • Concepts like Lake and River rather than the Water concept
  - Ordering of rules will play a role

35 Future Work
• Develop a full-fledged prototype (by January 31, 2007)
• Improve accuracy of SVM classification (by January 31, 2007)
  - Hierarchical SVM
• Generate rules automatically (by June 30, 2007)
  - Ripper: semi-automatically
  - Association rule mining

36 Confusion Matrix (7 Classes)
Axes: Predicted, Actual
Class order: Water, Bare Lands, Grass, Forests, Buildings, Open Places, Roads
Water         23392000100
Bare Lands    03685105133
Grass         01040776020
Forests       35461022120
Buildings     014321820
Open Places   0131546387
Roads         00000663

37 Observations: Hierarchical SVM
• Different classes have different true recognition rates (TR) and different false recognition rates (FR)
• If there is one class for which TR is HIGH and FR is LOW:
  - Classification to this class can be accepted with high confidence
  - Classes with low TR and high FR can be considered for a NEW and possibly better classifier
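A sketch of how per-class TR and FR might be computed from a confusion matrix. The slides do not define the two rates, so this assumes TR is the fraction of a class's actual instances that are recognized correctly, and FR is the fraction of all other instances wrongly assigned to that class. The function name, the toy matrix, and the 0.95 / 0.05 thresholds are invented for illustration.

```python
def recognition_rates(matrix, classes):
    """Per-class (TR, FR) from a confusion matrix (rows = actual, columns = predicted).

    Assumed definitions:
      TR = correct predictions for the class / actual instances of the class
      FR = other-class instances predicted as the class / all other-class instances
    """
    total = sum(sum(row) for row in matrix)
    rates = {}
    for i, cls in enumerate(classes):
        actual = sum(matrix[i])
        predicted = sum(row[i] for row in matrix)
        hit = matrix[i][i]
        others = total - actual
        tr = hit / actual if actual else 0.0
        fr = (predicted - hit) / others if others else 0.0
        rates[cls] = (tr, fr)
    return rates


# Toy 3-class matrix, invented for illustration only.
classes = ["Water", "Grass", "Roads"]
matrix = [
    [98, 2, 0],
    [5, 80, 15],
    [0, 10, 40],
]
rates = recognition_rates(matrix, classes)
# Classes that could be accepted with high confidence at this level.
confident = [c for c, (tr, fr) in rates.items() if tr >= 0.95 and fr <= 0.05]
print(rates)
print(confident)
```

In this toy case only Water passes both thresholds, so the remaining classes would be handed to a second classifier.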

38 Confusion Matrix (6 Classes)
Axes: Predicted, Actual
Class order: Bare Lands, Grass, Forests, Buildings, Open Places, Roads
Bare Lands    3685105133
Grass         1040776020
Forests       5461022120
Buildings     14321820
Open Places   131546387
Roads         0000663

39 Confusion Matrix (5 Classes)
Axes: Predicted, Actual
Class order: Bare Lands, Grass, Forests, Open Places, Roads
Bare Lands    368510533
Grass         104077620
Forests       546102220
Open Places   13156387
Roads         000663

40
• Suppose k classes
• ONE multi-class classifier
  - Originally k(k-1)/2 binary SVMs
[Figure: a multi-class classifier made of k(k-1)/2 binary SVMs with outputs Class 1, Class 2, Class 3, ..., Class k; one class is marked as having HIGH TR and LOW FR]

41
• Suppose k classes
• ONE multi-class classifier
  - Originally k(k-1)/2 binary SVMs
  - Then (k-1)(k-2)/2 binary SVMs
[Figure: the first classifier uses k(k-1)/2 binary SVMs over Class 1, Class 2, Class 3, ..., Class k, with one class marked as high TR and low FR; the second classifier uses (k-1)(k-2)/2 binary SVMs over Class 2, Class 3, ..., Class k]
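As a quick check of the counts above, assuming the multi-class classifier is built one-vs-one (one binary SVM per pair of classes, which is what k(k-1)/2 implies), the number of binary SVMs at each level follows directly. The helper name is hypothetical.

```python
def one_vs_one_count(k):
    """Number of binary SVMs in a one-vs-one classifier over k classes."""
    return k * (k - 1) // 2


# Each level of the hierarchy drops one confidently recognized class.
for k in (7, 6, 5):
    print(f"{k} classes -> {one_vs_one_count(k)} binary SVMs")
```

For the 7, 6, and 5 class cases shown in the confusion matrix slides, this gives 21, 15, and 10 binary SVMs.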

42 Challenges: Hierarchical SVM
• The same set of parameters will not yield the same classification rates for classifiers at different levels
• Classification accuracy might not be sensitive to parameters
• How to achieve high TR and low FR for some classes?

