DATA STRUCTURES USED IN SPATIAL DATA MINING

Presentation transcript:

1 DATA STRUCTURES USED IN SPATIAL DATA MINING

2 What is spatial data?
Spatial data can broadly be defined as data that covers multidimensional points, lines, rectangles, polygons, cubes and other geometric objects. A spatial object occupies a certain amount of space, called its spatial extent, which is characterized by location and boundary.
Uses:
- Geographic information systems (GIS)
- CAD/CAM
- Multimedia applications
  - Content-based image retrieval
  - Fingerprint matching
  - MRI (digitized medical images)

3 Features of spatial data
Specific features of spatial data are rich data types, implicit spatial relationships among the variables, observations that are not independent, and spatial autocorrelation among the features.
Spatial data has two distinct types of attributes: spatial and non-spatial. Spatial attributes define the spatial location and extent of spatial objects.

4 Types of spatial data
Region data: Has a spatial extent with a location and a boundary. Region data stored in a database is basically a geometric approximation to an actual data object.
Point data: Consists of a collection of points in a multidimensional space. A point does not cover any area of space.

5 What is spatial data mining?
It is defined as the non-trivial search for interesting and unexpected spatial patterns in spatial databases.
Spatial data mining is needed to gain new understanding of geographic processes and to answer critical questions such as: How is the health of planet Earth? What are the effects of human activity on environment and ecology?

6 Spatial data in GIS
A geographic information system (GIS) is any system for capturing, storing, analyzing and managing data and associated attributes that are spatially referenced to Earth.
There are two broad methods used to store data in a GIS: raster and vector. Geographical features are often expressed as vectors, by treating those features as geometrical shapes such as points, chains and polygons.

7 Spatial data structures used in GIS
To handle spatial data efficiently, as required in computer-aided design and geo-data applications, a database system needs an index mechanism that helps it retrieve data items quickly according to their spatial locations. Such structures include:
- Quad tree
- k-d tree
- R-tree
- R+-tree
- R*-tree

8 Quad trees
- A quad tree is used to index 2D space.
- Each node of a quad tree is associated with a rectangular region of space.
- The top node is associated with the entire target space.
- Each internal node splits its region into four disjoint subspaces along the axes.
- Each of these subspaces is split recursively until it contains at most one object.

9 Division of space by quadtree
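
The recursive division described for quad trees can be sketched in a few lines of Python. This is a minimal illustration, not a production index: the class name QuadNode, its fields and the half-open region convention are assumptions made for this example, and points are assumed to lie inside the root region.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    # Region covered by this node: [x0, x1) x [y0, y1) (assumed convention).
    x0: float
    y0: float
    x1: float
    y1: float
    point: Optional[Point] = None                      # at most one object per leaf
    children: List["QuadNode"] = field(default_factory=list)

    def _child_for(self, p: Point) -> "QuadNode":
        # Pick the quadrant of p relative to the centre of this region.
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def insert(self, p: Point) -> None:
        if self.children:                 # internal node: descend
            self._child_for(p).insert(p)
        elif self.point is None:          # empty leaf: store the point here
            self.point = p
        elif self.point == p:             # duplicate: ignore, avoids endless splitting
            return
        else:                             # occupied leaf: split into four quadrants
            old, self.point = self.point, None
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [
                QuadNode(self.x0, self.y0, mx, my),   # low x, low y
                QuadNode(mx, self.y0, self.x1, my),   # high x, low y
                QuadNode(self.x0, my, mx, self.y1),   # low x, high y
                QuadNode(mx, my, self.x1, self.y1),   # high x, high y
            ]
            self._child_for(old).insert(old)
            self._child_for(p).insert(p)

    def depth(self) -> int:
        return 1 + max((c.depth() for c in self.children), default=0)

    def points(self) -> List[Point]:
        if self.children:
            return [p for c in self.children for p in c.points()]
        return [] if self.point is None else [self.point]
```

Inserting (1, 1), (6, 6) and (5, 7) into QuadNode(0, 0, 8, 8) splits the root once and then splits its upper-right quadrant again, because two of the points fall into that quadrant.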

10 k-d trees
- A k-d tree partitions the space into two subspaces according to one of the coordinates of the splitting point.
- Let level(nod) be the length of the path from the root to the node nod, and suppose the axes are numbered from 0 to k-1. Every node at level level(nod) splits the space according to coordinate number (level(nod) mod k).
- The partitioning is done along one dimension at the node at the top level of the tree, along another dimension in nodes at the next level, and so on, cycling through the dimensions.

11 Division of space by a k-d tree
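
The rule "split on coordinate (level mod k)" can be illustrated with a small Python sketch. The names KDNode, kd_insert and kd_contains are hypothetical, and the tree is built by plain insertion without any balancing, which is an assumption of this example.

```python
class KDNode:
    def __init__(self, point, axis):
        self.point = tuple(point)
        self.axis = axis          # coordinate number used to split at this node
        self.left = None          # points with a smaller value on that coordinate
        self.right = None         # points with a greater or equal value


def kd_insert(node, point, k, level=0):
    """Insert point into the subtree rooted at node and return the subtree root."""
    if node is None:
        return KDNode(point, level % k)       # axis cycles through 0 .. k-1
    a = node.axis
    if point[a] < node.point[a]:
        node.left = kd_insert(node.left, point, k, level + 1)
    else:
        node.right = kd_insert(node.right, point, k, level + 1)
    return node


def kd_contains(node, point):
    """Exact-match search: follow one root-to-leaf path."""
    point = tuple(point)
    while node is not None:
        if node.point == point:
            return True
        a = node.axis
        node = node.left if point[a] < node.point[a] else node.right
    return False
```

With k = 2 the root splits on x, its children split on y, their children on x again, and so on.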

12 R-trees
An R-tree is a balanced tree structure with the index records stored in leaf nodes. The structure is completely dynamic, with no need for periodic restructuring. Let M be the maximum number of entries in one node and let m = M/2 (in general, any m ≤ M/2 may be chosen). Then m is the minimum number of entries allowed in any node except the root.

13 R-trees (continued)
- Every non-leaf node has between m and M children, unless it is the root.
- The root node has at least two children, unless it is a leaf.
- For each index record (I, tuple-id) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object.
- For each entry (I, child-ptr) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.

14 Division of space by R-trees
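
The role of the bounding rectangles I in leaf and non-leaf entries can be sketched as follows. This Python fragment only covers bounding-rectangle computation and search. Insertion, node splitting and the m/M fill constraints are left out, and the names RNode, mbr, make_parent and search are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    """Smallest rectangle that contains all given rectangles."""
    xmin, ymin, xmax, ymax = zip(*rects)
    return (min(xmin), min(ymin), max(xmax), max(ymax))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class RNode:
    leaf: bool
    entries: list   # leaf: (I, tuple_id) pairs; non-leaf: (I, child RNode) pairs

    def box(self) -> Rect:
        return mbr([I for I, _ in self.entries])


def make_parent(children: List[RNode]) -> RNode:
    # Each non-leaf entry stores the bounding rectangle of its child node.
    return RNode(False, [(c.box(), c) for c in children])


def search(node: RNode, query: Rect) -> list:
    """Return the tuple-ids whose rectangles intersect the query rectangle."""
    hits = []
    for I, ref in node.entries:
        if intersects(I, query):          # prune subtrees whose rectangle misses the query
            if node.leaf:
                hits.append(ref)
            else:
                hits.extend(search(ref, query))
    return hits
```

Because bounding rectangles of sibling nodes may overlap, a search can have to descend into more than one subtree, which is the cost the R+-tree and R*-tree variants try to reduce.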

15 R+-tree
The R+-tree is an extension of the R-tree in which the bounding rectangles of nodes at one level do not overlap. This decreases the number of branches that must be searched and so reduces search time, at the cost of increased space consumption: data objects may be split, so that different parts of one object are stored in several nodes of the same tree level.

16 R+-tree (continued)
- The root has at least two children, unless it is a leaf.
- All leaves are at the same level.
- There is no constraint on the minimum number of entries in a node.

17 Division of space by R+-tree
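
The idea of splitting one object across several non-overlapping nodes can be illustrated with a short Python sketch. It assumes the node regions at one level are given as disjoint rectangles and simply clips an object's rectangle against each of them. The function names clip and distribute are hypothetical, and a real R+-tree also has to choose the regions and handle node overflow.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def clip(rect: Rect, region: Rect) -> Optional[Rect]:
    """Part of rect that lies inside region, or None if they share no area."""
    x0, y0 = max(rect[0], region[0]), max(rect[1], region[1])
    x1, y1 = min(rect[2], region[2]), min(rect[3], region[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def distribute(rect: Rect, regions: List[Rect]) -> Dict[int, Rect]:
    """Map each region index to the piece of rect stored under that region."""
    pieces = {}
    for i, region in enumerate(regions):
        piece = clip(rect, region)
        if piece is not None:
            pieces[i] = piece
    return pieces
```

An object that straddles the border between two regions ends up with one piece in each, which is where the extra space consumption comes from.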

18 R*-tree
The R*-tree is a modification of the R-tree. The R-tree tries to minimize the area of all nodes of the tree, whereas the R*-tree combines several criteria:
- The area covered by a bounding rectangle.
- The margin of a rectangle: minimizing the margin of a bounding rectangle favors square-like shapes.
- The overlap between rectangles: minimizing the overlap decreases the number of paths that must be searched.
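
The three criteria can be written down as small Python functions on axis-aligned rectangles. This is only a sketch of the quantities being compared, taking "margin" to mean the perimeter of the rectangle. How an R*-tree weighs them when choosing a subtree or splitting a node is not shown, and the function names are assumptions of this example.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    # Perimeter of the rectangle; for a fixed area it is smallest for a square.
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    # Area shared by the two rectangles, 0 if they do not overlap.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0
```

For example, the square (0, 0, 4, 4) and the strip (0, 0, 16, 1) have the same area, but the strip has a much larger margin, so a margin-minimizing split prefers the square.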

19 Conclusion
New techniques are needed for spatial data mining (SDM) because of spatial autocorrelation and the continuity of space. The indexing structures discussed above are useful for spatial data represented in a vector space. For metric spaces, the M-tree, vp-tree and mvp-tree are used. The main aim of all these indexing structures is to minimize disk accesses.


21 THANK YOU

22 ??? QUERIES ???