Optimal insert methods of geographical information to Spatio-temporal DB Final Presentation Industrial Project 234313 June 17, 2012 Students: Michael Tsalenko.

Presentation transcript:

Optimal insert methods of geographical information to Spatio-temporal DB. Final Presentation, Industrial Project, June 17, 2012. Students: Michael Tsalenko & Belal Kabat. Supervisor: Flora Gilboa.

Introduction: We research ways of inserting geographical information related to video, represented as polygons or circles, into a geographic DB, and of querying it.

Goals: Find the optimal way of inserting geo-data by determining the best representation of the captured polygon. Determine which representation is optimal when querying the database. Determine whether the number of vertices of the polygon affects querying efficiency.

Methodology: Main Steps. Data creation. Reading data into memory (from the file system). Measuring insertion times. Measuring querying times. Adding tables with an index using an R-tree. Parallelizing insertion and querying. Statistical examination of the results.

Methodology - Data creation (1): We chose 10 different locations and randomly created 10,000 quad polygons around each location, so our data set consists of 100,000 polygons in total.
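
The slides do not say how the random quads were generated. The following is a minimal sketch of one possible approach, assuming planar coordinates; the function names, the `max_radius` parameter, and the placeholder locations are hypothetical and not taken from the project.

```python
import math
import random


def random_quad(center_x, center_y, max_radius, rng):
    """Return four (x, y) vertices of a random quad around a center point."""
    quad = []
    for quadrant in range(4):
        # One vertex per quadrant, kept away from the quadrant borders, so
        # consecutive vertices are less than 180 degrees apart and the
        # edges of the resulting polygon do not cross.
        angle = (quadrant + rng.uniform(0.1, 0.9)) * math.pi / 2.0
        distance = rng.uniform(0.1 * max_radius, max_radius)
        quad.append((center_x + distance * math.cos(angle),
                     center_y + distance * math.sin(angle)))
    return quad


def build_dataset(locations, quads_per_location, max_radius, seed=0):
    """Create quads_per_location random quads around every location."""
    rng = random.Random(seed)  # fixed seed makes the data set reproducible
    return [random_quad(x, y, max_radius, rng)
            for (x, y) in locations
            for _ in range(quads_per_location)]


# Placeholder locations (not the project's real ones).
locations = [(float(i), float(i)) for i in range(10)]
dataset = build_dataset(locations, quads_per_location=10, max_radius=0.01)
```

With `quads_per_location=10_000` the same call would produce the 100,000 polygons described on the slide.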

Methodology - Data creation (2): [Figures: snapshot of the random data; a closer look at Haifa.]

Methodology - Data creation (3): Intuitively, circle data manipulation should be more efficient than quad data manipulation, as a circle has fewer fields (3 vs. 8). So we bounded each of the 100,000 quads with a circle, and later improved this to a minimal bounding circle, in order to compare different data representations of the same locations. A quad is stored as four vertex coordinate pairs and a circle as a center coordinate pair plus a radius (the example values are not preserved in this transcript).
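
The difference between a simple bounding circle and a minimal bounding circle can be sketched as follows. This is an illustration under assumptions, not the project's code: it uses planar (not geodetic) distances, the function names are hypothetical, and the minimal circle is found by brute force over vertex pairs and triples, which is adequate for four points.

```python
import math
from itertools import combinations


def circle_from_two(p, q):
    """Circle whose diameter is the segment p-q, as (cx, cy, radius)."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0, math.dist(p, q) / 2.0)


def circle_from_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def contains_all(circle, points, eps=1e-9):
    """True if every point lies inside or on the circle."""
    cx, cy, radius = circle
    return all(math.dist((cx, cy), p) <= radius + eps for p in points)


def simple_bounding_circle(points):
    """Centroid-centered circle reaching the farthest vertex (not minimal)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return (cx, cy, max(math.dist((cx, cy), p) for p in points))


def minimal_bounding_circle(points):
    """Smallest enclosing circle, by trying all pairs and triples."""
    candidates = [circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    enclosing = [c for c in candidates if contains_all(c, points)]
    return min(enclosing, key=lambda c: c[2])
```

Either function reduces the eight quad fields to the three circle fields (center x, center y, radius); the minimal version adds less extra area around the quad.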

Methodology - Reading into memory: To improve insertion time, we created a pool holding all the data, so that when inserting into the database we read directly from the pool rather than from the files. This keeps the number of operations performed after the connection to the server is made as small as possible.
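
The idea of parsing everything into memory first and timing only the insertion can be sketched like this. The sketch assumes an in-memory SQLite database as a stand-in for the project's geographic database, and the `id,x,y,radius` line format, the table layout, and the function names are all hypothetical.

```python
import sqlite3
import time


def load_pool(lines):
    """Parse 'id,x,y,radius' text lines into an in-memory pool of tuples."""
    pool = []
    for line in lines:
        ident, x, y, radius = line.strip().split(",")
        pool.append((int(ident), float(x), float(y), float(radius)))
    return pool


def timed_insert(connection, pool):
    """Insert the whole pool and return the elapsed time in seconds."""
    start = time.perf_counter()
    connection.executemany(
        "INSERT INTO circles (id, x, y, radius) VALUES (?, ?, ?, ?)", pool)
    connection.commit()
    return time.perf_counter() - start


# The pool is built before the timer starts, so file parsing is not measured.
pool = load_pool(["1,0.0,0.0,1.5", "2,3.0,4.0,0.5"])
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE circles (id INTEGER PRIMARY KEY, x REAL, y REAL, radius REAL)")
elapsed = timed_insert(connection, pool)
row_count = connection.execute("SELECT COUNT(*) FROM circles").fetchone()[0]
```

`load_pool` accepts any iterable of lines, so an open file object can be passed in place of the list used here.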

Methodology - Measuring insertion time: We created 2 tables, one for quad polygons and one for circles. For instance, the circles table has the columns id, Asset name, and Area, with rows such as Circle#1, Circle#2, and Circle#3, each holding an area of the form (( , ), ( )).

Methodology - Measuring insertion and querying time: We started by inserting 100,000 objects into each table, but after a few hours of running (single thread) we stopped the process and reduced the data to 10,000 objects. With this data, the differences in both insertion and querying times were not significant, so we decided to test indexed tables for both circles and polygons. We also started planning to parallelize the work, given how inefficient inserting 100,000 objects was.

Methodology - Measuring insertion time: Indexed tables are tables built using an R-tree, a tree data structure used for spatial access methods. Like the B-tree, the R-tree is a balanced search tree, so all leaf nodes are at the same height.
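
R-trees in general organize objects by their minimum bounding rectangles (MBRs), and a search only descends into nodes whose MBR overlaps the query region. The sketch below shows just that MBR step for a quad and a circle in planar coordinates; it is not the database's index implementation, and the function names are hypothetical.

```python
def mbr_of_points(points):
    """Minimum bounding rectangle of a polygon as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_of_circle(center_x, center_y, radius):
    """Minimum bounding rectangle of a circle."""
    return (center_x - radius, center_y - radius,
            center_x + radius, center_y + radius)


def mbrs_overlap(a, b):
    """True if two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


quad_mbr = mbr_of_points([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
far_circle_mbr = mbr_of_circle(3.0, 3.0, 0.5)
near_circle_mbr = mbr_of_circle(3.0, 3.0, 1.5)
```

Objects whose MBRs do not overlap cannot intersect, so an index can skip them without running the exact geometric test.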

Methodology - Measuring insertion time: The differences between the results of querying and insertion on indexed and non-indexed tables were not significant. (*) "Hybrid circle in polygon" means the query selected circle intersections in the polygons table; "Hybrid polygon in circle" means the opposite.

Parallelizing insertion: We created 10 threads, each of which inserted 10,000 objects into each of the 2 indexed tables (quads, circles). This parallelization enabled us to insert 100,000 objects in a reasonable time (~15 minutes instead of more than 2 hours with a single thread). We stopped inserting into the non-indexed tables once we saw that the indexed tables gave better results. Circle insertion is faster than quad insertion by ~13%.
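
The split of 100,000 objects across 10 threads can be sketched as below. This is an illustration under assumptions: `InMemoryTable` is a hypothetical stand-in for a database table (a real worker would typically open its own database connection), and the function names are not from the project.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous chunks."""
    if not items:
        return []
    size = -(-len(items) // chunk_count)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


class InMemoryTable:
    """Hypothetical stand-in for a database table."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()

    def insert_many(self, rows):
        # The lock keeps concurrent inserts from interleaving.
        with self._lock:
            self._rows.extend(rows)

    def count(self):
        with self._lock:
            return len(self._rows)


def parallel_insert(table, items, thread_count=10):
    """Insert items using thread_count worker threads; return the row count."""
    chunks = split_into_chunks(items, thread_count)
    with ThreadPoolExecutor(max_workers=thread_count) as executor:
        # list() waits for all workers and re-raises any worker exception.
        list(executor.map(table.insert_many, chunks))
    return table.count()


objects = list(range(100_000))
inserted = parallel_insert(InMemoryTable(), objects, thread_count=10)
```

With 100,000 objects and 10 threads, each worker receives one chunk of 10,000 objects, matching the setup described on the slide.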

Parallelizing querying: We created 10 threads, each of which inserted 10,000 objects into each of the 4 tables (quads, circles). The differences between the results for the tables were not significant.

Methodology - Number of vertices: To check whether the number of vertices in a polygon makes a difference, we converted each polygon to an octagon as follows: we added a vertex in the middle of each edge of the original quad and added a factor of 0.001, to prevent DB vendor optimizations.
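
The conversion can be sketched as follows. The slide does not say how the 0.001 factor was applied, so this sketch assumes one possible reading: each edge midpoint is pushed away from the quad's centroid by scaling its offset by (1 + 0.001), so that the new vertex is not collinear with the edge's endpoints. The function name and that interpretation are assumptions, and coordinates are treated as planar.

```python
def quad_to_octagon(quad, factor=0.001):
    """Insert a slightly displaced midpoint vertex on every edge of a quad."""
    count = len(quad)
    centroid_x = sum(x for x, _ in quad) / count
    centroid_y = sum(y for _, y in quad) / count
    octagon = []
    for i, (x1, y1) in enumerate(quad):
        x2, y2 = quad[(i + 1) % count]  # next vertex, wrapping around
        mid_x, mid_y = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Assumed use of the factor: scale the midpoint's offset from the
        # centroid, which moves it slightly off the original edge.
        mid_x = centroid_x + (mid_x - centroid_x) * (1.0 + factor)
        mid_y = centroid_y + (mid_y - centroid_y) * (1.0 + factor)
        octagon.append((x1, y1))
        octagon.append((mid_x, mid_y))
    return octagon


octagon = quad_to_octagon([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```

Without some displacement the midpoints would lie exactly on the original edges, and a database could simplify the octagon back to a quad, which is presumably the optimization the slide refers to.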

Methodology - Number of vertices, insertion results: Insertion of octagons is slower than insertion of quad polygons by ~7% and slower than insertion of circles by ~18%.

Methodology - Number of vertices, query results: The results show no significant difference between querying the different tables. On average, however, intersecting a circle with quad polygon geo-data may be expected to yield the best results.

Achievements & Conclusions: We helped the IBM researchers decide that the best way to represent the geo-data (for both insertion and querying) is simply to keep its original representation (no mistake areas), as long as it has up to 8 vertices (or is a circle). If it has more than 8 vertices, further research is needed. There is no need to look for other heuristics for data manipulation (bounding or bounded circles, dividing into several geo-objects, etc.).

Achievements & Conclusions: We learned about: ◦ The importance of making a good work plan: laying the foundations, ensuring everything works, and building the needed extensions on top of them. ◦ How indexed tables work. ◦ Using JDBC with a geodetic extension. ◦ Statistical tools and analyzing results. ◦ How helpful it is to have a supervisor who is an expert in the project field and can direct us properly and save a lot of time.