Reporter : Yu Shing Li 1.  Introduction  Querying and update in the cloud  Multi-dimensional index R-Tree and KD-tree Basic Structure Pruning Irrelevant.

Presentation transcript:

Reporter: Yu Shing Li

• Introduction
• Querying and update in the cloud
• Multi-dimensional index
  - R-Tree and KD-tree
  - Basic Structure
  - Pruning Irrelevant Nodes with R-tree
  - Extended Node Bounding
  - Cost Estimation based Update Strategy
• Evaluation
• Conclusion

• Each day, a huge amount of information is put on the Internet in the form of digital data.
• Traditional data management tools are insufficient for these new demands.

• Systems supporting cloud computing dynamically allocate computational resources according to users' requests.
• Building a more efficient index structure is a pressing demand.

• We present a scalable and flexible multi-dimensional index structure based on the combination of R-Tree and KD-tree.
  - Propose an efficient and scalable multi-dimensional index structure.
  - Propose a cost estimation-based index update strategy.
  - Perform a series of experiments on a large number of machine nodes with a large volume of data.

• Query Processing
  - Locating the relevant slave nodes for a query
  - Processing the query on each slave node and fetching the results
• Index Maintenance
  - Locating the appropriate slave nodes for record insertion
  - Locating the relevant slave nodes for data deletion
  - Inserting records into an individual slave node
  - Deleting records from an individual slave node

• R-Tree is a popular multi-dimensional index that is commonly used in spatial and multi-dimensional applications.
• KD-Tree is a binary tree in which each interior node has an associated attribute a and a value V; V splits the records into those whose a-value is less than V and those whose a-value is at least V.
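
The KD-tree definition above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the names `KDNode`, `build`, `range_query` and the `leaf_size` parameter are assumptions, and the median split with cycling attributes is one common construction choice among several.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    """Interior node: splits on attribute index `a` at value `v`.
    Leaf node: `records` is set and there are no children."""
    a: int = 0
    v: float = 0.0
    left: Optional["KDNode"] = None        # records with point[a] < v
    right: Optional["KDNode"] = None       # records with point[a] >= v
    records: Optional[List[Point]] = None  # set only on leaves


def build(points: Sequence[Point], depth: int = 0, leaf_size: int = 2) -> KDNode:
    """Build a KD-tree by cycling through the attributes and splitting at the median."""
    if len(points) <= leaf_size:
        return KDNode(records=list(points))
    a = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[a])
    v = ordered[len(ordered) // 2][a]
    left = [p for p in ordered if p[a] < v]
    right = [p for p in ordered if p[a] >= v]
    if not left:  # no record lies below the median value, so stop splitting here
        return KDNode(records=list(points))
    return KDNode(a=a, v=v,
                  left=build(left, depth + 1, leaf_size),
                  right=build(right, depth + 1, leaf_size))


def range_query(node: KDNode, lo: Point, hi: Point) -> List[Point]:
    """Return all records p with lo[i] <= p[i] <= hi[i] for every attribute i."""
    if node.records is not None:
        return [p for p in node.records
                if all(l <= x <= h for l, x, h in zip(lo, p, hi))]
    found: List[Point] = []
    if lo[node.a] < node.v:    # the query range reaches the "< v" side
        found += range_query(node.left, lo, hi)
    if hi[node.a] >= node.v:   # the query range reaches the ">= v" side
        found += range_query(node.right, lo, hi)
    return found


# Hypothetical two-attribute records held by one slave node.
records = [(0, 0), (12, 12), (15, 15), (13, 21), (17, 30), (23, 5), (30, 6)]
tree = build(records)
matches = sorted(range_query(tree, (12, 12), (17, 21)))
```

A subtree is visited only when the query range can reach its side of the split value, which is what lets a slave node answer a range query without scanning all of its records.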

• Query Processing
  - Each node uses its local KD-tree index to retrieve the matching records on that node.
  - The procedures are described in Algorithms 4 and 5.

• Index Maintenance
  - Because every node is a potential target for query processing, a deletion has to be performed locally on every slave node.

• Definition 1. A node cube is a sequence of value intervals, where each interval represents the value range of one indexed attribute on this node.
• Example 1: If we construct a two-dimensional index on the attributes age and salary of a table, a node cube of {[30, 40], [100, 200]} means that the records on this node have an age between 30 and 40 and a salary between 100 and 200.

• Definition 2. The EMINC index structure consists of an R-tree on the master nodes and one KD-tree on each slave node.

• Query Processing
  - Definition 3. A query cube is a sequence of intervals, where each interval represents the value range of one attribute in this query.
  - Definition 4. Two cubes intersect if, for every attribute, the two corresponding intervals overlap.
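
Definitions 3 and 4 translate almost directly into code. The sketch below is illustrative only: `intersects`, `route`, and the node names are assumptions, and a plain linear scan over the node cubes stands in for the R-tree lookup that the master node would perform.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Cube = List[Interval]  # one closed interval per indexed attribute


def intersects(c1: Cube, c2: Cube) -> bool:
    """Definition 4: two cubes intersect iff their intervals overlap on every attribute."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(c1, c2))


def route(query_cube: Cube, node_cubes: Dict[str, Cube]) -> List[str]:
    """Return the slave nodes whose node cube intersects the query cube.

    A linear scan is used here in place of an R-tree search."""
    return [node for node, cube in node_cubes.items()
            if intersects(query_cube, cube)]


node_cubes = {
    "slave-1": [(30, 40), (100, 200)],  # the node cube from Example 1 (age, salary)
    "slave-2": [(41, 60), (150, 300)],  # hypothetical second node
}

range_hits = route([(20, 35), (50, 120)], node_cubes)    # ['slave-1']
point_hits = route([(50, 50), (160, 160)], node_cubes)   # ['slave-2']
```

A point query is treated as a query cube whose intervals each have equal endpoints, so the same intersection test covers both query types.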

• Query Processing (figure)

• Index Maintenance
  - To keep the node cube information accurate, the cube stored on the master nodes has to be updated whenever it becomes out of date because of data insertion or deletion on the slave nodes.
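
One way to picture how a node cube goes out of date is sketched below. The helper names `expand` and `bounding_cube` are assumptions rather than names from the presentation, and the sketch covers only the cube arithmetic, not the messaging between slave and master nodes.

```python
from typing import List, Sequence, Tuple

Cube = List[Tuple[float, float]]
Record = Tuple[float, ...]


def expand(cube: Cube, record: Record) -> Cube:
    """Grow the cube just enough to cover a newly inserted record."""
    return [(min(lo, x), max(hi, x)) for (lo, hi), x in zip(cube, record)]


def bounding_cube(records: Sequence[Record]) -> Cube:
    """Recompute the tight cube from scratch, for example after deletions."""
    return [(min(col), max(col)) for col in zip(*records)]


cube = [(30, 40), (100, 200)]
grown = expand(cube, (45, 150))            # the age interval widens to cover 45
needs_master_update = grown != cube        # a changed cube must be sent to the master

remaining = [(30, 100), (35, 150)]         # records left after a hypothetical deletion
shrunk = bounding_cube(remaining)          # a tighter cube than the original
```

Under this reading, an insertion outside the cube forces an update so that queries are not wrongly pruned, while a cube left too large after deletions still returns correct results and only costs extra forwarded queries.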

• Index Maintenance (figure)

• Index Maintenance (figure)

• With EMINC, we use the node bounding technique to filter out unnecessary queries.
• Suppose that on some node A we have 7 data records: [0, 0], [12, 12], [15, 15], [13, 21], [17, 30], [23, 5], [30, 6].
  - We cut both the X and Y axes into three equal pieces and get nine small regions.
  - From the distribution of the records, the node cubes we get are: {[0, 0], [0, 0]}, {[12, 15], [12, 15]}, {[13, 17], [21, 30]}, {[23, 30], [5, 6]}.
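
The worked example above can be reproduced with a short sketch of equal cutting. The function names `equal_cutting` and `volume` are assumptions, and the rule that a record on the upper boundary falls into the last piece is a choice made here so that [30, 6] and [17, 30] land in a region.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]
Cube = List[Tuple[float, float]]


def equal_cutting(records: Sequence[Record], pieces: int = 3) -> List[Cube]:
    """Cut every axis into `pieces` equal parts and bound the records of each non-empty region."""
    dims = len(records[0])
    lows = [min(r[d] for r in records) for d in range(dims)]
    highs = [max(r[d] for r in records) for d in range(dims)]

    def cell_of(r: Record) -> Tuple[int, ...]:
        idx = []
        for d in range(dims):
            width = (highs[d] - lows[d]) / pieces
            # Records on the upper boundary fall into the last piece.
            i = min(int((r[d] - lows[d]) / width), pieces - 1) if width > 0 else 0
            idx.append(i)
        return tuple(idx)

    cells = defaultdict(list)
    for r in records:
        cells[cell_of(r)].append(r)

    return [[(min(r[d] for r in group), max(r[d] for r in group)) for d in range(dims)]
            for _, group in sorted(cells.items())]


def volume(cube: Cube) -> float:
    """Product of the interval lengths of a cube."""
    v = 1
    for lo, hi in cube:
        v *= hi - lo
    return v


records = [(0, 0), (12, 12), (15, 15), (13, 21), (17, 30), (23, 5), (30, 6)]
cubes = equal_cutting(records)
# cubes == [[(0, 0), (0, 0)], [(12, 15), (12, 15)], [(13, 17), (21, 30)], [(23, 30), (5, 6)]]
total_volume = sum(volume(c) for c in cubes)   # 52, versus 900 for one bounding cube
```

Only four of the nine regions contain records, and the four small cubes cover far less space than the single cube {[0, 30], [0, 30]}, which is why fewer queries intersect them.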

• Cube Methods
  - Random cutting
  - Equal cutting
  - Clustering-based cutting

• However, the cost of updating is nontrivial, since even the fastest cutting method takes O(n) time, where n is the number of data records on the slave node.
• So the basic idea is to update only when benefit > cost.
• We propose a cost-estimation-based approach to handle the cube update problem.

• To simplify the discussion, we make the following assumption: the number of queries forwarded to each slave node is proportional to the total volume of all the node cubes of that slave node.
• δv denotes the decrease in volume after the update.
• nq denotes the number of queries.
• δT is the time span from now until the next update happens.

• mt denotes the time needed to perform one cube update.
• qt denotes the average time needed to process a query on this node.

• We use an iterative two-phase approach for the update strategy.
  - After each update, we first calculate the minimal time span that must pass before the next update can happen, which is the δT introduced earlier.
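
The transcript does not include the exact cost formula, so the following is only one plausible reading of "benefit > cost" built from the symbols defined above. It assumes that the fraction of queries avoided equals δv divided by the node's current total cube volume (`total_v`, a symbol not in the slides), and that queries arrive at a steady `query_rate` (also an assumption), so that nq = query_rate × δT.

```python
def update_benefit(delta_v: float, total_v: float, nq: float, qt: float) -> float:
    """Estimated query time saved: share of volume removed * number of queries * time per query."""
    return (delta_v / total_v) * nq * qt


def should_update(delta_v: float, total_v: float, nq: float, qt: float, mt: float) -> bool:
    """Update the cube only when the estimated benefit exceeds the update cost mt."""
    return update_benefit(delta_v, total_v, nq, qt) > mt


def min_time_span(delta_v: float, total_v: float, query_rate: float,
                  qt: float, mt: float) -> float:
    """Smallest delta_T at which benefit equals cost, taking nq = query_rate * delta_T."""
    return mt * total_v / (delta_v * query_rate * qt)


# Hypothetical numbers: volume shrinks from 900 to 52, 10 ms per query, 5 s per update.
busy = should_update(delta_v=848, total_v=900, nq=1000, qt=0.01, mt=5.0)
quiet = should_update(delta_v=848, total_v=900, nq=100, qt=0.01, mt=5.0)
span = min_time_span(delta_v=848, total_v=900, query_rate=10.0, qt=0.01, mt=5.0)
```

With these numbers the update pays off under the heavier query load but not under the lighter one, which matches the intuition that a cube update is only worth its O(n) cost when enough queries will benefit before the next update.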

• 6 machines
  - 1 master node
  - 5 slave nodes, simulating 100~1000 nodes
• Each machine had a 2.33 GHz Intel Core2 Quad CPU, 4 GB of main memory, and a 320 GB disk.
• The machines ran Ubuntu 9.04 Server OS.

• Point Query (figure)

• Range Query (figure)

• In this paper, we presented EMINC and EEMINC for building an efficient multi-dimensional index on the Cloud platform.
• We developed the node bounding technique to reduce the query processing cost on the Cloud platform.
• We proposed a cost estimation-based approach for index update, and we demonstrated the efficacy of our approach with extensive experiments.

• Q&A