Compressing Relations And Indexes

Compressing Relations and Indexes. Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft. Department of Computer Sciences, University of Wisconsin-Madison. June 18, 1997

Agenda: Introduction; Compressing a Relation; Compression Applied to Rectangle-Based Indexes; Performance Evaluation; Questions and Remarks.

Introduction: page-level compression; performance study; application to B-trees and R-trees; a multidimensional bulk-loading algorithm.

Introduction

Compressing a Relation: frames of reference; non-numeric attributes; file-level compression.

Frames of Reference
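Only the slide title survives here, so the following is a minimal Python sketch of page-level frame-of-reference compression, assuming integer attributes. For each page, the per-attribute minimum serves as the frame of reference, and each value is stored as its offset from that minimum using just enough bits to cover the page's range, ceil(log2(max - min + 1)) bits. The `CompressedPage` layout and the use of a single Python integer as the bit buffer are illustrative assumptions, not the paper's on-disk format. Because every tuple on a page has the same width, tuple i can be decoded at bit offset i * (tuple width) without decompressing the rest of the page.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CompressedPage:
    mins: List[int]    # frame of reference: per-attribute minimum on this page
    widths: List[int]  # bits used per attribute on this page
    n_tuples: int
    bits: int          # all offsets bit-packed into one integer (stand-in for page bytes)

    @property
    def tuple_bits(self) -> int:
        return sum(self.widths)


def compress_page(tuples: Sequence[Sequence[int]]) -> CompressedPage:
    if not tuples:
        raise ValueError("empty page")
    arity = len(tuples[0])
    mins = [min(t[a] for t in tuples) for a in range(arity)]
    maxs = [max(t[a] for t in tuples) for a in range(arity)]
    # (max - min).bit_length() == ceil(log2(max - min + 1)); 0 bits for a constant column
    widths = [(hi - lo).bit_length() for lo, hi in zip(mins, maxs)]
    packed, shift = 0, 0
    for t in tuples:
        for a in range(arity):
            packed |= (t[a] - mins[a]) << shift
            shift += widths[a]
    return CompressedPage(mins, widths, len(tuples), packed)


def decompress_tuple(page: CompressedPage, i: int) -> Tuple[int, ...]:
    """Decode only tuple i; its bits start at i * tuple_bits."""
    if not 0 <= i < page.n_tuples:
        raise IndexError(i)
    shift = i * page.tuple_bits
    values = []
    for lo, w in zip(page.mins, page.widths):
        values.append(lo + ((page.bits >> shift) & ((1 << w) - 1)))
        shift += w
    return tuple(values)


page = compress_page([(1000, 7), (1003, 7), (1001, 9)])
print(page.widths)                # [2, 2]: 4 bits per tuple on this page
print(decompress_tuple(page, 1))  # (1003, 7)
```

Page-local frames pay off when values on a page are clustered: the wider the range on a page, the more bits each offset needs.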

Point approximation in lossy compression
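A point stored with less than full precision becomes a small box, so a query must treat it as a region: it may produce false positives that are filtered later, but it cannot miss a qualifying point. The sketch below illustrates one plausible scheme, since the slide gives only its title. It assumes integer coordinates, a frame of reference [lo, hi] per dimension (for example, an enclosing node's bounding box), and a uniform grid of 2**bits cells per dimension. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cell_size(lo: int, hi: int, bits: int) -> int:
    """Width of each grid cell when [lo, hi] is split into 2**bits cells (ceiling division)."""
    return -(-(hi - lo + 1) // (1 << bits))


def approximate_point(p: Sequence[int], lo: Sequence[int], hi: Sequence[int], bits: int) -> List[int]:
    """Replace each coordinate by the index of the grid cell containing it."""
    idx = []
    for x, l, h in zip(p, lo, hi):
        if not l <= x <= h:
            raise ValueError("point lies outside the frame of reference")
        idx.append((x - l) // cell_size(l, h, bits))
    return idx


def cell_region(idx: Sequence[int], lo: Sequence[int], hi: Sequence[int], bits: int) -> List[Tuple[int, int]]:
    """The box a stored approximation stands for; it always contains the original point."""
    region = []
    for i, l, h in zip(idx, lo, hi):
        s = cell_size(l, h, bits)
        region.append((l + i * s, min(h, l + (i + 1) * s - 1)))
    return region


lo, hi = (0, 0), (1000, 1000)
idx = approximate_point((500, 37), lo, hi, 4)  # [7, 0]: 4 bits per coordinate
print(cell_region(idx, lo, hi, 4))             # [(441, 503), (0, 62)]
```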

Compressing an Indexing Structure: compressing a B-tree; compressing a rectangle-based indexing structure; compression-oriented bulk loading.

Rectangle-Based Indexing Qualities

Changing the frame of reference
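One reading of this slide, hedged because only the title survives: in a rectangle-based index such as an R-tree, an entry's rectangle can be expressed relative to the bounding rectangle of the node that holds it instead of in absolute coordinates. The sketch assumes integer coordinates, `bits` bits per bound, and a uniform grid over the parent box. Rounding the low edge down and the high edge up keeps the stored rectangle conservative, so it always contains the true one. Searches stay correct at the cost of some extra candidates. The function names are hypothetical.

```python
from typing import List, Tuple

Box = List[Tuple[int, int]]  # one inclusive (low, high) pair per dimension


def cell_size(lo: int, hi: int, bits: int) -> int:
    """Grid cell width when [lo, hi] is split into 2**bits cells (ceiling division)."""
    return -(-(hi - lo + 1) // (1 << bits))


def encode_child(child: Box, parent: Box, bits: int) -> List[Tuple[int, int]]:
    """Express the child's bounds as cell indices inside the parent's box."""
    codes = []
    for (clo, chi), (plo, phi) in zip(child, parent):
        if not plo <= clo <= chi <= phi:
            raise ValueError("child box must lie inside parent box")
        s = cell_size(plo, phi, bits)
        # floor division: low edge rounds down, high edge maps to the cell containing it
        codes.append(((clo - plo) // s, (chi - plo) // s))
    return codes


def decode_child(codes: List[Tuple[int, int]], parent: Box, bits: int) -> Box:
    """Widen codes back to absolute coordinates; the result contains the original child."""
    box = []
    for (ci, cj), (plo, phi) in zip(codes, parent):
        s = cell_size(plo, phi, bits)
        box.append((plo + ci * s, min(phi, plo + (cj + 1) * s - 1)))
    return box


parent = [(0, 1000), (0, 1000)]
child = [(100, 120), (900, 950)]
codes = encode_child(child, parent, 4)  # [(1, 1), (14, 15)]: 8 bits per dimension
print(decode_child(codes, parent, 4))   # [(63, 125), (882, 1000)]
```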

Bulk-Loading Algorithm. Input: a set of points in some n-dimensional space. Output: a partition of the input into subsets. Requirement: the partition should place points that are close to each other in the same group as much as possible.
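As a simple illustration of this input/output contract, and explicitly not the authors' algorithm, the sketch below recursively splits at the median of the dimension with the largest spread until every group fits a hypothetical page `capacity`. This is a generic kd-tree-style split. It groups nearby points but ignores compression entirely.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def partition(points: Sequence[Point], capacity: int) -> List[List[Point]]:
    """Split points into groups of at most `capacity`, keeping nearby points together."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [list(points)]
    dims = len(points[0])
    spread = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dims)]
    d = spread.index(max(spread))               # cut the widest dimension
    ordered = sorted(points, key=lambda p: p[d])
    mid = len(ordered) // 2                     # both halves are non-empty since len >= 2
    return partition(ordered[:mid], capacity) + partition(ordered[mid:], capacity)


pts = [(0, 0), (1, 0), (0, 1), (100, 100), (101, 100), (100, 101)]
print(partition(pts, 3))  # the two clusters end up in separate groups
```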

GB-Pack: compression-oriented bulk loading

GB-Pack: compression-oriented bulk loading. Qualities: it trades off some tree quality for increased compression; the number of entries per page is data-dependent; a dimension is cut at a value boundary in the data.
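The slide lists GB-Pack's qualities but not its steps. The sketch below is a hypothetical greedy packer written only to illustrate two of those qualities, not a reconstruction of GB-Pack. Points sorted on one dimension are added to a page while the page's frame-of-reference size still fits a bit budget, so the number of entries per page depends on the data. When a page is closed, the cut is moved back, where possible, so that it falls between two distinct values of the sort dimension. The header cost `HEADER_BITS_PER_DIM` and the single sort dimension are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]

# Hypothetical page-header cost: a 32-bit minimum and a 32-bit width field per dimension.
HEADER_BITS_PER_DIM = 64


def page_bits(points: Sequence[Point]) -> int:
    """Compressed size of a page under per-page frame-of-reference encoding."""
    dims = len(points[0])
    widths = [(max(p[d] for p in points) - min(p[d] for p in points)).bit_length()
              for d in range(dims)]
    return dims * HEADER_BITS_PER_DIM + len(points) * sum(widths)


def greedy_pack(points: Sequence[Point], page_capacity_bits: int, sort_dim: int = 0) -> List[List[Point]]:
    ordered = sorted(points, key=lambda p: p[sort_dim])
    pages: List[List[Point]] = []
    start = 0
    while start < len(ordered):
        end = start + 1  # a page always holds at least one point
        # Data-dependent fill: keep adding points while the compressed page still fits.
        while end < len(ordered) and page_bits(ordered[start:end + 1]) <= page_capacity_bits:
            end += 1
        # Prefer a cut between two distinct values of sort_dim; keep the full page if none exists.
        if end < len(ordered):
            cut = end
            while cut > start + 1 and ordered[cut][sort_dim] == ordered[cut - 1][sort_dim]:
                cut -= 1
            if ordered[cut][sort_dim] != ordered[cut - 1][sort_dim]:
                end = cut
        pages.append(ordered[start:end])
        start = end
    return pages


pts = [(i // 3, i % 5) for i in range(30)]
print([len(pg) for pg in greedy_pack(pts, 2 * HEADER_BITS_PER_DIM + 40)])
```

Recomputing `page_bits` for every candidate makes this quadratic per page; a practical version would maintain running minima and maxima instead.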

Performance Evaluation: relational compression experiments; CPU vs. I/O costs; comparison with techniques in commercial systems; importance of tuple-level decompression; R-tree compression experiments.

Synthetic Data Sets. Parameters: size, the number of tuples in the relation; dimensionality, the number of attributes in the relation; range, the range of values for the attributes; distribution, uniform (the worst case) or exponential; partition strategy; page size.
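The slide names the experimental parameters without their settings, so the following is a hypothetical Python harness for generating such relations and measuring per-tuple payload under page-level frame-of-reference compression. The exponential distribution's `mean`, truncation at `value_range - 1`, sorting as the partition strategy, and page size counted in tuples are all assumptions, not the paper's setup.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, ...]


def make_relation(size: int, dims: int, value_range: int, distribution: str = "uniform",
                  seed: int = 0, mean: Optional[float] = None) -> List[Row]:
    """Synthetic relation: `size` tuples of `dims` integer attributes in [0, value_range)."""
    if value_range < 1:
        raise ValueError("value_range must be positive")
    rng = random.Random(seed)
    if mean is None:
        mean = value_range / 10  # hypothetical skew for the exponential case

    def draw() -> int:
        if distribution == "uniform":
            return rng.randrange(value_range)
        if distribution == "exponential":
            # truncated so every value stays inside the attribute range
            return min(value_range - 1, int(rng.expovariate(1.0 / mean)))
        raise ValueError(f"unknown distribution: {distribution}")

    return [tuple(draw() for _ in range(dims)) for _ in range(size)]


def pages_of(relation: Sequence[Row], tuples_per_page: int, sort_first: bool = True) -> List[List[Row]]:
    """Cut the relation into fixed-size pages; sorting first is one possible partition strategy."""
    rows = sorted(relation) if sort_first else list(relation)
    return [rows[i:i + tuples_per_page] for i in range(0, len(rows), tuples_per_page)]


def avg_payload_bits_per_tuple(pages: Sequence[Sequence[Row]]) -> float:
    """Average bits per tuple for frame-of-reference offsets (page headers excluded)."""
    total_bits = total_rows = 0
    for page in pages:
        dims = len(page[0])
        widths = [(max(t[d] for t in page) - min(t[d] for t in page)).bit_length()
                  for d in range(dims)]
        total_bits += len(page) * sum(widths)
        total_rows += len(page)
    return total_bits / total_rows


rel = make_relation(1000, 3, 1 << 16, "exponential", seed=1)
print(avg_payload_bits_per_tuple(pages_of(rel, 50)))  # compare with 3 * 16 uncompressed bits
```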

Sales Data Set: compression achieved versus dimensionality.

CPU vs. I/O Costs

R-tree Compression Experiments: testing the quality of R-trees on the Sales data set.

Questions And Remarks