C-Store: Integrating Compression and Execution Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Mar 20, 2009.


High Compressibility in a Column Store
- Each attribute is stored in a separate column.
- A column store can use traditional compression techniques: dictionary encoding, Huffman encoding, etc.
- It can also use column-oriented techniques: Run-Length Encoding.

Benefits of Compression in DBMS
- Reduces the size of data.
- Improves I/O performance:
  - Reduces seek times: data are stored nearer to each other.
  - Reduces transfer times: there is less data to transfer.
  - Increases buffer hit rate: the buffer can hold a larger fraction of the data.

How to Query a Compressed Column?
- Decompress the data first, then query it.
- Or query the compressed column directly?
  - Example: Run-Length Encoding.

A Simple Example
- In column C1, the value “42” appears 1000 times consecutively.
- Assume C1 is compressed by Run-Length Encoding.
- Query: SUM(C1) = 42 * 1000
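The SUM example above can be sketched in a few lines of Python. This is only an illustration: the `(value, run_length)` pair layout and the helper names `rle_encode` and `rle_sum` are assumptions made for this sketch, not C-Store's actual storage format or API.

```python
from typing import List, Tuple

# Hypothetical run layout for this sketch: (value, run_length).
Run = Tuple[int, int]


def rle_encode(column: List[int]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Run] = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_sum(runs: List[Run]) -> int:
    """SUM over the compressed column: one multiplication per run."""
    return sum(value * length for value, length in runs)


c1 = [42] * 1000          # the column from the slide
runs = rle_encode(c1)     # a single run: (42, 1000)
total = rle_sum(runs)     # 42 * 1000, computed without decompressing
```

The aggregate touches one run instead of 1000 values, which is the point of operating on the compressed form.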

History of Compression in DBMS
- 1980s: focus on compression ratio.
- 1990s: the CPU cost of compressing/decompressing should be less than the savings from reducing the size of the data.
- Now: CPU speed increases much faster than memory speed and disk speed, so light-weight compression is good.

Reducing CPU Cost on Compressed Data (Graefe and Shapiro, 1991)
- Lazy decompression: data is decompressed only when it needs to be operated on.
- Query the compressed data directly: exact-match comparison and natural join are possible if the constant portion of the predicate is compressed in the same way as the data.
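A minimal sketch of an exact-match comparison evaluated directly on compressed data, assuming dictionary encoding as the compression scheme. The function names and the sample `retflag` values are hypothetical. The constant in the predicate is encoded with the same dictionary as the column, so the scan compares small integer codes and never decodes a value.

```python
from typing import Dict, List, Tuple


def build_dictionary(column: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Dictionary-encode a column: each distinct value gets an integer code."""
    dictionary: Dict[str, int] = {}
    codes: List[int] = []
    for v in column:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        codes.append(dictionary[v])
    return dictionary, codes


def select_equal(codes: List[int], dictionary: Dict[str, int],
                 constant: str) -> List[int]:
    """Return positions where column == constant, comparing codes only."""
    code = dictionary.get(constant)   # compress the constant the same way
    if code is None:                  # constant never occurs in the column
        return []
    return [pos for pos, c in enumerate(codes) if c == code]


retflag = ["A", "N", "R", "N", "A", "N"]   # made-up sample data
dictionary, codes = build_dictionary(retflag)
positions = select_equal(codes, dictionary, "N")
```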

New Work in C-Store
- Simultaneously applies an operation to multiple values in a single column.
- Introduces a novel architecture for passing compressed data between query operators:
  - minimizes operator code complexity
  - while maximizing the opportunities for direct operation on compressed data.

Review: Basic Concepts of C-Store
- A logical table is physically represented as a set of projections.
- Each projection consists of a set of columns.
  - Columns are stored separately, but share a common sort order defined by the SORT KEY.
- Each column appears in at least one projection.
- A column can have different sort orders if it is stored in multiple projections.

An Example of C-Store Projection
- LINEITEM (shipdate, quantity, retflag, suppkey | shipdate, quantity, retflag)
  - First sorted by shipdate
  - Second sorted by quantity
  - Third sorted by retflag
- Sorting increases locality of data.
  - Favors compression techniques such as Run-Length Encoding.
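The effect of the sort key on Run-Length Encoding can be illustrated with a small sketch. The rows below are made-up sample data, and `count_runs` is a hypothetical helper; fewer runs means fewer RLE entries to store.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Row = Tuple[str, int, str]  # (shipdate, quantity, retflag), sample data only


def count_runs(column: Sequence) -> int:
    """Number of maximal runs of equal consecutive values in a column."""
    return sum(1 for _ in groupby(column))


rows: List[Row] = [
    ("1998-03-02", 5, "N"),
    ("1998-03-01", 7, "A"),
    ("1998-03-02", 5, "R"),
    ("1998-03-01", 3, "N"),
    ("1998-03-01", 7, "N"),
]

# Sort by the projection's sort key: shipdate, then quantity, then retflag.
sorted_rows = sorted(rows)

shipdate_runs_before = count_runs([r[0] for r in rows])         # 4 runs
shipdate_runs_after = count_runs([r[0] for r in sorted_rows])   # 2 runs
quantity_runs_before = count_runs([r[1] for r in rows])         # 5 runs
quantity_runs_after = count_runs([r[1] for r in sorted_rows])   # 3 runs
```

The leading sort column benefits most; later columns still gain runs within each group of equal leading values.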

C-Store Operators vs. Relational Operators
- Selection: produces bitmaps that can be efficiently combined.
- Mask: materializes a set of values from a column and a bitmap.
- Permute: reorders a column using a join index.
- Projection: projecting a column is free; two columns in the same order can be concatenated for free.
- Join: produces positions rather than values.
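A rough sketch of how Selection and Mask fit together, using Python lists of booleans as stand-in bitmaps. The function names, the sample columns, and the list-of-bools representation are all assumptions for illustration; a real system would use packed bit strings.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Bitmap = List[bool]  # stand-in for a packed bit string (assumption)


def select(column: Sequence[T], predicate: Callable[[T], bool]) -> Bitmap:
    """Selection: one bit per position, set where the predicate holds."""
    return [predicate(v) for v in column]


def bitmap_and(a: Bitmap, b: Bitmap) -> Bitmap:
    """Combine two selection results position by position."""
    return [x and y for x, y in zip(a, b)]


def mask(column: Sequence[T], bitmap: Bitmap) -> List[T]:
    """Mask: materialize the column values whose bit is set."""
    return [v for v, bit in zip(column, bitmap) if bit]


# Two columns of the same projection, stored in the same order.
quantity = [5, 12, 7, 30, 12]
retflag = ["N", "A", "N", "N", "R"]

# WHERE quantity > 6 AND retflag = 'N'
bm = bitmap_and(select(quantity, lambda q: q > 6),
                select(retflag, lambda f: f == "N"))
result = mask(quantity, bm)
```

Each predicate scans only its own column; values are materialized once, at the end, for the positions that survive both predicates.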

Join over Two Columns: An Example

Compressed Query Execution: Two Classes for Each New Compression Technique
- Compression Block: encapsulates an intermediate representation for compressed data.
- DataSource operator: reads in compressed pages from disk and converts them into compression blocks.

A Compression Block
- Contains a buffer of the column data in compressed format.
- Provides an API that allows the buffer to be accessed in several ways.

Accessing Properties of a Compression Block
- isOneValue(): returns whether or not the block contains just one value (and many positions for that value).
- isValueSorted(): returns whether or not the block's values are sorted.
- isPosContig(): returns whether or not the block is a consecutive subset of a column.

Properties of Compression Block: for Various Encoding Schemes.

Iterator Access: Where Decompression Cannot Be Avoided
- getNext()
  - Transiently decompresses the next value in the compression block.
  - Returns that value along with its position in the uncompressed column.
- asArray()
  - Decompresses the entire compression block.
  - Returns a pointer to an array of data in the uncompressed column type.

Block Information Methods (1): Getting Data without Decompression
- For Run-Length Encoding:
  - A compression block consists of a single RLE triple of the form (value, start_pos, run_length).
  - getSize(): returns run_length.
  - getStartValue(): returns value.
  - getEndPosition(): returns (start_pos + run_length - 1).
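The compression block API for Run-Length Encoding can be sketched as a small Python class. The method names mirror the slides in snake_case; the class itself, the generator standing in for getNext(), and the list returned by `as_array` are assumptions of this sketch rather than the C-Store implementation.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class RLEBlock:
    """A compression block holding a single RLE triple."""
    value: int
    start_pos: int
    run_length: int

    # Properties: a single run has one value, is trivially sorted,
    # and covers a consecutive range of positions.
    def is_one_value(self) -> bool:
        return True

    def is_value_sorted(self) -> bool:
        return True

    def is_pos_contig(self) -> bool:
        return True

    # Block information methods: answered from the triple, no decompression.
    def get_size(self) -> int:
        return self.run_length

    def get_start_value(self) -> int:
        return self.value

    def get_end_position(self) -> int:
        return self.start_pos + self.run_length - 1

    # Iterator access: decompress one (value, position) pair at a time.
    def iter_values(self) -> Iterator[Tuple[int, int]]:
        for offset in range(self.run_length):
            yield self.value, self.start_pos + offset

    # Full decompression of the block.
    def as_array(self) -> List[int]:
        return [self.value] * self.run_length


block = RLEBlock(value=42, start_pos=100, run_length=1000)
end = block.get_end_position()          # 100 + 1000 - 1
first = next(block.iter_values())       # (42, 100)
```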

Block Information Methods (2): Getting Data without Decompression
- For bitmaps:
  - A compression block is a consecutive subset of a bitmap for a single value.
  - getSize(): returns the number of on bits in the block (i.e., a bit string).
  - getStartValue(): returns the value with which the bit string is associated.
  - getEndPosition(): returns the position of the last on bit in the bit string.
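The same three methods for a bitmap block might look as follows. This is a sketch under assumptions: the bit string is a Python list of booleans, `start_pos` (the column position of the first bit in the block) is a field introduced here for illustration, and an empty bit string yields `None` from `get_end_position`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BitmapBlock:
    """A consecutive slice of the bitmap for one column value."""
    value: int          # the value this bit string is associated with
    start_pos: int      # column position of bits[0] (assumed field)
    bits: List[bool]    # True means the column holds `value` at that position

    def is_one_value(self) -> bool:
        return True     # the block describes positions of a single value

    def get_size(self) -> int:
        """Number of on bits in the block."""
        return sum(self.bits)

    def get_start_value(self) -> int:
        return self.value

    def get_end_position(self) -> Optional[int]:
        """Column position of the last on bit, or None if no bit is on."""
        for offset in range(len(self.bits) - 1, -1, -1):
            if self.bits[offset]:
                return self.start_pos + offset
        return None


bm_block = BitmapBlock(value=7, start_pos=10,
                       bits=[True, False, False, True, True, False])
size = bm_block.get_size()
last = bm_block.get_end_position()
```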

Compression-Aware Optimization
- Natural Join
  - One input column is compressed by Run-Length Encoding.
  - The other input column is uncompressed.
  - Do the join directly on the RLE triples: this reduces the number of operations by a factor of k, where k is the run length of the RLE triple.
- Count
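A hedged sketch of the join optimization, written as a simple nested-loop join for clarity. The triple layout follows the slides; the function names, the comparison counter, and the sample data are assumptions. Each RLE triple is compared once against each value of the uncompressed column, and a match emits position pairs for the whole run. The `rle_count` helper shows one possible reading of the "Count" bullet: counting values by adding run lengths.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (value, start_pos, run_length)


def rle_join(left_runs: List[Triple],
             right_column: List[int]) -> Tuple[List[Tuple[int, int]], int]:
    """Join an RLE column with an uncompressed column.

    Returns (matching (left_pos, right_pos) pairs, number of comparisons).
    """
    pairs: List[Tuple[int, int]] = []
    comparisons = 0
    for value, start_pos, run_length in left_runs:
        for right_pos, right_value in enumerate(right_column):
            comparisons += 1               # one comparison covers the whole run
            if value == right_value:
                pairs.extend((start_pos + i, right_pos)
                             for i in range(run_length))
    return pairs, comparisons


def rle_count(runs: List[Triple]) -> int:
    """COUNT over an RLE column: add up run lengths, no decompression."""
    return sum(run_length for _value, _start, run_length in runs)


left_runs = [(1, 0, 3), (2, 3, 2)]   # decompressed: [1, 1, 1, 2, 2]
right_column = [2, 1, 5]

pairs, comparisons = rle_join(left_runs, right_column)
naive_comparisons = rle_count(left_runs) * len(right_column)
```

Here the join performs 6 comparisons where a value-at-a-time nested loop would perform 15; with a single run of length k the saving is exactly a factor of k.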

Summary: Integrating Compression and Execution
- Operate directly on compressed data whenever possible.
  - Use compression blocks as an intermediate representation of data.
- Degenerate to a lazy decompression scheme when decompression cannot be avoided.
  - Iterate through the values in a compression block.
- Reduce query executor complexity.
  - Abstract the general properties of compression techniques.
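The three points of the summary can be seen together in one small operator sketch. Both block classes and the `block_sum` function are hypothetical; the operator is written only against the block properties and methods, so it does not need to know which compression scheme produced a block.

```python
from typing import Iterator, List, Tuple


class RunBlock:
    """One RLE triple (value, start_pos, run_length)."""

    def __init__(self, value: int, start_pos: int, run_length: int) -> None:
        self.value, self.start_pos, self.run_length = value, start_pos, run_length

    def is_one_value(self) -> bool:
        return True

    def get_size(self) -> int:
        return self.run_length

    def get_start_value(self) -> int:
        return self.value

    def iter_values(self) -> Iterator[Tuple[int, int]]:
        for i in range(self.run_length):
            yield self.value, self.start_pos + i


class PlainBlock:
    """An uncompressed run of values starting at start_pos."""

    def __init__(self, values: List[int], start_pos: int) -> None:
        self.values, self.start_pos = values, start_pos

    def is_one_value(self) -> bool:
        return False

    def iter_values(self) -> Iterator[Tuple[int, int]]:
        for i, v in enumerate(self.values):
            yield v, self.start_pos + i


def block_sum(block) -> int:
    """SUM over one block, choosing the cheapest access path."""
    if block.is_one_value():
        # Direct operation on compressed data.
        return block.get_start_value() * block.get_size()
    # Fallback: iterate, decompressing value by value.
    return sum(value for value, _pos in block.iter_values())


blocks = [RunBlock(42, 0, 1000), PlainBlock([3, 9, 4], 1000)]
total = sum(block_sum(b) for b in blocks)
```

Adding a new compression scheme then means adding a new block class, while `block_sum` stays unchanged.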

References
1. Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. In VLDB.
2. Daniel J. Abadi, Samuel R. Madden, and Miguel C. Ferreira. Integrating Compression and Execution in Column-Oriented Database Systems. In SIGMOD, June 2006, Chicago, USA.