C-Store: Column Stores over Solid State Drives Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Jun 19, 2009.


Solid State Drives (SSDs) vs. Hard Disk Drives (HDDs)
HDDs (traditional magnetic hard drives) perform sequential reads much faster than random reads, so the traditional wisdom is to avoid random I/O as much as possible. SSDs perform random reads more than 100x faster than HDDs and offer comparable sequential read and write performance. However, the random write performance of SSDs is much worse than their random read performance.

Characteristics of HDD and SSD (NAND Flash)

How to Leverage the Fast Random Reads of SSDs?
Avoid reading unnecessary attributes during selections and projections; this is the idea behind column stores. Reduce I/O requirements during joins by minimizing passes over the joined tables. Minimize the I/O needed to fetch attribute values by using late materialization.

Page Layouts: NSM vs. PAX
NSM is the traditional row-store layout. PAX is a hybrid of the row-store and column-store layouts: each page is divided into n minipages, and each minipage stores the values of one column contiguously.
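The difference between the two layouts can be illustrated with a minimal sketch. The function names (nsm_page, pax_page) and the three-column sample relation are hypothetical and chosen only for illustration; real pages also carry headers and offsets, which are omitted here.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]


def nsm_page(rows: List[Row]) -> List[Row]:
    """NSM: whole rows are stored one after another in the page."""
    return list(rows)


def pax_page(rows: List[Row], column_names: List[str]) -> Dict[str, list]:
    """PAX: one minipage per column, holding that column's values contiguously."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


# Hypothetical sample data: the same three rows stored in both layouts.
rows = [(1, "a", 0.5), (2, "b", 1.5), (3, "c", 2.5)]
row_layout = nsm_page(rows)
column_layout = pax_page(rows, ["id", "name", "score"])
```

In the PAX version, a scan that needs only the id column can touch column_layout["id"] and ignore the other two minipages, while the NSM version has to walk over every full row.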

FlashScan Operator
FlashScan is a scan operator that leverages the PAX layout to improve selections and projections on flash SSDs. Basic ideas: once a page is brought into main memory, read only the minipages of the attributes that are needed; the goal is to reduce memory bandwidth consumption. The cache line is 128 bytes long, which suggests that a minipage should ideally be the same size.

An Example Run of FlashScan
Consider a scan that simply projects the 1st and 3rd columns of the relation in Figure 3. For each page, FlashScan first reads the minipage of the 1st column, then “seeks” to the start of the 3rd minipage and reads it. It then “seeks” again to the first minipage of the next page. This procedure continues over the entire relation, resulting in a random access pattern.

Processing Random Reads in Batch Mode
In general, every “seek” results in a random read. FlashScan coalesces the reads and performs one random access to the SSD for each set of contiguous minipages.
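The coalescing step can be sketched as grouping the needed minipage positions within a page into contiguous runs, with one read issued per run. This is an illustrative sketch only; the function name coalesce_minipages and the use of zero-based minipage indexes are assumptions, not details taken from the slides.

```python
from typing import List, Tuple


def coalesce_minipages(needed: List[int]) -> List[Tuple[int, int]]:
    """Group needed minipage indexes into (first, last) runs of contiguous minipages.

    Each run can be served by a single random read.
    """
    runs: List[Tuple[int, int]] = []
    for idx in sorted(set(needed)):
        if runs and idx == runs[-1][1] + 1:
            # Extend the current run because this minipage is adjacent to it.
            runs[-1] = (runs[-1][0], idx)
        else:
            # Start a new run; this costs one more random read.
            runs.append((idx, idx))
    return runs


# Projecting the 1st and 3rd columns (indexes 0 and 2): two separate reads per page.
first_and_third = coalesce_minipages([0, 2])
# Projecting the 1st, 2nd and 4th columns: the first two are merged into one read.
mixed = coalesce_minipages([0, 1, 3])
```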

Implementing FlashScan in Postgres
Divide every page into densely packed minipages. Modify the buffer manager and the bulk loader to work with the PAX page layout; a page in the buffer pool may be only partially filled, containing just the minipages transferred by FlashScan. Make FlashScan output tuples in row format, i.e., perform tuple reconstruction.

Optimization for Selection Predicates
The technique: read only the minipages of pages that contain tuples satisfying the selection conditions. This technique is beneficial for highly selective conditions and for selection conditions that are applied to sorted or partially sorted attributes.
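One plausible reading of this optimization is sketched below: for each page, read the minipage of the predicate attribute first, and read the projected minipages only if at least one value qualifies. The page representation (a dict of column name to value list), the function name scan_with_predicate, and the read counter are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Page = Dict[str, list]  # hypothetical PAX page: column name -> minipage values


def scan_with_predicate(
    pages: List[Page],
    pred_col: str,
    pred: Callable[[object], bool],
    project: List[str],
) -> Tuple[List[tuple], int]:
    """Return the projected tuples and the number of minipages read."""
    results: List[tuple] = []
    minipage_reads = 0
    for page in pages:
        # Always read the predicate minipage first.
        pred_values = page[pred_col]
        minipage_reads += 1
        matches = [i for i, value in enumerate(pred_values) if pred(value)]
        if not matches:
            # No qualifying tuple on this page: skip its other minipages.
            continue
        # Read the projected minipages that have not been read already.
        minipage_reads += len([c for c in project if c != pred_col])
        for i in matches:
            results.append(tuple(page[c][i] for c in project))
    return results, minipage_reads


pages = [
    {"a": [1, 2], "b": [10, 20], "c": ["x", "y"]},
    {"a": [7, 8], "b": [70, 80], "c": ["p", "q"]},
]
tuples, reads = scan_with_predicate(pages, "a", lambda v: v > 5, ["b", "c"])
```

In this example the first page costs one minipage read and the second costs three, instead of three for every page. When the predicate attribute is sorted, qualifying values cluster in few pages, which is consistent with the slide's remark about sorted or partially sorted attributes.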

FlashScan and Column Stores
FlashScan needs to “seek” between minipages, and column stores need to “seek” between columns. Column stores on HDDs read a large portion of a single column at a time to amortize the “seek” overhead. If a column store is built on SSDs, it should behave similarly to FlashScan, although this assertion needs experimental validation. In addition, a PAX-based system can be easily integrated with a row-store DBMS. So is there no need to build a column store on SSDs?

Overview of FlashJoin
FlashJoin is a multi-way equi-join algorithm. It is implemented as a pipeline of stylized binary joins. Each binary join in the pipeline consists of two separate operators: a join kernel and a fetch kernel.

An Example of FlashJoin Using Late Materialization

Join Kernel
The join kernel leverages FlashScan to fetch only the join attributes needed from the base relations, i.e., FlashJoin uses late materialization. Hence the join kernel needs less memory, which may lead to fewer passes when computing the join. The join kernel computes the join and outputs a join index. For example, Join 2 in the previous slide produces a join index containing three RIDs (id1, id2, id3) pointing to rows of R1, R2, and R3.
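A minimal sketch of a join kernel for one binary join follows. It assumes a simple in-memory hash join, which the slides do not specify, and represents each input as a list of (RID, join key) pairs, as FlashScan might deliver them; the function name join_kernel is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def join_kernel(
    left: List[Tuple[int, Hashable]],
    right: List[Tuple[int, Hashable]],
) -> List[Tuple[int, int]]:
    """Equi-join two inputs of (rid, join_key) pairs and return a join index.

    The join index holds only RID pairs; no other attributes are materialized.
    """
    # Build phase: map each join key on the left to the RIDs that carry it.
    table: Dict[Hashable, List[int]] = defaultdict(list)
    for rid, key in left:
        table[key].append(rid)
    # Probe phase: emit one (left_rid, right_rid) pair per match.
    return [
        (left_rid, right_rid)
        for right_rid, key in right
        for left_rid in table.get(key, [])
    ]


r1 = [(0, "x"), (1, "y"), (2, "x")]
r2 = [(0, "y"), (1, "x")]
join_index = join_kernel(r1, r2)
```

Because only RIDs and join keys are held in memory, the build table stays small, which is the reason the slide gives for needing fewer passes.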

Fetch Kernel
The fetch kernel uses the join index to perform tuple reconstruction, i.e., to retrieve the values of the projected attributes for the tuples in the join result. A naïve strategy is to reconstruct one tuple at a time. If several tuples belong to the same page, that page may be read several times. This is not a problem if there is enough memory.

An Optimization in the Fetch Kernel
The fetch kernel makes multiple passes over the join index, fetching attributes in row order from one relation at a time. In each pass, the join index is sorted on the RIDs of the current relation R to be scanned. The kernel then retrieves the needed attributes from R for each tuple and augments the join index with those attributes.
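The multi-pass strategy can be sketched as follows. Each relation is modeled as a plain list indexed by RID, and each join-index entry as a tuple holding one RID per relation; both representations, and the name fetch_kernel, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fetch_kernel(
    join_index: List[Tuple[int, ...]],
    relations: Sequence[Sequence[object]],
) -> List[tuple]:
    """Reconstruct result tuples with one pass over the join index per relation."""
    # Each entry keeps its RIDs plus one slot per relation for the fetched value.
    entries = [{"rids": rids, "values": [None] * len(relations)} for rids in join_index]
    for rel_no, relation in enumerate(relations):
        # Sort on the RIDs of the current relation so it is visited in row order.
        entries.sort(key=lambda entry: entry["rids"][rel_no])
        for entry in entries:
            # Augment the entry with the attribute value fetched from this relation.
            entry["values"][rel_no] = relation[entry["rids"][rel_no]]
    return [tuple(entry["values"]) for entry in entries]


r1_values = ["a0", "a1", "a2"]
r2_values = ["b0", "b1"]
result = fetch_kernel([(2, 0), (0, 1), (1, 0)], [r1_values, r2_values])
```

The output ends up ordered by the RIDs of the last relation visited, because that is the final sort applied to the join index.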

Why Fetch in Row Order?
Sorting ensures that once a minipage of a relation has been accessed, it will not need to be accessed again, which places minimal demands on the buffer pool. However, sorting does not ensure sequential access to the underlying relation, because the pages corresponding to the sorted RIDs can be far apart. Hence this optimization is better suited to SSDs than to HDDs.
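The effect of sorting can be seen with a small counting sketch. It assumes the tightest possible buffer (only the most recently read page is kept) and a fixed number of rows per page; the function name page_reads and the sample RIDs are hypothetical.

```python
from typing import List


def page_reads(rids: List[int], rows_per_page: int) -> int:
    """Count page fetches when the buffer holds only the most recently read page."""
    reads = 0
    current_page = None
    for rid in rids:
        page = rid // rows_per_page
        if page != current_page:
            # The needed page is not buffered, so it has to be fetched.
            reads += 1
            current_page = page
    return reads


rids = [0, 9, 1, 8, 2]                          # join-index order
unsorted_reads = page_reads(rids, 4)            # visits pages 0, 2, 0, 2, 0
sorted_reads = page_reads(sorted(rids), 4)      # visits pages 0, 0, 0, 2, 2
```

With sorted RIDs each page is fetched once, but the pages touched (0 and 2) are still not adjacent, so the access pattern remains random rather than sequential.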

Conclusion
SSDs constitute a significant shift in hardware characteristics, comparable to large CPU caches and many-core processors. SSDs can improve performance for read-mostly applications. A column-based page layout is shown to be a natural choice for speeding up selections and projections on SSDs.

References
Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah, Janet L. Wiener, and Goetz Graefe. Query Processing Techniques for Solid State Drives. In SIGMOD, 2009.