Ingres/VectorWise Doug Inkster – Ingres Development.

Abstract
• Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users.

Overview
• What is VectorWise?
• Significance of the project (to existing users)
• Column store vs. row store
• VectorWise innovations
• Ingres/VectorWise interface details

VectorWise
• Small Dutch startup spun off from CWI (Centrum Wiskunde & Informatica, Amsterdam)
• Currently 6-7 employees
• Exciting development project based on the Ph.D. research of Marcin Zukowski under the guidance of Peter Boncz
• Ingres provided seed money and currently has exclusive rights to its technology

New Ingres Storage Type
• VectorWise technology will take the form of a new Ingres storage type
• Column store – but turbo-charged
• Users will define tables as usual, just with the new storage type
• Extremely fast

Performance in Traditional RDBMS
• TPC-H Query 1 on 6 million rows:
  – MySQL: 26.2 seconds
  – DBMS X: 28.1 seconds
  – Hand-coded C program: 0.2 seconds
• Prompted the question of why traditional RDBMSs are so slow
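
For a sense of what a hand-coded Q1 program can look like, here is a minimal sketch written directly against column arrays. It assumes the lineitem columns are already in memory as plain C arrays, that l_shipdate is encoded as an integer day number, and it computes only a subset of Q1's aggregates. `Q1Group` and `q1_hand_coded` are hypothetical names; this is not the program behind the 0.2-second measurement.

```c
#include <stddef.h>

/* Hypothetical in-memory layout: each lineitem column is a plain C array and
   l_shipdate is an integer day number. Only some Q1 aggregates are computed. */
typedef struct {
    long   count;
    double sum_qty;
    double sum_base_price;
    double sum_disc_price;   /* sum of extendedprice * (1 - discount) */
    double sum_charge;       /* sum of extendedprice * (1 - discount) * (1 + tax) */
} Q1Group;

/* Q1 groups on (l_returnflag, l_linestatus): flags 'A', 'N', 'R'; status 'F', 'O'. */
static int flag_index(char f)   { return f == 'A' ? 0 : (f == 'N' ? 1 : 2); }
static int status_index(char s) { return s == 'F' ? 0 : 1; }

void q1_hand_coded(size_t n,
                   const char *returnflag, const char *linestatus,
                   const double *quantity, const double *extendedprice,
                   const double *discount, const double *tax,
                   const int *shipdate, int cutoff_day,
                   Q1Group groups[3][2])
{
    for (int f = 0; f < 3; f++)
        for (int s = 0; s < 2; s++)
            groups[f][s] = (Q1Group){0};

    /* One tight loop: filter, locate the group, update the running sums. */
    for (size_t i = 0; i < n; i++) {
        if (shipdate[i] > cutoff_day)
            continue;
        Q1Group *g = &groups[flag_index(returnflag[i])][status_index(linestatus[i])];
        double disc_price = extendedprice[i] * (1.0 - discount[i]);
        g->count++;
        g->sum_qty        += quantity[i];
        g->sum_base_price += extendedprice[i];
        g->sum_disc_price += disc_price;
        g->sum_charge     += disc_price * (1.0 + tax[i]);
    }
}
```

There is no interpretation, no per-row function call and no row navigation: each qualifying row costs a comparison, a small table lookup and a few additions.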

Row Stores – Storage Model
• Row store: data is stored on disk and processed by the query engine in the form of rows
  – Ingres and all other commercial RDBMSs follow the same model
  – Data is stored on disk as rows packed into blocks/pages
  – Rows are extracted from pages and passed between query plan operations (joins, sorts, etc.) in row form
• Techniques have improved over time, but the basic model is the same as that of the research engines of the 1970s (System R, Berkeley Ingres)

Row Stores – Storage Model
• select a, b from c:
  – Even if c has 200 columns, the whole row image is read for every qualifying row
  – Inefficient use of disk (excessive I/O)
  – Inefficient use of memory (large row-sized buffers)
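
A minimal sketch of the row-store layout behind this slide, assuming a 200-column table of `int` values with `a` and `b` stored as the first two columns. The returned byte count models what disk and buffer pool have to handle: because pages hold complete row images, the scan pays for all 200 columns even though only two are projected. `Row` and `row_store_project` are illustrative names.

```c
#include <stddef.h>

#define NCOLS 200  /* assumed table width, matching the slide's example */

/* Row (N-ary) layout: each row image holds all columns contiguously. */
typedef struct { int col[NCOLS]; } Row;

/* Project a and b (assumed to be col[0] and col[1]) from n rows.
   Returns the bytes of row images that had to be read: pages contain whole
   rows, so every qualifying row is read in full. */
size_t row_store_project(const Row *rows, size_t n, int *out_a, int *out_b)
{
    for (size_t i = 0; i < n; i++) {
        out_a[i] = rows[i].col[0];
        out_b[i] = rows[i].col[1];
    }
    return n * sizeof(Row);
}
```

With 200 four-byte columns, each row image is 800 bytes, of which the query needs 8.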

Row Stores – Execution Model
• select a*b from c where d+e >= 250:
  – The execution model turns column expressions into pseudo-code (ADF in Ingres)
  – Pseudo-code is almost always evaluated by interpretation, one row at a time
  – For each instruction, operand addresses are built (using known buffers): all row-induced overhead
  – Big switch on “opcode” (integer addition, float compare, type coercions, etc.)
  – Very poor locality (code and data), very inefficient
  – No benefit from modern hardware features (cache, instruction pipelines, etc.)
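
To make the interpretation overhead concrete, below is a toy row-at-a-time interpreter for `select a*b from c where d+e >= 250`. It is not Ingres ADF: the opcodes, the `Instr` layout and `interpret` are hypothetical and only mimic the style the slide describes, where operands are set up per row and every instruction goes through a `switch` on its opcode.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy expression interpreter (not Ingres ADF). */
typedef enum { OP_ADD_INT, OP_MUL_INT, OP_GE_INT_CONST, OP_JUMP_IF_FALSE, OP_EMIT } Opcode;

typedef struct {
    Opcode op;
    int dst, src1, src2;  /* indexes of per-row scratch slots */
    int konst;            /* constant for OP_GE_INT_CONST, target for OP_JUMP_IF_FALSE */
} Instr;

typedef struct { int a, b, d, e; } Tuple;

/* select a*b from c where d+e >= 250; slots 0..3 hold a, b, d, e. */
static const Instr where_then_project[] = {
    { OP_ADD_INT,       4, 2, 3, 0   },  /* s4 = d + e             */
    { OP_GE_INT_CONST,  5, 4, 0, 250 },  /* s5 = (s4 >= 250)       */
    { OP_JUMP_IF_FALSE, 0, 5, 0, 5   },  /* if !s5, jump past EMIT */
    { OP_MUL_INT,       6, 0, 1, 0   },  /* s6 = a * b             */
    { OP_EMIT,          0, 6, 0, 0   },  /* output s6              */
};

size_t interpret(const Instr *prog, int nprog,
                 const Tuple *rows, size_t nrows, int *out)
{
    size_t nout = 0;
    for (size_t r = 0; r < nrows; r++) {           /* one row at a time */
        int slot[8];
        slot[0] = rows[r].a; slot[1] = rows[r].b;  /* per-row operand setup */
        slot[2] = rows[r].d; slot[3] = rows[r].e;
        for (int pc = 0; pc < nprog; pc++) {
            const Instr *in = &prog[pc];
            switch (in->op) {                      /* the big switch on opcode */
            case OP_ADD_INT:
                slot[in->dst] = slot[in->src1] + slot[in->src2]; break;
            case OP_MUL_INT:
                slot[in->dst] = slot[in->src1] * slot[in->src2]; break;
            case OP_GE_INT_CONST:
                slot[in->dst] = slot[in->src1] >= in->konst; break;
            case OP_JUMP_IF_FALSE:
                if (!slot[in->src1]) pc = in->konst - 1;  /* loop's pc++ lands on target */
                break;
            case OP_EMIT:
                out[nout++] = slot[in->src1]; break;
            }
        }
    }
    return nout;
}
```

Each row goes through up to five dispatched instructions to perform at most one addition, one comparison and one multiplication.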

Row Stores – Performance
• Poor use of disk bandwidth
• High instructions per tuple
• Poor locality (data and instructions) and poor exploitation of newer hardware: high cycles per instruction
• Extremely high cycles per tuple!!
• Designed for OLTP, not large-scale analytical processing

Column Stores – Storage Model
• Derived from research in the 1990s
• One of the earliest was MonetDB (Peter Boncz’s thesis)
• Data stored in columns (all values from a column in contiguous storage)
• Ordinal position within the column dictates the row
• Only columns actually requested are accessed (compare to the “select a, b from c” example)
• Far more efficient disk usage
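
A minimal sketch of the column-store layout, assuming a 200-column table of `int` values (the slide's `select a, b from c` example) with `a` and `b` stored as columns 0 and 1. Each column is its own contiguous array, and row `i` exists only implicitly as the `i`-th entry of every array. `ColumnTable` and `column_store_project` are hypothetical names.

```c
#include <stddef.h>

#define NCOLS 200  /* assumed table width, matching the slide's 200-column example */

/* Decomposed layout: one contiguous array per column. Row i is made of
   column[0][i], column[1][i], ...; its ordinal position is its identity. */
typedef struct {
    size_t nrows;
    int   *column[NCOLS];
} ColumnTable;

/* select a, b from c: only the two requested arrays are touched.
   Returns the number of bytes read. */
size_t column_store_project(const ColumnTable *t, int *out_a, int *out_b)
{
    const int *a = t->column[0];
    const int *b = t->column[1];
    for (size_t i = 0; i < t->nrows; i++) {
        out_a[i] = a[i];
        out_b[i] = b[i];
    }
    return 2 * t->nrows * sizeof(int);
}
```

For a 200-column table of four-byte values, this projection reads 8 bytes per row instead of an 800-byte row image.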

Column Stores – Execution Model
• Some column stores (Sybase IQ, Vertica) just read columns from disk and re-compose rows in memory
• The execution model is then the same as in row stores
• Only addresses the I/O problem

MonetDB – Improved Performance
• Column store improved I/O performance
• New execution model improved CPU performance
• Column-wise (not row-wise) computation
• Exploitation of newer hardware and compiler features

MonetDB – Improved Performance
• CPU efficiency depends on “nice” code
  – out-of-order execution
  – few dependencies (control, data)
  – compiler support
• Compilers love simple loops over arrays
  – loop-pipelining
  – automatic SIMD
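
A minimal sketch of what such primitives can look like for the expression `(age-30)*50` from the MonetDB example: each is a single loop over arrays with independent iterations and no data-dependent branches, the shape that optimizing compilers can pipeline and auto-vectorize. The function names are illustrative and are not MonetDB's actual primitive names; `restrict` is an assumption that outputs never alias inputs.

```c
#include <stddef.h>

/* res[i] = col[i] - val for every i: one operation, one type, one loop. */
void map_sub_int_col_int_val(int *restrict res, const int *restrict col,
                             int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] - val;
}

/* res[i] = col[i] * val for every i. */
void map_mul_int_col_int_val(int *restrict res, const int *restrict col,
                             int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] * val;
}
```

Evaluating `(age-30)*50` is then two calls over whole arrays rather than one interpreted pass per row.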

MonetDB – Expression Execution
• SELECT id, name, (age-30)*50 AS bonus FROM people WHERE age > 30

    typedef unsigned int oid;   /* row ordinal; exact type assumed for this example */

    /* Returns a vector of oids satisfying the comparison
    ** and the count of entries in that vector. */
    int select_gt_float(oid *res, float *column, float val, int n)
    {
        int j = 0;                      /* declared outside the loop so it can be returned */
        for (int i = 0; i < n; i++)
            if (column[i] > val)
                res[j++] = i;
        return j;
    }

• Compiles into a loop that performs many (10, 100, 1000) comparisons in parallel
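
`select_gt_float` branches on every comparison, which can cost branch mispredictions when about half of the values qualify. A common refinement in the main-memory database literature is branch-free (predicated) selection, sketched below together with a step that computes `bonus` only for the selected positions. Both functions are illustrative, not code from MonetDB or these slides; `oid` is assumed to be an unsigned row ordinal, and `res` must have room for `n` entries.

```c
#include <stddef.h>

typedef unsigned int oid;  /* assumed: row ordinal within the column */

/* Branch-free selection: always write the candidate oid, then advance the
   output cursor by the 0/1 result of the comparison. */
int select_gt_float_nobranch(oid *res, const float *column, float val, int n)
{
    int j = 0;
    for (int i = 0; i < n; i++) {
        res[j] = (oid)i;
        j += (column[i] > val);
    }
    return j;
}

/* bonus = (age - 30) * 50, computed only at the selected positions. */
void bonus_for_selected(float *bonus, const float *age, const oid *sel, int nsel)
{
    for (int k = 0; k < nsel; k++)
        bonus[k] = (age[sel[k]] - 30.0f) * 50.0f;
}
```

The selection vector lets later operators skip non-qualifying rows without copying the `age` column.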

MonetDB – Problem
• Operations required on whole columns at a time
• Lots of expensive intermediate result materialization
• Memory requirements
• Doesn’t scale to very large databases
• Still way faster than anything before
  – Q1: MySQL – 26.2 seconds
  – MonetDB – 3.7 seconds
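
To see where the memory goes, here is a minimal sketch of column-at-a-time evaluation of `(age-30)*50` over integer ages: every operator consumes and produces a full-length column, so even this small expression materializes an intermediate of `n` values. `bonus_full_column` is a hypothetical name, and the byte count it reports covers only that one intermediate.

```c
#include <stddef.h>
#include <stdlib.h>

/* Column-at-a-time: each operator materializes its full result in memory.
   Returns the result column (caller frees), or NULL on allocation failure. */
int *bonus_full_column(const int *age, size_t n, size_t *intermediate_bytes)
{
    int *tmp = malloc(n * sizeof *tmp);   /* age - 30, all n rows */
    int *res = malloc(n * sizeof *res);   /* (age - 30) * 50, all n rows */
    if (tmp == NULL || res == NULL) {
        free(tmp);
        free(res);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) tmp[i] = age[i] - 30;
    for (size_t i = 0; i < n; i++) res[i] = tmp[i] * 50;
    free(tmp);
    *intermediate_bytes = n * sizeof(int);
    return res;
}
```

With a billion rows of 4-byte integers, that single intermediate is about 4 GB, before counting the result column; longer expressions need more such intermediates.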

MonetDB/X100 – Innovations
• Fix MonetDB scaling problem
  – Columns broken into “chunks” or vectors (sized to fit cache)
  – Expression operators execute on vectors
• Most primitives take just 0.5 (!) to 10 cycles per tuple, many times faster than tuple-at-a-time
• Other performance improvements
  – Tuned to disk/memory/cache
  – Lightweight compression to maximize throughput
  – Cooperative scans (future)
• Q1:
  – MySQL – 26.2 seconds
  – MonetDB – 3.7 seconds
  – X100 – 0.6 seconds!
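
The vectorized alternative can be sketched as follows, assuming a fixed vector length of 1024 values (an illustrative choice; the only requirement is that a vector fits in cache). The column is consumed one vector at a time, intermediates live in small reused buffers, and each primitive step covers up to 1024 tuples. Here the result is summed rather than materialized, so nothing of column length is allocated. `sum_bonus_vectorized` and its step counter are hypothetical; the three inner loops stand in for separate primitive calls.

```c
#include <stddef.h>

#define VECTOR_SIZE 1024  /* assumed vector length, chosen to fit in cache */

/* Sum of (age - 30) * 50, evaluated one vector at a time.
   *primitive_calls receives the number of primitive steps executed. */
long long sum_bonus_vectorized(const int *age, size_t n, size_t *primitive_calls)
{
    int tmp[VECTOR_SIZE], bonus[VECTOR_SIZE];   /* reused for every vector */
    long long sum = 0;
    size_t calls = 0;
    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t len = (n - start < VECTOR_SIZE) ? n - start : VECTOR_SIZE;
        const int *a = age + start;
        for (size_t i = 0; i < len; i++) tmp[i] = a[i] - 30;      /* primitive 1: subtract */
        for (size_t i = 0; i < len; i++) bonus[i] = tmp[i] * 50;  /* primitive 2: multiply */
        for (size_t i = 0; i < len; i++) sum += bonus[i];         /* primitive 3: aggregate */
        calls += 3;
    }
    *primitive_calls = calls;
    return sum;
}
```

For 2,500 rows the sketch makes 9 primitive steps (three per vector), where a tuple-at-a-time interpreter would dispatch three operations for each of the 2,500 rows.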

MonetDB/X100 – Hardware Trends
• CPU
  – Increased cache, memory (64-bit)
  – Instruction pipelining, multiple simultaneous instructions, SIMD (single instruction, multiple data)
• Disk
  – Increased disk capacity
  – Increased disk bandwidth
  – Increased random access speeds, but not as much as bandwidth
  – Sequential access increasing in importance

MonetDB/X100 – Innovations
• Reduced interpretation overhead
  – 100x fewer function calls
• Good CPU cache use
  – High locality in the primitives
  – Cache-conscious data placement
• No tuple navigation
  – Primitives only see arrays
• Vectorization allows algorithmic optimization
• CPU- and compiler-friendly function bodies
  – Multiple work units, loop-pipelining, SIMD…

Ingres/VectorWise Integration
• Ingres offers:
  – Infrastructure
  – User interfaces, SQL compilation, utilities, etc.
  – Existing user community
• VectorWise offers:
  – Execution engine (QEF/DMF/ADF equivalent)
  – Its own relational algebraic language

Ingres/VectorWise Integration
• SQL query parsed and optimized in Ingres
• Ingres builds the query plan from the OPF-generated QEP
  – Query plan contains cross-compiled VectorWise query syntax
• QEF passes the query to the VectorWise engine (and gets out of the way!)
• VectorWise passes the result set back to Ingres to be returned to the caller (an array of result rows, mapped to Ingres fetch buffers many rows at a time)
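
The batched hand-off of results can be sketched as below. This is a hypothetical interface, not the actual Ingres or VectorWise API: the engine's result set is modeled as fixed-width rows laid out back to back, and each `fetch_batch` call copies as many rows as fit in the caller's fetch buffer, so the interface is crossed once per batch instead of once per row.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result-set handle: fixed-width rows stored contiguously. */
typedef struct {
    const char *rows;       /* result rows laid out back to back */
    size_t      row_width;  /* bytes per row */
    size_t      nrows;      /* total rows in the result set */
    size_t      next;       /* index of the next row to hand out */
} ResultSet;

/* Copy up to max_rows rows into buf; returns the number copied (0 = done). */
size_t fetch_batch(ResultSet *rs, char *buf, size_t max_rows)
{
    size_t left = rs->nrows - rs->next;
    size_t k = (left < max_rows) ? left : max_rows;
    memcpy(buf, rs->rows + rs->next * rs->row_width, k * rs->row_width);
    rs->next += k;
    return k;
}
```

Fetching 5 rows with a 2-row buffer takes 3 calls (2 + 2 + 1).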

Ingres/VectorWise Integration
• Currently supported:
  – Create table, including unique/referential constraint definitions (with structure = …)
  – Copy table
  – Drop table
  – Select
  – Insert
  – Update
  – Delete
  – Create table … as select …
  – Insert into … select …
  – Create index (VectorWise style)

Ingres/VectorWise – Initial Release
• Individual queries limited to either VectorWise or Ingres native tables
  – Restriction probably lifted in the near future
• As seamless as possible
  – Integrated ingstart/ingstop
  – Same utilities
  – Same user interfaces
  – Same recovery processes

Ingres/VectorWise – Conclusions
• Tremendously exciting development in Ingres
• Industry-leading technology
• Opens up new applications
• Reduces pressure for new hardware