Making Fast Databases FASTER @andy_pavlo

Similar presentations
The HV-tree: a Memory Hierarchy Aware Version Index Rui Zhang University of Melbourne Martin Stradling University of Melbourne.

Andy Pavlo April 13, 2015 NewSQL.
SkewReduce YongChul Kwon Magdalena Balazinska, Bill Howe, Jerome Rolia* University of Washington, *HP Labs Skew-Resistant Parallel Processing of Feature-Extracting.
Exploiting Distributed Version Concurrency in a Transactional Memory Cluster Kaloian Manassiev, Madalin Mihailescu and Cristiana Amza University of Toronto,
Parallel Databases By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
@andy_pavlo On Predictive Modeling for Distributed Databases VLDB - August 28th, 2012.
Parallel Database Systems The Future Of High Performance Database Systems David Dewitt and Jim Gray 1992 Presented By – Ajith Karimpana.
Hit or Miss ? !!!.  Cache RAM is high-speed memory (usually SRAM).  The Cache stores frequently requested data.  If the CPU needs data, it will check.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
1 of 14 1 Scheduling and Optimization of Fault- Tolerant Embedded Systems Viacheslav Izosimov Embedded Systems Lab (ESLAB) Linköping University, Sweden.
FLANN Fast Library for Approximate Nearest Neighbors
Performance and Scalability. Performance and Scalability Challenges Optimizing PerformanceScaling UpScaling Out.
Modularizing B+-trees: Three-Level B+-trees Work Fine Shigero Sasaki* and Takuya Araki NEC Corporation * currently with 1st Nexpire Inc.
Memory/Storage Architecture Lab Computer Architecture Performance.
Parallel and Distributed IR. 2 Papers on Parallel and Distributed IR Introduction Paper A: Inverted file partitioning schemes in Multiple Disk Systems.
MonetDB/X100 hyper-pipelining query execution Peter Boncz, Marcin Zukowski, Niels Nes.
20 October 2006Workflow Optimization in Distributed Environments Dynamic Workflow Management Using Performance Data David W. Walker, Yan Huang, Omer F.
Managing Real-Time Transactions in Mobile Ad-Hoc Network Databases Le Gruenwald The University of Oklahoma School of Computer Science Norman, Oklahoma,
H-Store: A Specialized Architecture for High-throughput OLTP Applications Evan Jones (MIT) Andrew Pavlo (Brown) 13 th Intl. Workshop on High Performance.
@andy_pavlo Making Fast Databases FASTER. Fast + Cheap
Towards Dynamic Green-Sizing for Database Servers Mustafa Korkmaz, Alexey Karyakin, Martin Karsten, Kenneth Salem University of Waterloo.
1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Database System Architecture Prof. Yin-Fu Huang CSIE, NYUST Chapter 2.
Tivoli Software © 2010 IBM Corporation 1 Using Machine Learning Techniques to Enhance The Performance of an Automatic Backup and Recovery System Amir Ronen,
Online Data partitioning in distributed database systems
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
Copyright © 2006, GemStone Systems Inc. All Rights Reserved. Increasing computation throughput with Grid Data Caching Jags Ramnarayan Chief Architect GemStone.
Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure.
A Memory-hierarchy Conscious and Self-tunable Sorting Library To appear in 2004 International Symposium on Code Generation and Optimization (CGO ’ 04)
Your Data Any Place, Any Time Performance and Scalability.
Last Updated : 27 th April 2004 Center of Excellence Data Warehousing Group Teradata Performance Optimization.
@andy_pavlo Automatic Database Partitioning in Parallel OLTP Systems SIGMOD May 22nd, 2012.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Jihui Yang CS525 Advanced Distributed System March 1, 2016.
Oracle Announced New In- Memory Database G1 Emre Eftelioglu, Fen Liu [09/27/13] 1 [1]
| presented by Vasileios Zois CS at USC 09/20/2013 Introducing Scalability into Smart Grid 1.
Towards a Non-2PC Transaction Management in Distributed Database Systems Qian Lin, Pengfei Chang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Zhengkui Wang.
A Look Under The Hood The fastest, in-memory analytic database in the world Dan Andrei STEFAN
CSCI5570 Large Scale Data Processing Systems
Lecture 2: Performance Evaluation
CSCI5570 Large Scale Data Processing Systems
Applying Control Theory to Stream Processing Systems
Dremel.
Parallel Density-based Hybrid Clustering
Introduction to NewSQL
Joe Chang yahoo . com qdpma.com
Memory Management for Scalable Web Data Servers
Project Project mid-term report due on 25th October at midnight Format
Database Performance Tuning and Query Optimization
Building an Elastic Main-Memory Database: E-Store
Adda Quinn 1974 Nancy Wheeler Jenkins 1978.
April 30th – Scheduling / parallel
Predictive Performance
NoSQL Databases An Overview
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Anti-Caching in Main Memory Database Systems
Hybrid Indexes Reducing the Storage Overhead of
Selected Topics: External Sorting, Join Algorithms, …
Caching Principles and Paging Performance
Parallel Computing Demand for High Performance
Parallel DBMS Chapter 22, Part A
H-store: A high-performance, distributed main memory transaction processing system Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alex.
Caching Principles and Paging Performance
Parallel Computing Demand for High Performance
Parallel System for BLAST
Lecture 20: Intro to Transactions & Logging II
Chapter 11 Database Performance Tuning and Query Optimization
Parallel DBMS DBMS Textbook Chapter 22
Measuring Transaction Performance MongoDB Meets TPC-C
Presentation transcript:

Making Fast Databases FASTER @andy_pavlo

Fast + Cheap. Although memory is getting cheap, loading up a single machine with a lot of RAM is still a bad idea: it takes a long time to refresh a cold cache. So even though a single-node main-memory database can be very fast, the system still needs to support multiple machines, and that is hard. If a system is already fast, can we make it faster without giving up ACID?

Main Memory • Parallel • Shared-Nothing Transaction Processing

Stored Procedure Execution: the client application sends the procedure name and input parameters; the DBMS executes the transaction and sends back the result.
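To make that flow concrete, here is a minimal Python sketch (the names call_procedure and new_order are hypothetical for illustration, not H-Store's actual client API) of invoking a pre-registered stored procedure by name with its input parameters and getting back the transaction's result:

    # The full transaction logic lives in a pre-registered stored procedure;
    # the client ships only the procedure name and its input parameters.
    PROCEDURES = {}

    def register(name):
        def wrap(fn):
            PROCEDURES[name] = fn
            return fn
        return wrap

    @register("NewOrder")
    def new_order(db, w_id, d_id, c_id, item_ids):
        # All reads and writes happen here, inside one transaction.
        order_id = db["next_order_id"]
        db["next_order_id"] += 1
        db["orders"].append({"o_id": order_id, "o_c_id": c_id, "o_w_id": w_id})
        return {"order_id": order_id}

    def call_procedure(db, name, *params):
        # Server-side dispatch: one round trip per transaction.
        return PROCEDURES[name](db, *params)

    db = {"next_order_id": 78709, "orders": []}
    print(call_procedure(db, "NewOrder", 5, 9, 1004, [603514]))

Shipping the whole procedure to the data, rather than issuing one query at a time from the client, keeps network round trips per transaction to a minimum.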

TPC-C NewOrder

Optimization #1: Partition the database to reduce the number of distributed txns. (Skew-Aware Automatic Database Partitioning in Shared-Nothing, Main Memory OLTP Systems, in submission.)

[Figure: example TPC-C tables. ITEM (i_id, i_name, i_price), CUSTOMER (c_id, c_w_id, c_last), and ORDERS (o_id, o_c_id, o_w_id), with sample rows such as customer 1001 "RZA" in warehouse 5 and order 78703 placed by customer 1004 in warehouse 5. CUSTOMER and ORDERS are split across partitions by their warehouse-id columns, while copies of the ITEM table are placed on every partition.]
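A minimal Python sketch of how a design like this could be applied at run time, assuming CUSTOMER and ORDERS are hash-partitioned on their warehouse-id columns and ITEM is replicated everywhere (the routing function and constants are illustrative, not H-Store code):

    NUM_PARTITIONS = 4

    # Which column each table is divided on; None means the table is
    # replicated, so a copy lives on every partition.
    PARTITION_COLUMN = {"CUSTOMER": "c_w_id", "ORDERS": "o_w_id", "ITEM": None}

    def partition_for(table, row):
        col = PARTITION_COLUMN[table]
        if col is None:
            return "replicated"          # available on every partition
        return hash(row[col]) % NUM_PARTITIONS

    # A NewOrder for warehouse 5 only touches the partition that owns w_id = 5,
    # so it can run without any cross-partition coordination.
    print(partition_for("CUSTOMER", {"c_id": 1001, "c_w_id": 5, "c_last": "RZA"}))
    print(partition_for("ORDERS",   {"o_id": 78703, "o_c_id": 1004, "o_w_id": 5}))
    print(partition_for("ITEM",     {"i_id": 603514, "i_name": "XXX", "i_price": 23.99}))

Choosing partitioning columns so that most transactions resolve to a single partition is exactly what the search procedure on the next slide automates.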

Large-Neighborhood Search Algorithm. Example NewOrder queries: SELECT * FROM WAREHOUSE WHERE W_ID = 10; SELECT * FROM DISTRICT WHERE D_W_ID = 10 AND D_ID = 9; INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID, …) VALUES (10, 9, 12345, …). The search takes the DDL schema and a sample workload as input; a distributed-txn (DTxn) estimator and a skew estimator score each candidate partitioning of CUSTOMER, ORDERS, and ITEM as the large-neighborhood search explores the design space.
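A toy Python sketch of the idea: repeatedly relax part of the current design, re-pick partitioning columns, and keep the change if an assumed, greatly simplified cost function (standing in for the DTxn and skew estimators) says it reduces distributed transactions. The real neighborhood moves and estimators are far more sophisticated.

    import random

    TABLES = {"CUSTOMER": ["c_id", "c_w_id", "c_last"],
              "ORDERS":   ["o_id", "o_c_id", "o_w_id"],
              "ITEM":     ["i_id", "i_name", "i_price"]}

    def estimated_cost(design, workload):
        # Simplified stand-in: a txn is "distributed" if it filters a table
        # on something other than that table's partitioning column.
        distributed = sum(1 for txn in workload
                          if any(design[t] not in cols for t, cols in txn.items()))
        return distributed / len(workload)

    def lns_search(workload, rounds=500):
        design = {t: cols[0] for t, cols in TABLES.items()}   # naive initial design
        best_cost = estimated_cost(design, workload)
        for _ in range(rounds):
            candidate = dict(design)
            # "Destroy" part of the design: re-pick the column for two tables.
            for table in random.sample(list(TABLES), k=2):
                candidate[table] = random.choice(TABLES[table])
            cost = estimated_cost(candidate, workload)
            if cost < best_cost:
                design, best_cost = candidate, cost
        return design, best_cost

    # Each workload entry records which columns a txn filters on, per table.
    workload = [{"CUSTOMER": ["c_w_id"], "ORDERS": ["o_w_id"]}] * 100
    print(lns_search(workload))   # converges to partitioning on the w_id columns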

TATP TPC-C

[Figure: a client application submits a transaction; before it runs, the system does not know which partitions it will touch.]

Optimization #2: Predict what txns will do before they execute. (On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems, PVLDB, vol. 5, issue 2, October 2011.)

Client Application. (Speaker note: mention that we're using machine learning techniques.)
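A rough Python illustration of the idea (this is not the actual Houdini model, which uses Markov models of stored procedure execution; the featurization below is an assumption made only for the sketch): learn from a trace of past invocations which partitions each kind of invocation touched, then predict for new invocations from the procedure name and parameters.

    from collections import Counter, defaultdict

    class TxnPredictor:
        """Toy predictor: maps a (procedure, parameter-feature) pair to the
        set of partitions that past invocations with that feature touched."""

        def __init__(self, num_partitions):
            self.num_partitions = num_partitions
            self.history = defaultdict(Counter)

        def feature(self, proc, params):
            # Assumed featurization: bucket the first parameter (e.g. the w_id).
            return (proc, params[0] % self.num_partitions)

        def train(self, proc, params, partitions_touched):
            self.history[self.feature(proc, params)][frozenset(partitions_touched)] += 1

        def predict(self, proc, params):
            counts = self.history.get(self.feature(proc, params))
            if not counts:
                return None               # unknown: fall back to a conservative plan
            return max(counts, key=counts.get)

    p = TxnPredictor(num_partitions=4)
    p.train("NewOrder", [5, 9, 1004], partitions_touched=[1])
    print(p.predict("NewOrder", [5, 9, 1007]))    # -> frozenset({1})

With such a prediction in hand, the scheduler can reserve only the partitions a txn is expected to need instead of pessimistically locking everything.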

Need to explain what happens when the prediction is wrong.
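One answer, roughly in the spirit of the H-Store work, is to run the txn optimistically with only the predicted partitions and, if it turns out to need more, abort it and restart it with a conservative plan covering every partition. A hypothetical sketch:

    class Misprediction(Exception):
        """Raised when a txn tries to touch a partition it did not reserve."""

    def run_with_prediction(execute_txn, predicted_partitions, all_partitions):
        try:
            # Optimistic attempt: reserve only the predicted partitions.
            return execute_txn(allowed_partitions=predicted_partitions)
        except Misprediction:
            # Fall back: restart as a full distributed txn over every partition.
            return execute_txn(allowed_partitions=all_partitions)

    def sample_txn(allowed_partitions):
        needed = {1, 3}                              # partitions it really touches
        if not needed <= set(allowed_partitions):
            raise Misprediction()
        return "committed"

    print(run_with_prediction(sample_txn, predicted_partitions=[1],
                              all_partitions=[0, 1, 2, 3]))   # -> committed

The cost of a misprediction is an extra abort and retry, so the model only pays off when it is usually right.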

Throughput (txn/s), Houdini vs. assuming every txn is single-partitioned: TATP +57%, TPC-C +126%, AuctionMark +117%.

Conclusion: Achieving fast performance takes more than just using RAM. Future Work: Reduce distributed txn overhead through creative scheduling.

h-store hstore.cs.brown.edu github.com/apavlo/h-store

Help is Available: Graduate Student Abuse Hotline, +1-617-258-6643. Available 24/7. Collect Calls Accepted.