Analysis of Database Workloads on Modern Processors. Advisor: Prof. Shan Wang; Ph.D. student: Dawei Liu; Key Laboratory of Data Engineering and Knowledge Engineering.


Analysis of Database Workloads on Modern Processors
Advisor: Prof. Shan Wang
Ph.D. student: Dawei Liu
Key Laboratory of Data Engineering and Knowledge Engineering (MOE), School of Information, Renmin University of China

Outline: 1. Background; 2. Motivation; 3. Our research work; 4. Future work

Background: the LaMa (Large-Scale Data Management) project, joint research with HP Labs China. Goal: advanced issues of massively parallel processing (MPP) databases, covering architecture and design aspects and next-generation memory-oriented databases (my focus).

Outline: 1. Background; 2. Motivation; 3. Our research; 4. Future work

Motivation: the continued evolution of hardware, starting with the processor.

Motivation (cont.): memory keeps getting larger, and flash memory is emerging.

Motivation (cont.): traditional database research is dedicated to I/O optimization and fails to use processor resources efficiently.

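Motivation (cont.): modern processors (e.g., Itanium II) offer multi-level memory hierarchies, superscalar execution (out-of-order on many designs, although the Itanium II issues instructions in order), multithreading, and multiple cores. These features create opportunities to improve database performance.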

Motivation (cont.): Objective: accurately characterize workload behavior on modern processors and find the bottlenecks. Benefit: identify a set of characteristics that can guide performance optimization. What are the detailed issues?

My Ph.D. track: (1) accurately characterize database workloads on modern processors; (2) investigate main-memory database (MMDB) workloads on modern processors; (3) develop a specialized benchmark for MMDBs.

(1) Processor issue. Previous research [*] concluded that DBMSs achieve low IPC (instructions per cycle), so processors are used inefficiently. Platform: Intel Pentium II / Pentium Pro.
* A. Ailamaki, D. J. DeWitt, M. D. Hill, D. A. Wood. DBMSs on a Modern Processor: Where Does Time Go? In Proc. VLDB, 1999.
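IPC is retired instructions divided by elapsed cycles; comparing it with the processor's peak issue width shows how much of the machine a workload actually uses. A minimal sketch, assuming the two counts have already been read from hardware counters. The function names and counter values are hypothetical illustrations, not measurements from this study, and the peak width of 6 (two three-instruction bundles per cycle on Itanium II) is an assumed parameter.

```python
def ipc(instructions_retired: int, cpu_cycles: int) -> float:
    """Instructions per cycle from two hardware-counter readings."""
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    return instructions_retired / cpu_cycles


def issue_slot_utilization(ipc_value: float, peak_issue_width: int) -> float:
    """Fraction of the peak issue bandwidth that the workload actually used."""
    return ipc_value / peak_issue_width


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to illustrate the arithmetic.
    value = ipc(instructions_retired=3_000_000, cpu_cycles=4_000_000)
    print(f"IPC = {value:.2f}")
    # Assumed peak issue width of 6 instructions per cycle.
    print(f"issue-slot utilization = {issue_slot_utilization(value, 6):.1%}")
```

A low IPC against a wide issue width is exactly the "processors are inefficiently used" observation: most issue slots go unfilled.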

Processor issue (cont.): we are interested in DBMS behavior on today's processors, such as the Itanium II and AMD Opteron. Where have the 8 years since that study gone?

(2) Main-memory DB issue. Previous research: DB: disk-resident databases (DRDB); workload: TPC-C. Current problem: DB: main-memory databases (MMDB); workload: TPC-H (compute-intensive). Data has "moved up" the memory hierarchy: on-chip and off-chip caches keep growing, and RAM capacity increases steadily.

(3) MMDB-oriented benchmark. Performance evaluation: the OO1 and OO7 benchmarks are obsolete; the industrial standards are TPC Benchmark C (OLTP) and TPC Benchmark H (OLAP). How should a main-memory database be benchmarked? We found that these benchmarks are not appropriate for MMDBs.

Outline: 1. Background; 2. Motivation; 3. Our research; 4. Future work

Methodology: analysis framework; experimental study

Pipeline of modern processors

Query execution time breakdown [*]:
$T_Q = T_C + T_M + T_B + T_R - T_{OVL}$
where $T_C$ is useful computation time; $T_M$ is stall time caused by the memory hierarchy; $T_B$ is branch misprediction overhead; $T_R$ is resource-related stall time; and $T_{OVL}$ is stall time that is overlapped with other work and therefore not counted twice.
* A. Ailamaki, D. J. DeWitt, M. D. Hill, D. A. Wood. DBMSs on a Modern Processor: Where Does Time Go? In Proc. VLDB, 1999.
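The breakdown formula can be sketched as a small record of its five components. Because $T_Q - T_C = T_M + T_B + T_R - T_{OVL}$, the fraction of time not spent on useful computation follows directly. This is a minimal illustration under the formula above; the class name and the example numbers are hypothetical, not results from this study.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    """Components of query execution time, all in the same unit (e.g. cycles)."""
    computation: float      # T_C: useful computation
    memory_stalls: float    # T_M: stalls caused by the memory hierarchy
    branch_mispred: float   # T_B: branch misprediction overhead
    resource_stalls: float  # T_R: resource-related stalls
    overlap: float          # T_OVL: stall time overlapped with other work

    def total(self) -> float:
        # T_Q = T_C + T_M + T_B + T_R - T_OVL
        return (self.computation + self.memory_stalls + self.branch_mispred
                + self.resource_stalls - self.overlap)

    def stall_fraction(self) -> float:
        # Non-computation share of T_Q: (T_Q - T_C) / T_Q
        t_q = self.total()
        return (t_q - self.computation) / t_q


if __name__ == "__main__":
    # Hypothetical values in cycles, for illustration only.
    b = TimeBreakdown(400.0, 450.0, 100.0, 150.0, 100.0)
    print(b.total(), b.stall_fraction())
```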

Execution time components on Itanium II platform

Experimental setup: platform-specific hardware; software; experimental methodology

The hardware platform: HP Integrity rx server, an Itanium II-based server; cache

Cache characteristics

Software and methodology: Calibrator (CWI*) measures cache access and miss latencies, main-memory access latency, the number of TLB levels, and each TLB level's miss latency.
* Centrum voor Wiskunde en Informatica, the national research institute for mathematics and computer science in the Netherlands.
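Latency calibration tools in the style of Calibrator time chains of dependent loads that walk through a buffer with a chosen stride and size: since each load needs the previous result (`p = *p`), loads cannot overlap, and the time per step approximates the latency of whichever memory level the buffer fits in. The sketch below only builds and walks such a chain to show the access pattern; it is our illustration, not Calibrator's code, and the function names are hypothetical. Python is far too slow for the timing itself, which would be done in C.

```python
def build_chain(array_len: int, stride: int) -> list[int]:
    """Next-index table: slot i points to slot (i + stride) mod array_len."""
    if stride <= 0 or array_len % stride != 0:
        raise ValueError("stride must be positive and divide array_len")
    nxt = [-1] * array_len  # -1 marks slots that are not on the chain
    for i in range(0, array_len, stride):
        nxt[i] = (i + stride) % array_len
    return nxt


def walk(nxt: list[int], steps: int, start: int = 0) -> list[int]:
    """Follow the chain; every step depends on the previous one."""
    visited = []
    p = start
    for _ in range(steps):
        visited.append(p)
        p = nxt[p]
    return visited


if __name__ == "__main__":
    print(walk(build_chain(16, 4), 5))
```

With a stride at least as large as a cache line and a buffer larger than a given cache level, every step in a timed C version would miss that level, which is how the miss latency becomes visible.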

Software and methodology (cont.): PerfSuite (NCSA*) controls the hardware counters; we measured 60 event types to obtain the results.
* National Center for Supercomputing Applications.

Stall time components on Itanium II

Results analysis. Part one, DRDB: characterization of OLTP and OLAP workloads on Itanium II. Part two, MMDB issue: characterization of the MMDB TPC-H workload.
Dawei Liu, Shan Wang, Biao Qin, Weiwei Gong. Characterizing DSS Workloads from the Processor Perspective. International Workshop on Database Management and Application over Network (DBMAN 2007).
Dawei Liu, Shan Wang, Qiming Chen, Yun Tian, Weiwei Gong. Main Memory Database TPC-H Workload Characterization on Modern Processor. Technical Report TR-01, Renmin University of China, 2007.

Memory stall time breakdown: TPC-H workload on a DRDB
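One common way to break $T_M$ down by level is to multiply each level's miss count (from hardware counters) by that level's miss latency (from a tool such as Calibrator), in the spirit of the Ailamaki et al. methodology cited above. A minimal sketch, assuming those inputs are available; the level names, counts, and latencies below are hypothetical. Because overlapped stalls ($T_{OVL}$) are not subtracted, the sum is an upper estimate of the actual memory stall time.

```python
def memory_stall_cycles(miss_counts: dict[str, int],
                        penalties: dict[str, float]) -> dict[str, float]:
    """Estimated stall cycles per level: misses x miss latency (cycles)."""
    missing = miss_counts.keys() - penalties.keys()
    if missing:
        raise KeyError(f"no latency given for: {sorted(missing)}")
    return {level: count * penalties[level] for level, count in miss_counts.items()}


if __name__ == "__main__":
    # Hypothetical miss counts and latencies, for illustration only.
    misses = {"L1D": 1000, "L2D": 200, "L3D": 50, "DTLB": 20}
    latency = {"L1D": 5, "L2D": 14, "L3D": 180, "DTLB": 30}
    per_level = memory_stall_cycles(misses, latency)
    print(per_level, sum(per_level.values()))
```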

Index influence: TPC-H workload on a DRDB

Branch instruction misprediction: TPC-H workload on a DRDB
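The $T_B$ term can be estimated from two counters, retired branches and mispredicted branches, together with a per-misprediction penalty. A minimal sketch under that assumption; the function names, counts, and the 6-cycle penalty are hypothetical placeholders, not values measured in this study.

```python
def misprediction_rate(branches: int, mispredicted: int) -> float:
    """Fraction of retired branches that were mispredicted."""
    if branches <= 0:
        raise ValueError("branches must be positive")
    return mispredicted / branches


def branch_overhead_cycles(mispredicted: int, penalty_cycles: int) -> int:
    """T_B estimate: each misprediction flushes the pipeline for a fixed penalty."""
    return mispredicted * penalty_cycles


if __name__ == "__main__":
    # Hypothetical counter values and an assumed 6-cycle penalty.
    print(misprediction_rate(1_000_000, 50_000))
    print(branch_overhead_cycles(50_000, 6))
```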

DRDB vs. MMDB

Storage Architecture Influence
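A simple way to see why the storage model affects cache utilization is to count the cache lines a scan of one attribute must bring in. In a row-oriented layout the scanned attribute is interleaved with the other attributes of each tuple; in a column-oriented layout the values are contiguous. The sketch below is a simplified model (buffers aligned at offset 0, no prefetching); the function names, the 16-attribute, 8-byte schema, and the 64-byte line size are assumptions for illustration, not the configuration measured here.

```python
def lines_touched_row(n_tuples: int, n_attrs: int, attr_bytes: int,
                      line_bytes: int, attr_index: int = 0) -> int:
    """Distinct cache lines read when scanning one attribute in a row layout."""
    tuple_bytes = n_attrs * attr_bytes
    lines = set()
    for t in range(n_tuples):
        start = t * tuple_bytes + attr_index * attr_bytes
        end = start + attr_bytes - 1  # last byte of this attribute value
        lines.update(range(start // line_bytes, end // line_bytes + 1))
    return len(lines)


def lines_touched_column(n_tuples: int, attr_bytes: int, line_bytes: int) -> int:
    """Cache lines read when the attribute's values are stored contiguously."""
    return -(-(n_tuples * attr_bytes) // line_bytes)  # ceiling division


if __name__ == "__main__":
    # Hypothetical schema: 1000 tuples, 16 attributes of 8 bytes, 64-byte lines.
    print(lines_touched_row(1000, 16, 8, 64), lines_touched_column(1000, 8, 64))
```

Under these assumptions the row layout touches one line per tuple (1000 lines), while the column layout touches 125, so each fetched line in the column layout carries only useful data.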

Summary: we characterized database workloads on an Itanium II-based platform; characterized read-optimized MMDB workloads on modern processors; compared the execution-time breakdowns of DRDB and MMDB; explored the differences between column-oriented and row-oriented storage models in CPU and cache utilization; and investigated the influence of indexes at the hardware level.

Outline: 1. Background; 2. Motivation; 3. Our research; 4. Future work

Future work: in-depth analysis of the results; development of new parallel techniques, including instruction-level parallelism; and the MMDB benchmark issue. The results are expected to benefit DBMS performance optimization and the architecture of next-generation memory-oriented databases.

The End. Thanks! You are welcome to visit RUC.
Dawei Liu, School of Information, Renmin University of China; Key Laboratory of Data Engineering and Knowledge Engineering (MOE). Tel.: +86 (10)