The Sort Benchmark AlgorithmsSolid State Disks External Memory Multiway Mergesort  Phase 1: Run Formation  Phase 2: Merge Runs  Careful parameter selection.

Slides:



Advertisements
Similar presentations
R. Dementiev and P. Sanders: Asynchronous Parallel Disk Sorting 1 Asynchronous Parallel Disk Sorting Roman Dementiev and Peter Sanders Max-Planck-Institut.
Advertisements

Daniel Schall, Volker Höfner, Prof. Dr. Theo Härder TU Kaiserslautern.
The Search for Energy- Efficient Building Blocks for the Data Center Laura Keys, Suzanne Rivoire, and John D. Davis Researcher, Microsoft.
1 Distributed Systems Meet Economics: Pricing in Cloud Computing Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Towards Energy Efficient Hadoop Wednesday, June 10, 2009 Santa Clara Marriott Yanpei Chen, Laura Keys, Randy Katz RAD Lab, UC Berkeley.
Shimin Chen Big Data Reading Group Presented and modified by Randall Parabicoli.
Shimin Chen Big Data Reading Group.  Energy efficiency of: ◦ Single-machine instance of DBMS ◦ Standard server-grade hardware components ◦ A wide spectrum.
Energy-efficient Cluster Computing with FAWN: Workloads and Implications Vijay Vasudevan, David Andersen, Michael Kaminsky*, Lawrence Tan, Jason Franklin,
Towards Energy Efficient MapReduce Yanpei Chen, Laura Keys, Randy H. Katz University of California, Berkeley LoCal Retreat June 2009.
Energy Efficient Prefetching – from models to Implementation 6/19/ Adam Manzanares and Xiao Qin Department of Computer Science and Software Engineering.
Energy Efficient Prefetching with Buffer Disks for Cluster File Systems 6/19/ Adam Manzanares and Xiao Qin Department of Computer Science and Software.
Energy Efficient Web Server Cluster Andrew Krioukov, Sara Alspaugh, Laura Keys, David Culler, Randy Katz.
1 CS 501 Spring 2005 CS 501: Software Engineering Lecture 22 Performance of Computer Systems.
Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications A. Caulfield, L. Grupp, S. Swanson, UCSD, ASPLOS’09.
Hystor : Making the Best Use of Solid State Drivers in High Performance Storage Systems Presenter : Dong Chang.
Comparing Coordinated Garbage Collection Algorithms for Arrays of Solid-state Drives Junghee Lee, Youngjae Kim, Sarp Oral, Galen M. Shipman, David A. Dillow,
Analyzing the Energy Efficiency of a Database Server Hanskamal Patel SE 521.
11 Aug 2015Computer introduction1 Storage devices Bits, Bytes, Kilobytes, MB, GB, Terabytes Hardware Moore’s law Disks Internal hard disk TB (
Hardware Overview Iomega Network Storage LENOVO | EMC CONFIDENTIAL. ALL RIGHTS RESERVED. Storage for SMB and Distributed Enterprise PX SERIES.
Lecture 11: DMBS Internals
PARAID: The Gear-Shifting Power-Aware RAID Charles Weddle, Mathew Oldham, An-I Andy Wang – Florida State University Peter Reiher – University of California,
Exploiting Flash for Energy Efficient Disk Arrays Shimin Chen (Intel Labs) Panos K. Chrysanthis (University of Pittsburgh) Alexandros Labrinidis (University.
Emalayan Vairavanathan
Lecture 2 “Structure of computer” Informatics. Computer is  general purpose device that can be programmed to carry out a set of arithmetic or logical.
The Red Storm High Performance Computer March 19, 2008 Sue Kelly Sandia National Laboratories Abstract: Sandia National.
Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.
1 CS 501 Spring 2006 CS 501: Software Engineering Lecture 22 Performance of Computer Systems.
Eneryg Efficiency for MapReduce Workloads: An Indepth Study Boliang Feng Renmin University of China Dec 19.
Inside your computer. Hardware Review Motherboard Processor / CPU Bus Bios chip Memory Hard drive Video Card Sound Card Monitor/printer Ports.
+ CS 325: CS Hardware and Software Organization and Architecture Memory Organization.
Price Performance Metrics CS3353. CPU Price Performance Ratio Given – Average of 6 clock cycles per instruction – Clock rating for the cpu – Number of.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13 (Sec ): Ramakrishnan & Gehrke and Chapter 11 (Sec ): G-M et al. (R2) OR.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Optimizing I/O Performance for ESD Analysis Misha Zynovyev, GSI (Darmstadt) ALICE Offline Week, October 28, 2009.
Sep. 17, 2002BESIII Review Meeting BESIII DAQ System BESIII Review Meeting IHEP · Beijing · China Sep , 2002.
A Semi-Preemptive Garbage Collector for Solid State Drives
연세대학교 Yonsei University Data Processing Systems for Solid State Drive Yonsei University Mincheol Shin
PROOF tests at BNL Sergey Panitkin, Robert Petkus, Ofer Rind BNL May 28, 2008 Ann Arbor, MI.
PROOF Benchmark on Different Hardware Configurations 1 11/29/2007 Neng Xu, University of Wisconsin-Madison Mengmeng Chen, Annabelle Leung, Bruce Mellado,
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
A Presentation on Hard Disks By: Team 4 (HIS44): (1)Samarjyoti Das (972151) (2)Subhadeep Ghosh (986570) (3)Dipanjan Das (986510) (4)Sudhamayee Pradhan.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Computer Performance. Hard Drive - HDD Stores your files, programs, and information. If it gets full, you can’t save any more. Measured in bytes (KB,
LIOProf: Exposing Lustre File System Behavior for I/O Middleware
Enhanced Availability With RAID CC5493/7493. RAID Redundant Array of Independent Disks RAID is implemented to improve: –IO throughput (speed) and –Availability.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
1 Query Processing Exercise Session 1. 2 The system (OS or DBMS) manages the buffer Disk B1B2B3 Bn … … Program’s private memory An application program.
The Sort Benchmark AlgorithmsSolid State Disks External Memory Multiway Mergesort  Phase 1: Run Formation  Phase 2: Merge Runs  Careful parameter selection.
Introduction to Data Analysis with R on HPC Texas Advanced Computing Center Feb
GPU ProgrammingOther Relevant ObservationsExperiments GPU kernels run on C blocks (CTAs) of W warps. Each warp is a group of w=32 threads, which are executed.
Internal Parallelism of Flash Memory-Based Solid-State Drives
Cluster Status & Plans —— Gang Qin
Experience of Lustre at QMUL
Join Processing for Flash SSDs: Remembering Past Lessons
Solid State Disks Testing with PROOF
40% More Performance per Server 40% Lower HW costs and maintenance
Dimitrios Katsaros1 Jointly with:
Ping-Sung Yeh, Te-Hao Hsu Conclusions Results Introduction
Join Processing for Flash SSDs: Remembering Past Lessons
Unit 2 Computer Systems HND in Computing and Systems Development
Lecture 11: DMBS Internals
Overview Introduction VPS Understanding VPS Architecture
Repairing Write Performance on Flash Devices
2.C Memory GCSE Computing Langley Park School for Boys.
The 100 TB Sorting Competition - A Quick Review
Function of Operating Systems
Presentation transcript:

The Sort Benchmark AlgorithmsSolid State Disks External Memory Multiway Mergesort  Phase 1: Run Formation  Phase 2: Merge Runs  Careful parameter selection for optimal performance while requiring a single merge pass  Parallel implementations utilize the 4 CPU threads  Overlapping of I/O and computation  Run Formation uses key extraction and radix sort  Two implementations: EcoSort (10 GB, 100 GB)  Bring overlapping to the limits  Allow independent tuning of more parameters DEMsort (1000 GB)  Developed by Sanders, Singler et al. at the Karlsruhe Institute of Technology  Won the 2009 Sort Benchmark in the categories MinuteSort and GraySort using a 200-node cluster  Efficient also on a single node  Allows in-place sorting, needed to sort 1000 GB with just 1024 GB of storage I/O and CPU utilization while sorting 10 GB: Pro:  Built from NAND flash memory chips  No mechanically moving parts  Good shock resistance  Low energy consumption  Higher throughput than HDDs Con:  Higher price and less capacity than today’s HDDs  Small block random writes are slow  Performance may degrade depending on access pattern  Properties vary depending on manufacturer, model, firmware: Results Winner of the Sort Benchmark 2009/2010 mid-year round in the JouleSort categories 10 GB, 100 GB and 1000 GB! Using low power hardware does not imply an increase in running time: in the 10GB and 100 GB category we beat previous results both in terms of energy consumption and running time. As a consequence of winning all three categories using a single machine, a new 100 TB JouleSort category was introduced for the 2010 Sort Benchmark. * The 2007 results for the 1000 GB category were achieved on regular server hardware, not a low energy machine. So we cannot compete in terms of running time, only in energy consumption. Energy-Efficient Sorting using Solid State Disks MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation Ulrich Meyer Goethe University Andreas Beckmann Goethe University JouleSort Hardware Selection Rivoire, Shah, Ranganathan, Kozyrakis Stanford University and HP Labs Beckmann, Meyer, Sanders, Singler Goethe University and Karlsruhe Institute of Technology Intel Core 2 Duo T7600 (Mobile CPU) 2 cores, 2 threads, 1.66 GHz ProcessorIntel Atom cores, 4 threads, 1.6 GHz 2 GBMemory4 GB 2 PCI-e Disk Controllers (8+4 SATA) 1 SATA (onboard) I/O4 x SATA 3.0 Gb/s (onboard) 13 x Hitachi Travelstar 5K GB Notebook HDD Disks4 x SuperTalent FTM56GX25H 256 GB SSD Linux XFS on Linux Software Raid (Striping) OS File System Linux XFS on Linux Software Raid (Striping) NSort (commercial sorter)SoftwareEcoSort, DEMsort using STXXL 59 W 100 W Power Idle Power Loaded 25 W 37 W 2007 JouleSort Winner 10 GB, 100 GB The Benchmark  Sort 100 byte records with a 10 byte key  Introduced 1985, starting with 100 MB  New categories added targeting Speed/Size/Throughput (GraySort) Time (MinuteSort) Cost Efficiency (PennySort) Energy Efficiency (JouleSort, 2007) 10 GB, 100 GB, 1000 GB Sorting large data sets  Is easily described  Has many applications  Stresses both CPU and the I/O system Energy Efficiency  Energy (and cooling) is a significant cost factor in data centers  Energy consumption correlates to pollution Size [GB] Time [s] Energy [kJ] Rec./JTime [s] Energy [kJ] Rec./JEnergy Saving Factor *2920*