Darko Makreshanski Department of Computer Science ETH Zurich

Presentation transcript:

To Lock, Swap or Elide: On the Interplay of Hardware Transactional Memory and Lock-free Indexing
Darko Makreshanski, Department of Computer Science, ETH Zurich
Justin Levandoski, Microsoft Research, Redmond
Ryan Stutsman, Microsoft Research, Redmond

Motivation
Hardware Transactional Memory
- Proposed as hardware support for lock-free data structures [1]
- Introduced in Intel Haswell (2013)
Existing lock-free data structures
- Rely on CPU atomic primitives (CAS, FAI)
- Notoriously difficult to get right
[1] Transactional Memory: Architectural Support for Lock-Free Data Structures. M. Herlihy, J. E. B. Moss. ISCA '93

Lock-free Programming Hardware Transactional Memory

Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
A1: No. Technical limitations prohibit the use of HTM as a general-purpose solution.
Q2: What if all technical limitations are overcome?
A2: No. There are still important fundamental differences.
Q3: Can lock-free data structures benefit from HTM?
A3: Yes. Using HTM for MW-CAS can simplify lock-free designs.

Hardware Transactional Memory
Sequence of instructions with ACI(D) properties
Programming model:
    If (BeginTransaction()) Then
        < Critical Section >
        CommitTransaction()
    Else
        < Abort Fallback Codepath >
    EndIf
Lock elision:
    AcquireElidedLock()
    < Critical Section >
    ReleaseElidedLock()
- Transaction buffers are stored in the core-local (L1) cache
- Conflict detection and atomicity piggyback on the cache-coherence protocol
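The control flow of the two models can be sketched in Python. This is only a simulation: Python cannot issue hardware transactions, so `SimulatedHTM` is a hypothetical stand-in whose `begin()` reports whether a transaction would have started, and the retry count is an assumed tuning parameter. On real hardware an abort also discards the partial effects of the critical section, which is modeled here by deciding the outcome before the critical section runs.

```python
import threading


class SimulatedHTM:
    """Hypothetical stand-in for hardware transactions (not a real API).

    begin() returns True when the simulated transaction starts and False
    when it aborts; the caller supplies the sequence of outcomes.
    """

    def __init__(self, outcomes):
        self._outcomes = iter(outcomes)

    def begin(self):
        # Once the supplied outcomes run out, every attempt aborts.
        return next(self._outcomes, False)

    def commit(self):
        pass  # real hardware would publish the buffered writes atomically


def run_elided(htm, fallback_lock, critical_section, max_retries=3):
    """Run the critical section transactionally, else under the real lock."""
    for _ in range(max_retries):
        if not htm.begin():
            continue  # aborted: retry
        if fallback_lock.locked():
            continue  # someone holds the real lock: treat as an abort
        critical_section()
        htm.commit()
        return "transaction"
    # Abort fallback codepath: serialize on the actual lock.
    with fallback_lock:
        critical_section()
    return "lock"
```

Checking the fallback lock inside the transactional path mirrors what lock elision needs: a thread that fell back to the real lock must force concurrent transactions to abort.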

Bw-Tree [1] (A Lock-free B-Tree)
[Figure: mapping table with an Address column (A, B, C, D) and the pages it refers to (Page A, Page B, Page C, Page D); legend: logical pointer, physical pointer]
[1] The Bw-Tree: A B-tree for New Hardware. Levandoski, Lomet, Sengupta. ICDE '13

Bw-Tree [1] (Lock-free Updates)
[Figure: mapping-table entry for address P, Page P, and Consolidated Page P, with delta records labeled "Δ: Update record 35", "Δ: Insert record 60", "Δ: Delete record 48", and "Δ: Insert record 50"]
[1] The Bw-Tree: A B-tree for New Hardware. Levandoski, Lomet, Sengupta. ICDE '13
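A minimal sketch of delta updates against a mapping-table entry, written in Python for illustration. All class and function names are hypothetical, and the compare-and-swap is simulated with a lock because Python exposes no hardware CAS; a real Bw-Tree would use a single atomic instruction on the mapping-table slot.

```python
import threading


class AtomicSlot:
    """One mapping-table entry; compare_and_swap is simulated with a lock."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class BasePage:
    def __init__(self, records):
        self.records = dict(records)


class Delta:
    def __init__(self, op, key, value, next_node):
        self.op, self.key, self.value, self.next = op, key, value, next_node


def apply_delta(slot, op, key, value=None):
    """Prepend a delta record to the page; retry if the CAS loses a race."""
    while True:
        head = slot.load()
        if slot.compare_and_swap(head, Delta(op, key, value, head)):
            return


def lookup(slot, key):
    """Walk the delta chain newest-first, then fall through to the base page."""
    node = slot.load()
    while isinstance(node, Delta):
        if node.key == key:
            return None if node.op == "delete" else node.value
        node = node.next
    return node.records.get(key)


def consolidate(slot):
    """Build a fresh page with all deltas applied and install it with one CAS."""
    head = slot.load()
    chain, node = [], head
    while isinstance(node, Delta):
        chain.append(node)
        node = node.next
    records = dict(node.records)
    for delta in reversed(chain):  # apply oldest delta first
        if delta.op == "delete":
            records.pop(delta.key, None)
        else:
            records[delta.key] = delta.value
    return slot.compare_and_swap(head, BasePage(records))
```

The existing page is never modified in place: every update and the consolidation itself become visible through one pointer swap in the mapping table, so readers always see a consistent chain.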

Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
Q2: What if all technical limitations are overcome?
Q3: Can lock-free data structures benefit from HTM?

HTM Parallelized B-Tree
Q1: Does HTM obviate the need for crafty lock-free designs?
- Wrap individual tree operations in a transaction
- Effortless parallelization of existing single-threaded implementations
- State of the art in using HTM for database indexing [1,2]
- Uses the Google B-Tree implementation [3], an in-memory single-threaded B-Tree
[1] Exploiting Hardware Transactional Memory in Main-Memory Databases. V. Leis, A. Kemper, T. Neumann. ICDE 2014
[2] Improving In-Memory Database Index Performance with Intel® Transactional Synchronization Extensions. Karnagel et al. HPCA 2014
[3] https://code.google.com/p/cpp-btree/

HTM Parallelized B-Tree
Q1: Does HTM obviate the need for crafty lock-free designs?
Works well for simple use cases:
- Small key and payload sizes (8 B keys, 8 B payloads)
- 4M key-payload pairs
- Random read-only workload

HTM Parallelized B-Tree
Q1: Does HTM obviate the need for crafty lock-free designs?
Transaction size is limited by cache size (32 KB L1 cache, 8-way associativity):
- Sensitive to payload size
- Even more sensitive to key size
- Sensitive to tree size
- Hyper-threading
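The capacity limit can be made concrete with some cache arithmetic. The sketch below uses the 32 KB and 8-way figures from the slide; the 64-byte line size is an assumption that the slide does not state, and `fits_in_l1` is a hypothetical helper. It is a simplified model that only checks whether the lines a transaction touches can coexist in the cache; real hardware may abort earlier for other reasons.

```python
from collections import Counter

CACHE_BYTES = 32 * 1024  # from the slide
WAYS = 8                 # from the slide
LINE_BYTES = 64          # assumption: not stated on the slide

NUM_LINES = CACHE_BYTES // LINE_BYTES  # 512 cache lines in total
NUM_SETS = NUM_LINES // WAYS           # 64 sets of 8 lines each


def fits_in_l1(addresses):
    """True if every touched cache line can stay resident at the same time."""
    lines = {addr // LINE_BYTES for addr in addresses}
    lines_per_set = Counter(line % NUM_SETS for line in lines)
    # A set holds at most WAYS lines; a ninth line mapping to the same set
    # would evict transactional data and force an abort in this model.
    return all(count <= WAYS for count in lines_per_set.values())


# 512 contiguous lines fill the cache exactly.
print(fits_in_l1(range(0, CACHE_BYTES, LINE_BYTES)))
# Nine lines spaced 4 KB apart all map to one set and overflow it.
print(fits_in_l1(range(0, 9 * 4096, 4096)))
```

Under these assumptions a transaction can overflow after touching as few as nine cache lines if they collide in one set, which is consistent with the sensitivity to key size, payload size, and tree size listed on the slide.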

Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
Q2: What if all technical limitations are overcome?
Q3: Can lock-free data structures benefit from HTM?

Lock-free vs HTM
Q2: What if all technical limitations are overcome?
- The lock-free Bw-Tree and HTM both offer optimistic concurrency control
- HTM-parallelized data structures can also provide lock-freedom
- Can HTM be seen as a hardware-accelerated version of lock-free algorithms?
Fundamental difference:
- Lock-free (Bw-Tree) -> copy-on-write (MVCC-like)
- Transactional memory -> atomic update in place (2PL-like)
- Different behavior under read-write contention

Read-write Contention
Q2: What if all technical limitations are overcome?
[Figures: Workload A, Workload B]
Experimental setup:
- 4 read-only point-lookup threads
- 0-4 write-only point-update threads
- Zipfian skew (s = 2)
Workload A:
- Fixed-length 8-byte keys and payloads
Workload B:
- Variable-length keys (30-70 bytes)
- 256-byte payloads
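To give a feel for how concentrated a Zipfian workload with s = 2 is, the following sketch builds the distribution and samples from it. It assumes the usual definition in which the key of rank k is chosen with probability proportional to k^(-s); the key-space size and the function names are hypothetical choices for illustration, not taken from the experiment.

```python
import bisect
import itertools
import random


def zipf_cdf(num_keys, s=2.0):
    """Cumulative distribution over ranks 1..num_keys with P(rank) ~ rank**-s."""
    weights = [rank ** -s for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return list(itertools.accumulate(w / total for w in weights))


def sample_rank(cdf, rng):
    """Draw one key rank (1 = hottest key) by inverting the CDF."""
    index = bisect.bisect_left(cdf, rng.random())
    return min(index, len(cdf) - 1) + 1  # clamp guards against rounding


cdf = zipf_cdf(1000)  # hypothetical key-space size
rng = random.Random(42)
ranks = [sample_rank(cdf, rng) for _ in range(10_000)]
print("share of accesses to the hottest key:", ranks.count(1) / len(ranks))
```

Under this definition the hottest key alone receives roughly 60% of all accesses, so readers and writers collide on the same few records almost constantly.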

Overview
Q1: Does HTM obviate the need for crafty lock-free designs?
Q2: What if all technical limitations are overcome?
Q3: Can lock-free data structures benefit from HTM?

HTM-enabled Lock-free B-Tree
Q3: Can lock-free data structures benefit from HTM?
Bw-Tree problem: code complexity
- Structure modification operations (SMOs) such as page split and merge require multi-word CAS
- The Bw-Tree separates SMOs into multiple sub-operations
- Reasoning about all possible race conditions is hard
Use HTM as hardware support for multi-word compare-and-swap
- SMOs can be installed in a single operation
- Small transaction footprint -> avoids capacity problems
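The semantics of a multi-word compare-and-swap can be sketched as follows. This is a Python simulation in which a single lock stands in for the hardware transaction, so it shows only the all-or-nothing behavior and not how HTM achieves it; `mwcas` and the table layout are hypothetical names for illustration.

```python
import threading

# Stand-in for a hardware transaction. Real HTM does not take a lock;
# this only reproduces the atomic, all-or-nothing outcome.
_transaction = threading.Lock()


def mwcas(table, updates):
    """Atomically apply updates given as (index, expected, new) triples.

    Every slot must still hold its expected value, otherwise nothing
    is written and False is returned.
    """
    with _transaction:
        if any(table[index] != expected for index, expected, _ in updates):
            return False
        for index, _, new in updates:
            table[index] = new
        return True


# Hypothetical page split: swap in the left half, publish the right half,
# and update the parent, all in one step.
table = ["P", None, "parent_v1"]
ok = mwcas(table, [(0, "P", "P_left"),
                   (1, None, "P_right"),
                   (2, "parent_v1", "parent_v2")])
print(ok, table)
```

Because only a handful of mapping-table words are read and written, such a transaction has a small footprint, in contrast to wrapping an entire tree traversal in a transaction.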

Conclusion
Does HTM obviate the need for crafty lock-free designs?
No. Technical limitations prohibit the use of HTM as a general-purpose solution.
What if all technical limitations are overcome?
No. There are still important fundamental differences.
Can lock-free data structures benefit from HTM?
Yes. Using HTM for MW-CAS can simplify lock-free designs.
