Maurice Herlihy and J. Eliot B. Moss,  ISCA '93


Transactional Memory: Architectural Support for Lock-Free Data Structures. Maurice Herlihy and J. Eliot B. Moss, ISCA '93 (Proceedings of the 20th Annual International Symposium on Computer Architecture). Presented by: Ajithchandra Saya, Virginia Tech

Outline: WHY (the need). WHAT (the solution). Transactional memory: idea, architectural support, cache coherency mechanisms, cache line states, bus cycles, working. Simulation results. Positives. Extensions. Current work. Conclusions. My opinion.

WHY - NEED: Increasing need for concurrent programs. Growing shared data access. Conventional locking mechanisms: priority inversion, lock convoying. Hard to write concurrent programs: data races, deadlocks.

Solution: LOCK-FREE SYNCHRONIZATION

Transactional Memory: Multiprocessor architecture support intended to make lock-free synchronization easy and efficient. An extension of cache coherence protocols.

IDEA: TRANSACTIONS. A transaction is a finite sequence of machine instructions executed by a single process. It satisfies two important properties: serializability and atomicity. Analogous to transactions in conventional database systems.

IDEA Contd…: Transactional primitives. For memory access: Load Transactional (LT), Load Transactional Exclusive (LTX), Store Transactional (ST). The read set is the set of locations read by LT, the write set is the set of locations accessed by LTX or ST, and the data set is their union. For manipulating transactional state: Commit, Abort, Validate.

IDEA Contd…: Example. A simple memory read and update: (1) LT, (2) VALIDATE, (3) ST, (4) COMMIT. If (2) or (4) fails, try again from step (1).
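The retry pattern in this example can be sketched in software. The following is a minimal illustration only, not the hardware mechanism from the paper: `ToyTransactionalMemory`, its per-address version counters, and the `interfere` hook are all hypothetical names introduced here to stand in for LT, VALIDATE, ST and COMMIT.

```python
class ToyTransactionalMemory:
    """Hypothetical software stand-in for LT / VALIDATE / ST / COMMIT.

    Conflicts are detected with per-address version counters, which is an
    assumption of this sketch (the paper detects them via cache coherence).
    """

    def __init__(self):
        self.memory = {}    # committed values
        self.versions = {}  # number of commits seen per address

    def begin(self):
        return {"reads": {}, "writes": {}}

    def lt(self, tx, addr):
        # Load Transactional: remember which version was observed.
        if addr in tx["writes"]:
            return tx["writes"][addr]
        tx["reads"].setdefault(addr, self.versions.get(addr, 0))
        return self.memory.get(addr, 0)

    def st(self, tx, addr, value):
        # Store Transactional: tentative until commit.
        tx["writes"][addr] = value

    def validate(self, tx):
        # True while nothing in the read set has been committed over.
        return all(self.versions.get(a, 0) == v for a, v in tx["reads"].items())

    def commit(self, tx):
        if not self.validate(tx):
            return False
        for addr, value in tx["writes"].items():
            self.memory[addr] = value
            self.versions[addr] = self.versions.get(addr, 0) + 1
        return True


def increment(tm, addr, interfere=None):
    """Read-and-update loop: retry from step (1) if (2) or (4) fails."""
    attempts = 0
    while True:
        attempts += 1
        tx = tm.begin()
        value = tm.lt(tx, addr)            # (1) LT
        if interfere is not None and attempts == 1:
            interfere()                    # simulate a conflicting writer
        if not tm.validate(tx):            # (2) VALIDATE
            continue
        tm.st(tx, addr, value + 1)         # (3) ST
        if tm.commit(tx):                  # (4) COMMIT
            return attempts
```

Calling `increment(tm, "x")` on a fresh `ToyTransactionalMemory` succeeds on the first attempt; passing an `interfere` callback that commits to the same address forces one retry.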

Architectural support: Extension to the cache coherence protocol. A protocol that can detect access conflicts can also detect transaction conflicts. Current cache coherence mechanisms: snoopy cache (bus based), directory based (network based). Separate caches: regular cache (direct mapped), transactional cache (fully associative).

Cache line states

Bus cycles

Snoopy cache working: Memory responds to read cycles only if no cache responds; memory responds to write cycles. Each cache snoops the bus address lines and ignores cycles it is not interested in. For the regular cache, on T_READ or READ: returns the value; a valid line stays valid, and a Reserved or Dirty line becomes valid (for T_READ -> invalidates). On RFO or T_RFO: returns the value and invalidates the line.

Snoopy cache working: For the transactional cache, behavior depends on TSTATUS. False: behavior is the same as the regular cache, ignoring tags other than NORMAL. True: T_READ returns the value; other cycles return BUSY.

Working: Transactional operations modify XABORT-tagged entries only. Two copies of each cached item are kept, tagged XABORT and XCOMMIT. On COMMIT success: XCOMMIT becomes EMPTY and XABORT becomes NORMAL. On failure: XCOMMIT becomes NORMAL and XABORT becomes EMPTY.
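The tag transitions at the end of a transaction can be written as a small lookup. This is a sketch only: the function name and the list-of-tags representation are assumptions made for illustration, and the XABORT row follows the paper's rule that XABORT entries are discarded on abort and kept on commit.

```python
def finish_transaction(tags, committed):
    """Return the new tags of transactional cache entries.

    tags: list of strings such as "XCOMMIT", "XABORT", "NORMAL", "EMPTY"
    committed: True if the transaction committed, False if it aborted
    """
    result = []
    for tag in tags:
        if tag == "XCOMMIT":
            # Old value: discarded on commit, restored on abort.
            result.append("EMPTY" if committed else "NORMAL")
        elif tag == "XABORT":
            # Tentative value: kept on commit, discarded on abort.
            result.append("NORMAL" if committed else "EMPTY")
        else:
            result.append(tag)
    return result
```

Because both outcomes only rewrite tags, neither commit nor abort needs to move data in this model.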

Working Contd…: Processor flags, internally managed by the processor. TACTIVE indicates whether a transaction is active. TSTATUS applies while a transaction is active (TACTIVE = true): True means the transaction is ongoing, False means the transaction has aborted.

Working Contd…: Taking the previous example, the operations are LT, VALIDATE, ST, COMMIT. LT acts according to data availability. XABORT entry present: nothing to do. NORMAL entry present: mark it as XABORT and create another copy marked as XCOMMIT. Otherwise issue T_READ: on success, two cached copies are created, marked as XABORT and XCOMMIT; on BUSY, drop all XABORT entries and change XCOMMIT entries to NORMAL.
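The LT cases listed on this slide can be sketched as a lookup over a toy transactional cache. The list-of-dicts cache, the `t_read` callback and the `BUSY` sentinel are hypothetical choices for this illustration, not structures from the paper.

```python
BUSY = object()  # hypothetical sentinel for a BUSY bus response


def load_transactional(cache, addr, t_read):
    """Sketch of LT. cache is a list of {"addr", "tag", "value"} dicts;
    t_read(addr) returns a value or BUSY. Returns the loaded value,
    or None if the transaction had to abort."""
    # Case 1: an XABORT entry already holds the tentative value.
    for entry in cache:
        if entry["addr"] == addr and entry["tag"] == "XABORT":
            return entry["value"]
    # Case 2: a NORMAL entry becomes XABORT, plus a second XCOMMIT copy.
    for entry in cache:
        if entry["addr"] == addr and entry["tag"] == "NORMAL":
            entry["tag"] = "XABORT"
            cache.append({"addr": addr, "tag": "XCOMMIT", "value": entry["value"]})
            return entry["value"]
    # Case 3: miss, so issue a T_READ bus cycle.
    value = t_read(addr)
    if value is BUSY:
        # Abort: drop all XABORT entries, change XCOMMIT back to NORMAL.
        cache[:] = [e for e in cache if e["tag"] != "XABORT"]
        for entry in cache:
            if entry["tag"] == "XCOMMIT":
                entry["tag"] = "NORMAL"
        return None
    cache.append({"addr": addr, "tag": "XABORT", "value": value})
    cache.append({"addr": addr, "tag": "XCOMMIT", "value": value})
    return value
```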

Working Contd…: VALIDATE returns the TSTATUS flag. True: the transaction continues. False: sets TSTATUS = true and TACTIVE = false. ABORT: sets TSTATUS = true and TACTIVE = false. COMMIT returns the TSTATUS flag.
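The flag handling for VALIDATE, ABORT and COMMIT can be modeled as a tiny state machine. This is a sketch under assumptions: the class and method names are invented here, `begin` and `conflict` are hypothetical hooks for "first transactional operation" and "conflict detected", and having COMMIT reset the flags the same way as ABORT is assumed rather than stated on the slide.

```python
class TransactionFlags:
    """Toy model of the TACTIVE / TSTATUS processor flags."""

    def __init__(self):
        self.tactive = False  # is a transaction in progress?
        self.tstatus = True   # True: ongoing, False: aborted

    def begin(self):
        # Hypothetical hook: first transactional operation.
        self.tactive = True
        self.tstatus = True

    def conflict(self):
        # Hypothetical hook: a conflict marks the transaction as aborted.
        if self.tactive:
            self.tstatus = False

    def _reset(self):
        self.tstatus = True
        self.tactive = False

    def validate(self):
        # Returns TSTATUS; a failed validate also resets the flags.
        status = self.tstatus
        if not status:
            self._reset()
        return status

    def abort(self):
        self._reset()

    def commit(self):
        # Returns TSTATUS; the reset here is an assumption of this sketch.
        status = self.tstatus
        self._reset()
        return status
```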

Working Contd…: A new data entry in the cache replaces an EMPTY entry, then a NORMAL entry, then an XCOMMIT entry. Replacing an XCOMMIT entry should write the current data back to memory. The XCOMMIT state is used to improve performance.
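The replacement order can be sketched as a victim search over the entry tags. The function name and return shape are hypothetical, and returning `None` when only XABORT entries remain is an assumption of this sketch (the slide does not say what happens then).

```python
REPLACEMENT_ORDER = ("EMPTY", "NORMAL", "XCOMMIT")


def choose_victim(tags):
    """Pick the entry to replace, preferring EMPTY, then NORMAL, then XCOMMIT.

    Returns (index, needs_write_back). needs_write_back is True only when
    an XCOMMIT entry is chosen, since its data must go back to memory.
    Returns (None, False) if no entry may be replaced.
    """
    for wanted in REPLACEMENT_ORDER:
        for index, tag in enumerate(tags):
            if tag == wanted:
                return index, wanted == "XCOMMIT"
    return None, False
```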

Simulations: Comparison against 2 software techniques (TTS with exponential backoff, software queues) and 2 hardware techniques (LL/SC with exponential backoff, hardware queues). SETUP: Proteus simulator. Number of processors: 1 to 32. Simple benchmarks.
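For readers unfamiliar with the first baseline, a test-and-test-and-set (TTS) lock with exponential backoff can be sketched as follows. This is an illustration, not the benchmark code from the paper: Python has no hardware test-and-set, so an internal `threading.Lock` emulates its atomicity, and the delay constants and the `run_counter_demo` helper are arbitrary assumptions.

```python
import random
import threading
import time


class TTSLock:
    """Toy test-and-test-and-set spin lock with exponential backoff."""

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        self._held = False
        self._atomic = threading.Lock()  # emulates atomic test-and-set
        self.base_delay = base_delay
        self.max_delay = max_delay

    def _test_and_set(self):
        with self._atomic:
            old = self._held
            self._held = True
            return old

    def acquire(self):
        delay = self.base_delay
        while True:
            while self._held:           # "test": spin on a plain read
                time.sleep(0)           # yield to other threads
            if not self._test_and_set():
                return                  # lock was free and is now ours
            # Lost the race: back off for a random, doubling interval.
            time.sleep(random.uniform(0, delay))
            delay = min(2 * delay, self.max_delay)

    def release(self):
        self._held = False


def run_counter_demo(threads=4, increments=200):
    """Counting benchmark in miniature: threads increment a shared counter."""
    lock = TTSLock()
    counter = {"n": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter["n"] += 1
            lock.release()

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter["n"]
```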

Counting Benchmark

Producer/Consumer benchmark

Doubly linked list Benchmark

Positives: Uses the same cache coherence and control protocols. Additional hardware support is required only at the primary cache. Commit and abort are operations internal to the cache; they don't require communicating with other processes or writing back to memory.

Extensions: Transactional cache size (software overflow handling). Additional primitives for faster updates. Smaller data sets and shorter durations. Adaptive backoff techniques or hardware queues. Memory consistency models.

Current Work: HTM. Hardware transactional memory is dependent on cache coherency policies and the platform's architecture. Unbounded transactional memory. HTM migration during process migration.

Current Work: Software Transactional Memory (STM). Strong vs. weak isolation. Eager vs. lazy updates. Eager vs. lazy contention detection. Contention manager. Visible vs. invisible reads. Privatization.

Conclusions: Lock-free synchronization avoids the issues with locking techniques. It is as easy and efficient as conventional locking techniques. It is competitive with, or outperforms, existing lock-based techniques. It uses current techniques to improve performance.

Opinion: Novel technique to avoid synchronization issues. Requires few hardware modifications. Extensions are useful to make it practically usable.

Why did it take so long for transactional memory to catch the limelight? Authors coined the term in 1993. Paper cited 1726 times [1]. Citation graph source: Maurice Herlihy: Transactional Memory Today. ICDCIT 2010: 1-12. [1] Google Scholar citation count.

The Gartner hype cycle. Source: Maurice Herlihy: Transactional Memory Today. ICDCIT 2010: 1-12

References: Maurice Herlihy: Transactional Memory Today. ICDCIT 2010: 1-12; Publications by author: http://www.informatik.uni-trier.de/~ley/pers/hd/h/Herlihy:Maurice.html ; Transactional Memory Online web page: http://www.cs.wisc.edu/trans-memory/

Questions?

Thank you