Based on: Maurice Herlihy (DEC) and J. Eliot B. Moss (UMass), "Transactional Memory: Architectural Support for Lock-Free Data Structures", ISCA '93

Transactional Memory Yujia Jin

Locks and Their Problems Locks are commonly used to protect shared data Priority inversion –A lower-priority process holds a lock needed by a higher-priority process Convoy effect –When the lock holder is interrupted, everyone else is forced to wait Deadlock –A circular dependence between processes acquiring locks, so every process waits forever

Lock-free A shared data structure is lock-free if its operations do not require mutual exclusion - Will not prevent multiple processes from operating on the same object + Avoids the lock problems above - Existing lock-free techniques are implemented in software and do not perform as well as their lock-based counterparts

Transactional Memory Uses transaction-style operations to operate on lock-free data Allows the user to compose custom read-modify-write operations on multiple, independent words Easy to support in hardware: a straightforward extension of a conventional multiprocessor cache

Transaction Style A finite sequence of machine instructions with –A sequence of reads, –Computation, –A sequence of writes, and –A commit Formal properties –Atomicity and serializability (~ACID)

Access Instructions Load-transactional (LT) –Reads from shared memory into a private register Load-transactional-exclusive (LTX) –LT plus a hint that a write to the location is coming Store-transactional (ST) –Tentatively writes from a private register to shared memory; the new value is not visible to other processors until commit

State Instructions Commit –Tries to make the tentative writes permanent –Succeeds if no other transaction has read this transaction's write set or written to its read or write set –On failure, discards all updates to the write set –Returns whether it succeeded Abort –Discards all updates to the write set Validate –Returns the current transaction status –If the status is false, discards all updates to the write set

Typical Transaction

/* keep trying */
while (true) {
    /* read variables */
    v1 = LT(V1); ...; vn = LT(Vn);
    /* check consistency */
    if (!VALIDATE())
        continue;
    /* compute new values */
    compute(v1, ..., vn);
    /* write tentative values */
    ST(v1, V1); ... ST(vn, Vn);
    /* try to commit */
    if (COMMIT())
        return result;
    else
        backoff;
}

Warning… Not intended for database use –Transactions are short in duration –Transactions touch a small data set

Idea Behind the Implementation Existing cache protocols already detect accessibility conflicts Accessibility conflicts ~ transaction conflicts The idea can be added to any cache coherence protocol –Including bus snooping and directory-based protocols

Bus Snoopy Example Each processor has –A regular cache: direct mapped, 8-byte lines –A transactional cache: fully associative, 64 8-byte lines The two caches are exclusive: a line lives in at most one of them The transactional cache holds tentative writes without propagating them to other processors

Transaction Cache Each cache line carries a transactional tag in addition to the coherence-protocol tag –Transactional tag states: EMPTY, NORMAL, XCOMMIT, XABORT A transactional write uses two entries –Modifications are written to the XABORT entry, which is dropped (set to EMPTY) on abort –The XCOMMIT entry holds the original value and is dropped when the transaction commits Allocation policy, in decreasing order of preference –EMPTY entries, then NORMAL entries, then XCOMMIT entries The cache must guarantee a minimum transaction size

Bus Actions T_READ and T_RFO (read-for-ownership) are added for transactional requests A transactional request can be refused with a BUSY response When a BUSY response is received, the transaction aborts –This prevents deadlock and continual mutual aborts –But it can be subject to starvation

Processor Actions The transaction-active (TACTIVE) flag indicates whether a transaction is in progress; it is set on the first transactional operation The transaction-status (TSTATUS) flag indicates whether the transaction has been aborted

LT Actions Probe for an XABORT entry; if present, return its value If not found, probe for a NORMAL entry –On a hit, change NORMAL to XABORT and allocate a matching XCOMMIT entry If still not found, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries If the T_READ receives BUSY, abort –Set TSTATUS to false –Drop all XABORT entries –Set all XCOMMIT entries to NORMAL –Return arbitrary data

LTX and ST Actions Same as LT, except –Use T_RFO on a miss rather than T_READ –For ST, the XABORT entry is updated with the new value

More Exciting Actions VALIDATE –Returns the TSTATUS flag –If false, sets TSTATUS to true and TACTIVE to false ABORT –Cleans up the cache, sets TSTATUS to true and TACTIVE to false COMMIT –Returns TSTATUS; sets TSTATUS to true and TACTIVE to false –Drops all XCOMMIT entries and changes all XABORT entries to NORMAL

Snoopy Cache Actions The regular cache acts like a MESI invalidation cache, treating READ the same as T_READ and RFO the same as T_RFO Transactional cache –Non-transactional cycles: acts like the regular cache, using NORMAL entries only –T_READ: if the entry is valid (shared), returns the value –All other cycles: responds BUSY

Simulation Proteus simulator, 32 processors Regular cache –Direct mapped, 8-byte lines Transactional cache –Fully associative, 64 8-byte lines Single-cycle cache access, 4-cycle memory access Both a snoopy bus and a directory protocol are simulated The directory network has 2 stages with a switch delay of 1 cycle each

Benchmarks Counter –n processors, each incrementing a shared counter (2^16)/n times Producer/consumer buffer –n/2 processors produce and n/2 processors consume through a shared FIFO –Ends when 2^16 items have been consumed Doubly-linked list –n processors, each trying to rotate items from the tail to the head –Ends when 2^16 items have been moved –Which variables are shared depends on runtime state (conditional sharing) –Traditional locking methods can introduce deadlock here

Comparisons Competing techniques –Transactional memory –Load-locked/store-conditional (Alpha) –Spin lock with backoff –Software queue –Hardware queue

Counter Result

Producer/Consumer Result

Doubly Linked List Result

Conclusion Avoids extra lock variables and the associated lock problems Trades deadlock for possible livelock/starvation Performance is comparable to locking techniques when the shared data structure is small Relatively easy to implement