Concurrency unlocked Programming

Concurrency unlocked Programming
Transactional Memory Concurrency unlocked Programming Bingsheng Wang– TM – Operating Systems

Bingsheng Wang – TM – Operating Systems
Outline Background Motivation Database Transaction Transactional Memory History Transactional Memory Example Mechanisms Software Transactional Memory Hardware Transactional Memory Application Bingsheng Wang – TM – Operating Systems

Background Around 2004, 50 years of exponential improvement in the performance of sequential computers end, due to power and cooling concerns In the terminology of Intel’s founder, Andrew Grove, this is an inflection point—a “time in the life of a business when its fundamentals are about to change” Multiprocessor means that the performance of a well-formulated parallel program will also continue to improve at roughly Moore’s law rate. Bingsheng Wang – TM – Operating Systems

Motivation Difficulty of Multiprocessor Programming: Parallelism and Nondeterminacy Prune nondeterminism Violation(atomicity-violation, order-violation) Deadlock Parallelism programming (Multi-thread programming) lacks comparable abstraction mechanisms. For example, multi-thread programming Facing with so many difficulties, it’s urgent to provide a solution to parallel programming Bingsheng Wang – TM – Operating Systems

Database Transaction A transaction specifies a program semantics in which a computation executes as if it was the only computation accessing the database. Other computations may execute simultaneously, but the model restricts the allowable interactions among the transactions. Offer an abstraction mechanism in database systems for reusable parallel computations. ACID (Atomic, Consistent, Isolated and Durable) Atomic: All or nothing Consistent: the results must conform to existing constraints in the database Isolated: isolate each transaction from other transactions Durable: the results must be written to durable storage Bingsheng Wang – TM – Operating Systems

Transactional Memory History
In 1977, Lomet proposed the idea to map database transaction into programming language, but no implementation In 1993, Herlihy and Moss proposed hardware supported transactional memory In 1993, Stone et al. proposed an atomic multi-word operation known as “Oklahoma Update” In recent years, a huge ground swell of interest in both hardware and software systems for implementing transactional memory. et cetera et-'se-tə-rə Bingsheng Wang – TM – Operating Systems

Transactional Memory “Steal” ideas from database transaction. (A transaction is a sequence of actions that appears indivisible and instantaneous to an outside observer.) The properties of transactions provide a convenient abstraction for coordinating concurrent reads and writes of shared data in a concurrent or parallel system. Bingsheng Wang – TM – Operating Systems

Example val Bingsheng Wang – TM – Operating Systems

Example double-ended queue Bingsheng Wang – TM – Operating Systems

Mechanisms of Transactional Memory
Manage the tentative work that a transaction does while it executes, akin the left and right fields of the qn object. Ensure isolation between transactions, detect and resolve conflicts “Eager conflict detection”: identifies conflicts when running. “Lazy conflict detection”: detect only when they try to commit. Bingsheng Wang – TM – Operating Systems

Integrated with high-level programming language
This approach eliminates a lot of the boilerplate associated with using TM directly: the programmer identifies the sections of code that should execute atomically, and the language implementation introduces operations such as ReadTx and WriteTx where necessary. Chapter 3 also discusses some of the key challenges in integrating TM into existing software tool-chains and, in particular, it looks at the question of how TM can coexist with other programming abstractions Atomic blocks Bingsheng Wang – TM – Operating Systems

Software Transactional Memory
A rich vein of research over the last decade, focusing on approaches that can be implemented on current mainstream processors. There is a very large space of different possible STM designs, and simple version-number-based approach is by no means the state of the art. When comparing alternative STMsystems, it is useful to distinguish different properties that they might have. Some STM systems aim for low sequential overhead (in which code running inside a transaction is as fast as possible), others aim for good scalability (in which a parallel workload using transactions can improve in performance as processors are added) or for strong progress guarantees (e.g., to provide nonblocking behavior). STMsystems also differ substantially in terms of the programming semantics that they offer—e.g., the extent to which memory locations may be accessed transactionally at some times and non-transactionally at other times. In addition, when evaluating TM systems, we must consider that software implementations on today’s multiprocessor systems execute with much higher interprocessor communication latencies, whichmay favor computationally expensive approaches that incur less synchronization or cache traffic. Future systems may favor other trade offs. For instance, in the context of reader-writer locks, Dice and Shavit have shown how CMPs favor quite different design choices from traditional multiprocessors [86]. Differences in the underlying computer hardware can greatly affect the performance of an STM system—e.g., the memory consistency model that the hardware provides and the cost of synchronization operations when compared with ordinary memory accesses. Bingsheng Wang – TM – Operating Systems

Software Transactional Memory
Record Version number into read-log Record the change into write-log and undo-log there is a very large space of different possible STMdesigns, and this simple version-number-based approach is by no means the state of the art. Performance Limit Check conflicts based on version number and decide commit or abort. If commit, increase the version number. Bingsheng Wang – TM – Operating Systems

Hardware Transactional Memory
Provide advantages over STM Lower overheads Better power and energy profiles Less invasive Strong isolation Identifying Transactional Locations: Via extensions to instruction set Explicitly transactional HTMs: more flexible Implicitly transactional HTMs: reuse Tracking Read-Sets and Managing Write-Sets Extend the mechanisms of current caches and buffers to track read-sets and manage write-sets Explicitly transactional HTMs provide software with new memory instructions to indicate which memory accesses should be made transactionally (e.g., load_transactional and store_transactional), and they may also provide instructions that start and commit transactions (e.g., begin_transaction and end_transaction).Other memory locations accessed within the transaction through ordinary memory instructions do not participate in any transactional memory protocol. Track Read-Sets Most proposals based on cache extensions add an additional bit, the read bit, to each line in the data cache.A transactional read operation sets the bit, identifying the cache line as being part of the transaction’s read-set. This is a fairly straightforward extension to a conventional data cache design. Such designs also support the capability to clear all the read bits in the data cache instantaneously. Hardware caches enableHTMs to achieve low overhead read-set tracking; however, they also constrain the granularity of conflict detection to that of a cache line. Additional access bits may be added to reduce the granularity for tracking. Manage Write-Sets Hardware caches can also be extended to track a transaction’s write-set.This extension requires the addition of a speculatively written state for the addresses involved (e.g., [34; 38; 66; 247]). Since data caches are on the access path for processors, the latest transactional updates are automatically forwarded to subsequent read operations within the transaction; no special searching is required to locate the latest update.However, if the data cache is used to track the write-set, then care is required to ensure that an only copy of a line is not lost—e.g., in systems where caches allow processors to directly write to a location in the cache without updating main memory. A requirement is that the now-dirty cache line must eventually be copied into main memory before it is overwritten by a transaction’s tentative updates. If we did not do this, we would lose the only copy of the cache line following an abort. a processor register is a small amount of storage available as part of a CPU or other digital processor Memory(Buffer)->Cache->Register Bingsheng Wang – TM – Operating Systems

Detecting Data Conflicts: Cache coherence protocol Cache state: MESI (modified/exclusive/shared/invalid) Resolving Data Conflicts Eager conflict detection: aborting the transaction on the processor that receives the conflicting request, and transferring control to software following such an abort. Managing Architectural Register State Create a shadow copy of architectural registers at the start of a transaction for restoration. Why manage register state: A program counter identifies the instruction that the processor is to fetch from memory; the processor starts executing the instruction; the instruction may access memory and may operate on registers in which data may be stored. When the instruction’s execution completes, the program counter is updated to identify the next instruction to execute. The register state at this point becomes part of the processor’s precise state for the retired instruction. This sequence also occurs within an HTM transaction. When a transaction aborts, the register state therefore must also be restored to a known precise state, typically corresponding to just before the start of the transaction. Solutions Some HTM systems rely on software to perform such a recovery while other HTM systems build on existing hardware mechanisms to recover architectural register state. One straightforward approach for hardware register recovery entails creating a shadow copy of the architectural registers at the start of a transaction. This can be maintained in an architectural register file (ARF). This operation can often be performed in a single cycle. An alternative approach can be possible in processors that already employ speculative execution to achieve parallelism and high performance. These processors rename the architectural registers into an intermediate name to allow multiple consumers and writers of these register names to execute in parallel, while maintaining the correct data dependencies and program order. The intermediate name is often part of a larger pool of registers, also referred to as physical registers to contrast them with architectural registers. A special table, called the register alias table (RAT), tracks the mapping between the architectural registers and physical registers at any given time. For thes Bingsheng Wang – TM – Operating Systems

Committing and Aborting HTM Transactions: making all transactional updates visible to other processors instantaneously Obtain the write permissions for all the transactional addresses Block any subsequent requests from other processors Drain the store buffer into the data cache Bingsheng Wang – TM – Operating Systems

Application Manage shared-memory data structures in which scalability is difficult to achieve via lock-based synchronization Some Examples Double-ended queue operations (PushLeft) Graph algorithms Quake game server Limitations The performance of code executing within a transaction can be markedly slower than the performance of normal code Parallel/Semantic errors Graph algorithms: the set of nodes that a thread accesses depends on the values that it encounters: with locks, a thread may need to be overly conservative (say, locking the complete graph before accessing a small portion of it), or a graph traversal may need to be structured with great care to avoid deadlocks if two threads might traverse the graph and need to lock nodes already in use by one another.WithTM, a thread can access nodes freely, and the TM system is responsible for isolating separate transactions and avoiding deadlock. Quake Game: When modeling the effect of a player’s move in the game, the lock based implementation needs to simulate the effect of the operation in order to determine which game objects it needs to lock. Having done that simulation, the game would lock the objects, check that the player’s move is still valid, and then perform its effects. WithTM, the structure of that code is simplified because the simulation step can be avoided. In parallel software, the programmer must still divide work into pieces that can be executed on different processors. It is still (all too) easy to write an incorrect parallel program, even with transactional memory. For example, a programmer might write transactions that are too short—e.g., in the Move example, they might place Insert and Remove in two separate transactions, rather than in one combined transaction. Conversely, a programmer might write transactions that are too long—e.g., they might place two operations inside a transaction in one thread, when the intermediate state between the operations needs to be visible to another thread. A programmer might also simply use transactions incorrectly—e.g., starting a transaction but forgetting to commit it. Bingsheng Wang – TM – Operating Systems

Overall Conclusion Transactional Memory model provides an unlocked solution and an abstraction mechanism to parallel programming. Software Transactional Memory: coexist with other programming abstractions and depends on the environment. Hardware Transactional Memory: Efficient but application sensitive. Transactional Memory is not a panacea. Bingsheng Wang – TM – Operating Systems

Reference Transactional Memory (Second Edition) Tim Harris, James Larus, Ravi Rajwar, Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2010 Kunle Olukotun and Lance Hammond. The future of microprocessors. Queue, 3(7):26–29, DOI: / A. S. Grove. Only the paranoid survive. Doubleday, E. A. Lee, "The Problem with Threads," in IEEE Computer, 39(5):33-42, May 2006 as well as an EECS Technical Report, UCB/EECS , January 2006. Learning from Mistakes --- A Comprehensive Study on Real World Concurrency Bug Characteristics. Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. 13th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS'08). Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lockfree data structures. In ISCA ’93: Proc. 20th Annual International Symposium on Computer Architecture, pages 289–300, May DOI: / , 17, 149, 155 David B. Lomet. Process structuring, synchronization, and recovery using atomic actions. In ACM Conference on Language Design for Reliable Software, pages 128–137, March DOI: / , 62 Janice M. Stone, Harold S. Stone, Phil Heidelberger, and John Turek. Multiple reservations and the Oklahoma update. IEEE Parallel & Distributed Technology, 1(4):58–71, November DOI: / , 149, 150, 158 Bingsheng Wang – TM – Operating Systems

Concurrency unlocked Programming

Similar presentations

Presentation on theme: "Concurrency unlocked Programming"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Concurrency unlocked Programming

Similar presentations

Presentation on theme: "Concurrency unlocked Programming"— Presentation transcript:

Similar presentations

About project

Feedback