(C) 2001 Daniel Sorin Correctly Implementing Value Prediction in Microprocessors that Support Multithreading or Multiprocessing Milo M.K. Martin, Daniel.

Slides:



Advertisements
Similar presentations
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Advertisements

1 Episode III in our multiprocessing miniseries. Relaxed memory models. What I really wanted here was an elephant with sunglasses relaxing On a beach,
Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Architecture-aware Analysis of Concurrent Software Rajeev Alur University of Pennsylvania Amir Pnueli Memorial Symposium New York University, May 2010.
Lecture 6: Multicore Systems
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Snoopy Caches I Steve Ko Computer Sciences and Engineering University at Buffalo.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Race Directed Random Testing of Concurrent Programs KOUSHIK SEN - UNIVERSITY OF CALIFORNIA, BERKELEY PRESENTED BY – ARTHUR KIYANOVSKI – TECHNION, ISRAEL.
Shared Memory – Consistency of Shared Variables The ideal picture of shared memory: CPU0CPU1CPU2CPU3 Shared Memory Read/ Write The actual architecture.
CS492B Analysis of Concurrent Programs Consistency Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.
CS6290 Synchronization. Synchronization Shared counter/sum update example –Use a mutex variable for mutual exclusion –Only one processor can own the mutex.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Synchronization and Consistency I Steve Ko Computer Sciences and Engineering University at Buffalo.
By Sarita Adve & Kourosh Gharachorloo Review by Jim Larson Shared Memory Consistency Models: A Tutorial.
Computer Architecture II 1 Computer architecture II Lecture 9.
Multiscalar processors
Evaluating Non-deterministic Multi-threaded Commercial Workloads Computer Sciences Department University of Wisconsin—Madison
Shared Memory – Consistency of Shared Variables The ideal picture of shared memory: CPU0CPU1CPU2CPU3 Shared Memory Read/ Write The actual architecture.
(C) 2004 Daniel SorinDuke Architecture Using Speculation to Simplify Multiprocessor Design Daniel J. Sorin 1, Milo M. K. Martin 2, Mark D. Hill 3, David.
Consistency. Consistency model: –A constraint on the system state observable by applications Examples: –Local/disk memory : –Database: What is consistency?
Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Sunita Marathe.
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and Engineering Department © Philippas Tsigas.
(C) 2003 Mulitfacet ProjectUniversity of Wisconsin-Madison Revisiting “Multiprocessors Should Support Simple Memory Consistency Models” Mark D. Hill Multifacet.
Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.
Dynamic Verification of Cache Coherence Protocols Jason F. Cantin Mikko H. Lipasti James E. Smith.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 8: April 26, 2001 Simultaneous Multi-Threading (SMT)
SafetyNet Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill,
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
(C) 2003 Daniel SorinDuke Architecture Dynamic Verification of End-to-End Multiprocessor Invariants Daniel J. Sorin 1, Mark D. Hill 2, David A. Wood 2.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 12: May 3, 2003 Shared Memory.
By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial.
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew.
ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.
CS533 Concepts of Operating Systems Jonathan Walpole.
CISC 879 : Advanced Parallel Programming Rahul Deore Dept. of Computer & Information Sciences University of Delaware Exploring Memory Consistency for Massively-Threaded.
Week 9, Class 3: Java’s Happens-Before Memory Model (Slides used and skipped in class) SE-2811 Slide design: Dr. Mark L. Hornick Content: Dr. Hornick Errors:
Dynamic Verification of Sequential Consistency Albert Meixner Daniel J. Sorin Dept. of Computer Dept. of Electrical and Science Computer Engineering Duke.
Transactional Memory Coherence and Consistency Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu,
Niagara: A 32-Way Multithreaded Sparc Processor Kongetira, Aingaran, Olukotun Presentation by: Mohamed Abuobaida Mohamed For COE502 : Parallel Processing.
740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University.
Elec/Comp 526 Spring 2015 High Performance Computer Architecture Instructor Peter Varman DH 2022 (Duncan Hall) rice.edux3990 Office Hours Tue/Thu.
ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Interactions with Microarchitectures and I/O Copyright 2004 Daniel.
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Speculative Lock Elision
Memory Consistency Models
The University of Adelaide, School of Computer Science
Memory Consistency Models
143a discussion session week 3
The University of Adelaide, School of Computer Science
Threads and Memory Models Hal Perkins Autumn 2011
The University of Adelaide, School of Computer Science
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Symmetric Multiprocessing (SMP)
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Improving Multiple-CMP Systems with Token Coherence
Mark D. Hill Multifacet Project ( Computer Sciences Department
Shared Memory Consistency Models: A Tutorial
Lecture 10: Consistency Models
Store Atomicity What does atomicity really require?
Memory Consistency Models
Race Conditions David Ferry CSCI 3500 – Operating Systems
The University of Adelaide, School of Computer Science
Dynamic Verification of Sequential Consistency
Lecture 21: Synchronization & Consistency
Operating System Overview
The University of Adelaide, School of Computer Science
Lecture 11: Consistency Models
Presentation transcript:

(C) 2001 Daniel Sorin Correctly Implementing Value Prediction in Microprocessors that Support Multithreading or Multiprocessing Milo M.K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, and Mikko H. Lipasti Computer Sciences Department Department of Electrical and Computer Engineering University of Wisconsin—Madison

slide 2 Big Picture Naïve value prediction can break concurrent systems Microprocessors incorporate concurrency –Multithreading (SMT) –Multiprocessing (SMP, CMP) –Coherent I/O Correctness defined by memory consistency model –Comparing predicted value to actual value not always OK –Different issues for different models Violations can occur in practice Solutions exist for detecting violations

slide 3 Outline The Issues –Value prediction –Memory consistency models The Problem Value Prediction and Sequential Consistency Value Prediction and Relaxed Consistency Models Conclusions

slide 4 Value Prediction Predict the value of an instruction –Speculatively execute with this value –Later verify that prediction was correct Example: Value predict a load that misses in cache –Execute instructions dependent on value-predicted load –Verify the predicted value when the load data arrives Without concurrency: simple verification is OK –Compare actual value to predicted Value prediction literature has ignored concurrency

slide 5 Memory Consistency Models Correctness defined by consistency model Rules about legal orderings of reads and writes –E.g., do all processors observe writes in the same order? Example: Sequential consistency (SC) –Simplest memory model –System appears to be multitasking uniprocessor Memory P1 P2 P3 Appearance of one memory operation at a time

slide 6 Outline The Issues The Problem –Informal example –Linked list code example Value Prediction and Sequential Consistency Value Prediction and Relaxed Consistency Models Conclusions

slide 7 Informal Example of Problem, part 1 Student #2 predicts grades are on bulletin board B Based on prediction, assumes score is 60 Grades for Class Student IDscore Bulletin Board B

slide 8 Informal Example of Problem, part 2 Professor now posts actual grades for this class –Student #2 actually got a score of 80 Announces to students that grades are on board B Grades for Class Student IDscore Bulletin Board B

slide 9 Informal Example of Problem, part 3 Student #2 sees prof’s announcement and says, “ I made the right prediction (bulletin board B), and my score is 60”! Actually, Student #2’s score is 80 What went wrong here? –Intuition: predicted value from future Problem is concurrency –Interaction between student and professor –Just like multiple threads, processors, or devices E.g., SMT, SMP, CMP

slide 10 Linked List Example of Problem (initial state) head A null 4260 null A.data B.data A.next B.next Linked list with single writer and single reader No synchronization (e.g., locks) needed Initial state of list Uninitialized node

slide 11 Linked List Example of Problem (Writer) head B null 4280 A Writer sets up node B and inserts it into list A.data B.data A.next B.next Code For Writer Thread W1: store mem[B.data] <- 80 W2: load reg0 <- mem[Head] W3: store mem[B.next] <- reg0 W4: store mem[Head] <- B Insert { Setup node

slide 12 Linked List Example of Problem (Reader) head ? null 4260 null Reader cache misses on head and value predicts head=B. Cache hits on B.data and reads 60. Later “verifies” prediction of B. Is this execution legal? A.data B.data A.next B.next Predict head=B Code For Reader Thread R1: load reg1 <- mem[Head] = B R2: load reg2 <- mem[reg1] = 60

slide 13 Why This Execution Violates SC Sequential Consistency –Simplest memory consistency model –Must exist total order of all operations –Total order must respect program order at each processor Our example execution has a cycle –No total order exists

slide 14 Trying to Find a Total Order What orderings are enforced in this example? Code For Writer Thread W1: store mem[B.data] <- 80 W2: load reg0 <- mem[Head] W3: store mem[B.next] <- reg0 W4: store mem[Head] <- B Code For Reader Thread R1: load reg1 <- mem[Head] R2: load reg2 <- mem[reg1] Setup node Insert { Setup node

slide 15 Program Order Code For Writer Thread W1: store mem[B.data] <- 80 W2: load reg0 <- mem[Head] W3: store mem[B.next] <- reg0 W4: store mem[Head] <- B Code For Reader Thread R1: load reg1 <- mem[Head] R2: load reg2 <- mem[reg1] Setup node Insert { Must enforce program order

slide 16 Data Order If we predict that R1 returns the value B, we can violate SC Code For Writer Thread W1: store mem[B.data] <- 80 W2: load reg0 <- mem[Head] W3: store mem[B.next] <- reg0 W4: store mem[Head] <- B Code For Reader Thread R1: load reg1 <- mem[Head] = B R2: load reg2 <- mem[reg1] = 60 Setup node Insert {

slide 17 Outline The Issues The Problem Value Prediction and Sequential Consistency –Why the problem exists –How to fix it Value Prediction and Relaxed Consistency Models Conclusions

slide 18 Value Prediction and Sequential Consistency Key: value prediction reorders dependent operations –Specifically, read-to-read data dependence order Execute dependent operations out of program order Applies to almost all consistency models –Models that enforce data dependence order Must detect when this happens and recover Similar to other optimizations that complicate SC

slide 19 How to Fix SC Implementations Address-based detection of violations –Student watches board B between prediction and verification –Like existing techniques for out-of-order SC processors –Track stores from other threads –If address matches speculative load, possible violation Value-based detection of violations –Student checks grade again at verification –Also an existing idea –Replay all speculative instructions at commit –Can be done with dynamic verification (e.g., DIVA)

slide 20 Outline The Issues The Problem Value Prediction and Sequential Consistency Value Prediction and Relaxed Consistency Models –Relaxed consistency models –Value prediction and processor consistency (PC) –Value prediction and weakly ordered models Conclusions

slide 21 Relaxed Consistency Models Relax some orderings between reads and writes Allows HW/SW optimizations Software must add memory barriers to get ordering Intuition: should make value prediction easier Our intuition is wrong …

slide 22 Processor Consistency Just like SC, but relaxes order from write to read Optimization: allows for FIFO store queue Examples of PC models: –SPARC Total Store Order –IA-32 Bad news –Same VP issues as for SC –Intuition: VP breaks read-to-read dependence order –Relaxing write-to-read order doesn’t change issues Good news –Same solutions as for SC

slide 23 Weakly Ordered Consistency Models Relax orderings unless memory barrier between Examples: –SPARC RMO –IA-64 –PowerPC –Alpha Subtle point that affects value prediction –Does model enforce data dependence order?

slide 24 Models that Enforce Data Dependence Examples: SPARC RMO, PowerPC, and IA-64 Code For Writer Thread W1: store mem[B.data] <- 80 W2: load reg0 <- mem[Head] W3: store mem[B.next] <- reg0 W3b: Memory Barrier W4: store mem[Head] <- B Code For Reader Thread R1: load reg1 <- mem[Head] R2: load reg2 <- mem[reg1] Memory barrier orders W4 after W1, W2, W3 Insert { Setup node

slide 25 Violating Consistency Model Simple value prediction can break RMO, PPC, IA-64 How? By relaxing dependence order between reads Same issues as for SC and PC

slide 26 Solutions to Problem 1.Don’t enforce dependence order (add memory barriers) –Changes architecture –Breaks backward compatibility –Not practical 2.Enforce SC or PC –Potential performance loss 3.More efficient solutions possible

slide 27 Models that Don’t Enforce Data Dependence Example: Alpha Requires extra memory barrier (between R1 & R2) Code For Writer Thread W1: store mem[B.data] <- 80 W2: load reg0 <- mem[Head] W3: store mem[B.next] <- reg0 W3b: Memory Barrier W4: store mem[Head] <- B Code For Reader Thread R1: load reg1 <- mem[Head] R1b: Memory Barrier R2: load reg2 <- mem[reg1] Insert { Setup node

slide 28 Issues in Not Enforcing Data Dependence Works correctly with value prediction –No detection mechanism necessary –Do not need to add any more memory barriers for VP Additional memory barriers –Non-intuitive locations –Added burden on programmer

slide 29 Summary of Memory Model Issues SC Relaxed Models Weakly Ordered Models PC IA-32 SPARC TSO Enforce Data Dependence NOT Enforce Data Dependence IA-64 SPARC RMO Alpha

slide 30 Could this Problem Happen in Practice? Theoretically, value prediction can break consistency Could it happen in practice? Experiment: –Ran multithreaded workloads on SimOS –Looked for code sequences that could violate model Result: sequences occurred that could violate model

slide 31 Conclusions Naïve value prediction can violate consistency Subtle issues for each class of memory model Solutions for SC & PC require detection mechanism –Use existing mechanisms for enhancing SC performance Solutions for more relaxed memory models –Enforce stronger model