Published by Braden Cousens
© A. Moshovos (ECE, Toronto) ECE1773 – Spring 2002
ILP, cont.
Maintaining Sequential Appearance
–Precise Interrupts
–RUU approach to OoO Scheduling

Superscalar Processors: The Big Picture
[Figure with two columns]
Program Form: static program, dynamic inst. stream (trace), execution window, completed instructions
Processing Phase: fetch and CT prediction, dispatch / dataflow, inst. issue, inst. execution, inst. reorder & commit

A Generic Superscalar OOO Processor
[Block diagram: Pre-decode, I-CACHE, buffer, Rename, Dispatch, scheduler, Reorder buffer, RF, FUs, Memory Interface]

Maintaining Sequential Semantics
What if execution gets interrupted at an arbitrary point?
–All insts. before commit
–None thereafter
We’ll focus on interrupts
Same mechanisms used today to support SPECULATIVE EXECUTION
“Definition”: Instr. executes speculatively up to complete. We don’t know yet if we should have executed this instr. Verification happens at commit (if ever).

Interrupts
Examples
–Power Failing, Arithmetic Overflow
–I/O Device Request, OS Call, Page Fault
–Invalid Opcode, Breakpoint, Protection Viol.
Aka Faults, Exceptions, or Traps
Requirements
–Surprise Jump (to vectored Address)
–Linking Return Address
–Saving State
–Changing State (e.g., kernel mode)

Classifying Interrupts
1a. Synchronous
–Function of program state
–overflow, page fault, etc.
1b. Asynchronous
–e.g., External device or malfunction
2a. User Request
–OS Call
2b. Coerced
–From OS or hardware
–page fault, protection violation
3a. User Maskable
–User can disable processing
3b. Non-Maskable
–Guess!!!

Classifying Interrupts, contd.
4a. Between Instructions
–Usually Asynchronous
4b. Within an Instruction
–Usually Synchronous
–Harder to deal with, why???
5a. Resume
–As if nothing happened as far as the program is concerned
5b. Catastrophic
–Say, bye bye, program is leaving us

Restartable Pipelines
Interrupts within an instruction are not catastrophic
Most machines support this
–Needed for virtual memory
Some machines did not support this
–Cost & Slowdown
PRECISE INTERRUPTS is key
–As if the interrupt happened at a well defined point in the original sequential order
–First let’s consider a simple DLX-style pipeline

Precise Interrupts
Sequential Semantics
Complete instructions before the offending instruction
Squash (effects of) instructions after
Save PC
Force trap instruction into FETCH stage
–divert execution to interrupt handler

Precise Interrupts
Jim Smith and Andrew Pleszkun Paper
Original work was for a “simple” pipeline
Today the same principles are used in virtually all modern microprocessors
–Support for SPECULATIVE EXECUTION
  executing an instruction without knowing whether we should
  more on this later
–and of course, precise interrupts
We’ll stick to precise interrupts for the time being

Do the Simple Thing First
Modify State only when all preceding insts. are KNOWN to be exception free.
Mechanism: Result Shift Register
Stage = cycle
At FETCH: Reserve all stages for the duration of the instruction

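The reservation rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slide: one instruction is issued per cycle, an instruction with latency n writes back n cycles after it issues, and the names `writeback_order`, `FILLER` and `depth` are hypothetical. An instruction claims its own completion stage plus every free stage ahead of it, so a younger, shorter instruction finds its stage taken and stalls.

```python
FILLER = "-"  # hypothetical marker: stage reserved, but no result arrives in it

def writeback_order(program, depth=8):
    """Return [(cycle, name)] writebacks for (name, latency) pairs issued in order."""
    rsr = [None] * depth              # stage i = result-bus slot i+1 cycles from now
    pending = list(program)
    order, cycle = [], 0
    while pending or any(rsr):
        if pending:
            name, latency = pending[0]
            if rsr[latency - 1] is None:           # completion slot is free
                for i in range(latency - 1):       # also claim earlier free stages
                    if rsr[i] is None:
                        rsr[i] = FILLER
                rsr[latency - 1] = name
                pending.pop(0)
            # otherwise the instruction stalls and retries next cycle
        head = rsr.pop(0)             # the register shifts one stage per cycle
        rsr.append(None)
        cycle += 1
        if head not in (None, FILLER):
            order.append((cycle, head))
    return order

# A 4-cycle DIV followed by shorter instructions: writebacks stay in program order.
print(writeback_order([("DIV", 4), ("ADD", 1), ("SUB", 2)]))
# → [(4, 'DIV'), (5, 'ADD'), (7, 'SUB')]
```

In this run the ADD waits behind the DIV even though it is independent, which is the latency amplification an in-order completion scheme pays for its simplicity.
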
Simple Solution Discussion
Essentially In-Order Completion
–Simple
  Easy to implement
–Performance?
  Execution overlap still possible
  Writebacks in order
  Amplifies latencies
  Dependent Instructions wait longer

Allowing out-of-order completes
Add one more state for instruction execution:
–COMPLETE & COMMIT
COMPLETE:
–Result calculated
–Dependent instructions can use
–BUT, don’t know if preceding instructions are all OK
–I.e., don’t know if this instruction should have executed now based on the original program order
COMMIT:
–All preceding instructions executed with no problems
–Can safely commit state changes

OOO Complete & IO Commit
Want: Out-of-Order Completion
–Allow OOO completion
–Maintain in-order COMMIT
–Allow maximum overlap
–Guarantee precise state if needed
How does this improve performance?
[Timing diagram: In-Order Complete vs. OOO Complete over time for DIV R3, _, _; ADD R1, _, _; ADD _, R1, _; with in-order commits]

Reorder Buffer
Result Shift Register:
–Reserve Result Bus
–Out-of-Order Completion
Reorder Buffer
–Defer Commits and do them in-order
–Allow OOO Completes by buffering state motion
[Figure legend: res = result, v = valid, e = result NYA; annotations: When to complete, When to commit]

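As a rough sketch of the reorder buffer idea (out-of-order completes, in-order commits), the Python model below may help. It is an assumption-laden illustration, not the design from the slides: the class and field names are hypothetical, there is no bypassing, and an exception simply flushes everything younger than the faulting entry.

```python
from collections import deque

class ReorderBuffer:
    """Toy model: entries are allocated in program order, completed in any order."""

    def __init__(self):
        self.entries = deque()   # oldest instruction on the left
        self.regfile = {}        # architectural state, written only at commit

    def allocate(self, dest):
        entry = {"dest": dest, "result": None, "valid": False, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, result=None, exception=False):
        # May be called in any order; the result is buffered, not yet architectural.
        entry.update(result=result, exception=exception, valid=True)

    def commit(self):
        """Retire valid entries from the head; stop and flush at an exception."""
        committed = []
        while self.entries and self.entries[0]["valid"]:
            head = self.entries.popleft()
            if head["exception"]:
                self.entries.clear()          # squash everything younger
                return committed, head
            self.regfile[head["dest"]] = head["result"]
            committed.append(head["dest"])
        return committed, None

rob = ReorderBuffer()
div, add = rob.allocate("R3"), rob.allocate("R1")
rob.complete(add, 7)      # the ADD finishes first...
print(rob.commit())       # → ([], None): ...but cannot commit past the DIV
rob.complete(div, 3)
print(rob.commit())       # → (['R3', 'R1'], None)
```

The register file only ever reflects a prefix of the program, which is what makes the state precise when an exception reaches the head.
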
Reorder Buffer Complications
State is kept in the reorder buffer
Have to bypass from every entry
–Need to determine latest write w/ respect to the consuming instruction
[Figure: RF, RB]
Essentially:
1. In-Order Commits
2. Buffer speculative state till commit

Speculative State Updates
Two fundamental approaches
–HISTORY BUFFER: Do changes but keep a record of old state
  Everything OK? Just discard record of changes
–FUTURE FILE: Keep two states: Architectural and Speculative
  On COMPLETE write state to Speculative
  On ISSUE read from speculative
  On COMMIT write to Architectural
  On Error, throw out Speculative state

History Buffer
Allow out-of-order register file updates
At decode record current value of target register in RB
–notice that this is the previous value the register had
On Commit?
–Do nothing, state is fine
On Exception?
–Use History to UNDO changes made
[Figure: RF, HB; labels: results, Source operands, Destination registers, Exception]

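A minimal sketch of the undo mechanism, assuming for simplicity that writes to the same register reach the register file in program order (the sketch does not model the hazard checks a real design would need). All names here are hypothetical.

```python
class HistoryBufferMachine:
    """Toy model: the register file is updated eagerly; old values are logged."""

    def __init__(self, regs):
        self.rf = dict(regs)
        self.history = []                 # (dest, old_value), oldest first

    def write(self, dest, value):
        # Log the value being overwritten, then update the register file.
        self.history.append((dest, self.rf.get(dest)))
        self.rf[dest] = value

    def commit_oldest(self):
        # Common case: nothing to restore, just drop the log entry.
        self.history.pop(0)

    def rollback(self, first_bad):
        """Undo every logged write from index first_bad onward, youngest first."""
        while len(self.history) > first_bad:
            dest, old = self.history.pop()
            self.rf[dest] = old

m = HistoryBufferMachine({"R1": 1, "R2": 2})
m.write("R1", 10)
m.write("R2", 20)
m.write("R1", 30)
m.rollback(1)            # the second instruction faulted: undo it and all younger
print(m.rf)              # → {'R1': 10, 'R2': 2}
```

The rollback loop walks the log one entry at a time, which is where the slow response to interrupts comes from.
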
History Buffer Discussion
Simple Mechanism
Additional Register File Port
Single Source for Input Operands
Normal Instruction processing Not changed by much
–Control mostly unchanged
–Nothing to do on Commit for the common case
Slow response to Interrupts
–Need to scan through HB
–Complex?

Future File: The Optimist’s View
Two Register Files:
–One updated Out-of-Order (FUTURE)
  assume no exception will occur
–One updated in Order (ARCHITECTURAL)
Advantage: No delay to restore state on exception
[Figure: RF, RB, FF; labels: Source operands, results]

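The two-register-file arrangement can be sketched as follows. This is a simplified illustration with hypothetical names; it assumes writes to the same register complete in program order, and it models recovery as copying the architectural file over the future file once older instructions have committed.

```python
class FutureFileMachine:
    """Toy model: FUTURE file updated at complete, ARCHITECTURAL file at commit."""

    def __init__(self, regs):
        self.arch = dict(regs)      # precise, in-order state
        self.future = dict(regs)    # optimistic, out-of-order state
        self.rob = []               # in-flight instructions, oldest first

    def dispatch(self, dest):
        entry = {"dest": dest, "value": None, "done": False}
        self.rob.append(entry)
        return entry

    def complete(self, entry, value):
        entry.update(value=value, done=True)
        self.future[entry["dest"]] = value     # visible to later readers at once

    def commit(self):
        while self.rob and self.rob[0]["done"]:
            entry = self.rob.pop(0)
            self.arch[entry["dest"]] = entry["value"]

    def exception(self):
        # Nothing to undo: the architectural file already holds the precise state.
        self.rob.clear()
        self.future = dict(self.arch)

m = FutureFileMachine({"R1": 1})
first, second = m.dispatch("R1"), m.dispatch("R2")
m.complete(second, 5)        # younger instruction completes first
print(m.future, m.arch)      # → {'R1': 1, 'R2': 5} {'R1': 1}
m.exception()
print(m.future)              # → {'R1': 1}
```

Compared with the history buffer, the cost moves from recovery time to storage: a second register file is kept up to date on every commit.
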
How Do These Relate to Register Renaming?
Physical Registers provide sufficient storage for both speculative and architectural storage
It’s the register map table that determines what is the current state
On interrupt we have to restore the map table
–Values are there in the physical register file
History and Future approaches still valid
–History: keep track of changes to register map table
–On interrupt undo them one by one
–Future: keep two tables
  Speculative: updated at decode
  Architectural: updated at commit

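The "Future" variant with two map tables might look like the sketch below. The class, its free-list policy and the method names are assumptions made for illustration; only the split between a speculative table updated at decode and an architectural table updated at commit comes from the slide.

```python
class Renamer:
    """Toy model of two map tables: speculative (decode) and architectural (commit)."""

    def __init__(self, arch_regs, n_phys):
        self.spec_map = {r: i for i, r in enumerate(arch_regs)}
        self.arch_map = dict(self.spec_map)
        self.free = list(range(len(arch_regs), n_phys))
        self.inflight = []           # (arch_reg, new_phys, old_phys), oldest first

    def rename(self, dest):
        new = self.free.pop(0)
        self.inflight.append((dest, new, self.spec_map[dest]))
        self.spec_map[dest] = new    # speculative table changes at decode
        return new

    def commit(self):
        dest, new, old = self.inflight.pop(0)
        self.arch_map[dest] = new    # architectural table changes at commit
        self.free.append(old)        # the previous mapping is now dead

    def flush(self):
        # Interrupt: release speculative registers and restore the map table.
        for _, new, _ in self.inflight:
            self.free.append(new)
        self.inflight.clear()
        self.spec_map = dict(self.arch_map)

r = Renamer(["R1", "R2"], n_phys=4)
r.rename("R1")          # R1 -> P2
r.rename("R1")          # R1 -> P3
r.commit()              # first rename becomes architectural
r.flush()               # second rename is squashed
print(r.spec_map)       # → {'R1': 2, 'R2': 1}
```

A "History" variant would keep a single table and walk `inflight` from youngest to oldest, restoring each `old_phys` mapping in turn.
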
RUU
Sohi’s Paper
Common Mechanism for Precise Interrupts and OOO Execution
Register Update Unit
–A collection of Reservation stations
–Organized as a FIFO queue
–Instructions Enter In-order at FETCH
–They Exit In-Order at COMMIT
  Register File updates happen at this point.
Simplescalar follows this model
–Well, mostly
–Cuts corners on when Completes become visible

RUU: OOO Execution
Decode:
–Check RUU for most recent write to register
–If none found, read value from RF
  Do it in parallel really
–If found, link to producer with a TAG
  RUU number is the TAG
Issue
–Wait till all input operands are ready
Complete
–Broadcast value and RUU ID
  Waiting instructions will pick value up
Commit
–Head and Tail pointer for FIFO operation
–Only when everyone before has committed

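The four steps above can be tied together in a small Python model. It is a sketch under stated assumptions, not Sohi's design or the SimpleScalar code: a running counter stands in for the RUU entry number, a completed but uncommitted producer is assumed to be readable directly from its entry, and the names `RUU`, `OPS`, `ready` and `execute` are hypothetical.

```python
import operator

OPS = {"add": operator.add, "div": operator.floordiv}   # illustrative opcodes only

class RUU:
    def __init__(self, regfile):
        self.rf = dict(regfile)
        self.queue = []          # FIFO of in-flight entries, oldest first
        self.next_tag = 0        # stand-in for the RUU entry number

    def decode(self, op, dest, srcs):
        operands = []
        for reg in srcs:
            # Most recent in-flight write to reg, if any (scan youngest to oldest).
            producer = next((e for e in reversed(self.queue) if e["dest"] == reg), None)
            if producer is None:
                operands.append(("val", self.rf[reg]))
            elif producer["done"]:
                operands.append(("val", producer["value"]))   # assumed readable
            else:
                operands.append(("tag", producer["tag"]))     # link to producer
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        self.queue.append({"tag": tag, "op": op, "dest": dest,
                           "operands": operands, "done": False, "value": None})
        return tag

    def ready(self):
        """Tags of entries whose operands are all values, so they may issue."""
        return [e["tag"] for e in self.queue
                if not e["done"] and all(kind == "val" for kind, _ in e["operands"])]

    def execute(self, tag):
        entry = next(e for e in self.queue if e["tag"] == tag)
        entry["value"] = OPS[entry["op"]](*(v for _, v in entry["operands"]))
        entry["done"] = True
        for e in self.queue:     # broadcast (tag, value) to waiting consumers
            e["operands"] = [("val", entry["value"]) if o == ("tag", tag) else o
                             for o in e["operands"]]

    def commit(self):
        while self.queue and self.queue[0]["done"]:   # in order, from the head
            e = self.queue.pop(0)
            self.rf[e["dest"]] = e["value"]

ruu = RUU({"R1": 0, "R2": 8, "R3": 0, "R4": 2, "R5": 0})
ruu.decode("div", "R3", ["R2", "R4"])     # tag 0
ruu.decode("add", "R1", ["R2", "R4"])     # tag 1
ruu.decode("add", "R5", ["R1", "R3"])     # tag 2, waits on tags 1 and 0
ruu.execute(1)                            # the ADD completes before the DIV
ruu.commit()
print(ruu.rf["R1"])                       # → 0: nothing commits past the DIV
```

Flushing on an interrupt would amount to emptying `queue`, since the register file never holds uncommitted values in this model.
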
Where is the Rename Table?
It’s the RUU
decode insts scan for the most recent update to register
–If none found, then register in register file
–Otherwise, get RUU entry # as tag
Interrupts?
–Simply flush RUU
Pros/Cons:
–Associative lookup for decode
–RUU ports limit when consumers can read a value
