Implementing Precise Interrupts in Pipelined Processors James E. Smith, Andrew R. Pleszkun Presented By: Ravikumar Source:


What will be covered? Interrupts in pipelined processors; methods to implement precise interrupts in pipelined processors; performance evaluation of those methods; extensions; architectural solutions.

Interrupt Interrupt: Stop and resume. Precise interrupt: Imprecise interrupt:

Types of interrupts: I/O device request; invoking an operating system service from a user program; tracing instruction execution; breakpoint (programmer-requested interrupt); integer arithmetic overflow; FP arithmetic anomaly; page fault; misaligned memory access; memory protection violation; use of an undefined or unimplemented instruction; hardware malfunction; power failure.

Example code

     Statement                   Comments             Execution Time
 0         R2 <- 0               Init. loop index
 1         R0 <- 0               Init. loop count
 2         R5 <- 1               Loop inc. value
 3         R7 <- 100             Maximum loop count
 4   Loop: R1 <- (R2 + A)        Load A(I)            11 clock cycles
 5         R3 <- (R2 + B)        Load B(I)            11 clock cycles
 6         R4 <- R1 +f R3        Floating add         6 clock cycles
 7         R0 <- R0 + R5         Inc. loop count      2 clock cycles
 8         (R0 + C) <- R4        Store C(I)
 9         R2 <- R2 + R5         Inc. loop index      2 clock cycles
10         P = Loop : R0 != R7   Cond. branch not equal

Interrupt in sequential model processors: Instructions 4 (R1 <- (R2 + A)), 5 (R3 <- (R2 + B)), and 6 (R4 <- R1 +f R3) complete one at a time, each adding its result to the process state. The interrupt occurs after instruction 6: 1. pc=7 and R4, R3, R1, R2, R0, R5, R7 are saved; 2. the program is suspended. The interrupt program then runs (110 R2 <- 44 ... 150 R0 <- 100) and stops. Afterwards: 1. pc=7 and R4, R3, R1, R2, R0, R5, R7 are restored; 2. the program resumes with instruction 7 (R0 <- R0 + R5), after which pc=8. In sequential model processors the interrupt is precise: the suspended program is guaranteed to be resumable.

Interrupt in pipelined processors: Instructions 4 (R1 <- (R2 + A)) and 5 (R3 <- (R2 + B)) have completed when the interrupt occurs, but the result of instruction 6 (R4 <- R1 +f R3) is not yet part of the process state. 1. pc=8 and R3, R1, R2, R0, R5, R7 are saved, without R4; 2. the program is suspended. The interrupt program then runs (110 R2 <- 44 ... 150 R0 <- 100) and stops. Afterwards: 1. pc=8 and R3, R1, R2, R0, R5, R7 are restored; 2. the program resumes at pc=8. Instruction 8 ((R0 + C) <- R4) then needs R4, which isn't available, before instruction 9 (R2 <- R2 + R5) can follow. In pipelined processors the interrupt can be imprecise: the suspended program is not guaranteed to be resumable.

Preliminaries. Model architecture: register-register architecture. Load: Ri = (Rj + disp). Store: (Ri + disp) = Rj. Function: Ri = Rj op Rk, or Ri = op Rk. Condition: P = disp : Ri op Rk. Process state: general purpose registers; main memory; program counter (PC).

Interrupts Prior to Instruction Issue: The interrupt condition is detected before an instruction is issued. Instruction issue is halted, and the processor waits until all previously issued instructions complete.

Methods for precise interrupts in pipelined processors: in-order instruction completion; reorder buffer; reorder buffer with bypass paths; history buffer; future file.

In-Order Instruction Completion: Instructions modify the process state only when all previously issued instructions are known to be free of exception conditions.

In-Order Instruction Completion - result shift register

In-Order Instruction Completion - result shift register (cont'd)
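The slides above only name the result shift register, so here is a minimal sketch of how it could enforce in-order completion. It assumes the reservation rule described in the Smith and Pleszkun paper: an instruction with latency i reserves stage i and blocks every free earlier stage. The class name `ResultShiftRegister` and its methods are hypothetical and purely illustrative.

```python
class ResultShiftRegister:
    """Toy result shift register: stage i holds the instruction that will
    complete i clocks from now (stage 1 is the next to complete)."""

    def __init__(self, length):
        # Index 0 is unused so that stage numbers match latencies.
        self.stages = [None] * (length + 1)

    def try_issue(self, name, latency):
        # Issue hazard: the stage matching this latency is already taken.
        if self.stages[latency] is not None:
            return False
        self.stages[latency] = name
        # Assumed in-order rule: fill every free earlier stage with a null
        # entry so no later, faster instruction can complete first.
        for i in range(1, latency):
            if self.stages[i] is None:
                self.stages[i] = "null"
        return True

    def tick(self):
        """Advance one clock; return the instruction completing now, if any."""
        done = self.stages[1]
        self.stages = [None] + self.stages[2:] + [None]
        return None if done in (None, "null") else done
```

With the example loop, a 2-cycle integer add issued right after the 6-cycle floating add is refused until the floating add has reached stage 1, which is exactly the held-up fast instruction that motivates the out-of-order methods.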

In-Order Instruction Completion - process state modification: registers; main memory; program counter.

Out-of-order instruction completion methods. Limitation of in-order completion: fast instructions may be held up even when there is no dependency, and they in turn block the instructions behind them. For example, instruction 7 (R0 <- R0 + R5, inc. loop count, 2 clock periods) is held up behind instruction 6 (R4 <- R1 +f R3, floating add, 6 clock periods). Methods that allow out-of-order completion: basic reorder buffer; reorder buffer with bypass paths; history buffer; future file.

Basic reorder buffer method: Organization. The reorder buffer separates instruction completion from instruction commit: instructions may complete out of order, but they commit in order. The reorder buffer is used to rearrange instructions into program order before they commit.

Basic reorder buffer method: Structure. Result shift register: a TAG field guides results and exception conditions to the correct reorder buffer entry. Reorder buffer tail: when an instruction issues, an entry is created at the tail. Reorder buffer head: when the head entry contains a valid result, it is checked for exceptions and removed. Example: the relative positions of two instructions in the two buffers.

Basic reorder buffer: Keeping the process state precise. Registers: if there is no exception at the head, the result is written to the register file; if there is an exception at the head, issue is stopped to process the interrupt and no further writes are made to the register file. Memory: stores are held in the issue register until all previous instructions are known to be free of exceptions, and only then issued; a dummy entry is placed in the reorder buffer. Program counter: the program counter is stored in a field of the reorder buffer entry as each instruction issues.
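The head/tail behaviour described above can be sketched in a few lines. This is an illustrative model only, not the hardware design from the paper: the class `ReorderBuffer`, its dictionary fields, and the use of a Python `deque` are all assumptions made for the example, and stores are left out.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: entries are allocated at the tail in program
    order and retired from the head, so register writes happen in order."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # entries[0] is the head, entries[-1] the tail
        self.next_tag = 0

    def issue(self, dest, pc):
        """Allocate a tail entry; return its tag, or None if the buffer is full."""
        if len(self.entries) == self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "pc": pc,
                             "value": None, "done": False, "exc": False})
        return tag

    def complete(self, tag, value=None, exception=False):
        """A functional unit finished: the tag guides the result to its entry."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry.update(value=value, done=True, exc=exception)
                return

    def commit(self, regs):
        """Retire finished head entries in order. On an exception at the head,
        discard everything and return the saved PC; otherwise return None."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["exc"]:
                self.entries.clear()  # no further register writes
                return head["pc"]
            self.entries.popleft()
            regs[head["dest"]] = head["value"]
        return None
```

For example, if the integer add at pc 7 completes before the floating add at pc 6, `commit` leaves the register file untouched until the floating add also completes, and a fault in the older instruction discards the younger result.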

Reorder buffer with bypass paths. Limitation of the basic reorder buffer: results are held in the reorder buffer until they reach the head, so instructions that depend on them as operands cannot be issued. The reorder buffer with bypass paths is proposed to address this: bypass paths are provided from the reorder buffer to the register file output latches.

Reorder buffer with bypass paths: Precise process state. Registers: bypassed operands are not written to the register file but to the register file output latches; a register is not modified until its instruction reaches the head of the reorder buffer. Memory and PC: same as before.
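As a rough illustration of the bypass check, the sketch below looks up a source operand by scanning the reorder buffer from newest to oldest entry, so that the most recent pending writer of a register wins. The function name `read_operand` and the entry layout are hypothetical; real hardware does this comparison in parallel with one comparator per entry rather than in a loop.

```python
def read_operand(reg, reorder_buffer, regs):
    """Return (ready, value) for source register `reg`.

    `reorder_buffer` is a list of entries ordered head (oldest) to tail
    (newest); each entry is a dict with "dest", "done" and "value" keys.
    """
    for entry in reversed(reorder_buffer):  # newest matching entry wins
        if entry["dest"] == reg:
            if entry["done"]:
                return True, entry["value"]  # bypassed from the buffer
            return False, None  # result not computed yet: must wait
    return True, regs[reg]  # no pending writer: read the register file
```

The scan over every entry is the software analogue of the comparator cost that the history buffer and future file try to avoid.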

Methods to reduce bypass circuitry. Limitation of the reorder buffer with bypass paths: the number of bypass comparators and the amount of circuitry needed for the multiple bypass check. The history buffer and future file are proposed to address this. Basic idea: place computed results in a working register file, but retain enough state information that a precise state can be restored.

History buffer: Organization. When an instruction issues, the current value of its destination register is stored in a history buffer entry. When an instruction completes, the result on the result bus is written directly into the register file.

History buffer: Keeping the process state precise. Registers: the tag field guides exceptions to the correct history buffer entry; old values are saved when instructions issue; if the head entry has no exception, it is removed; if it has an exception, the saved old values are used to restore the register file. Memory and PC: same as before. Example: old values in entries 4 and 5.
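The save-then-undo idea can be sketched as follows. This is a simplified model under stated assumptions: the class `HistoryBuffer` and its methods are hypothetical, the rollback order (newest entry first) follows the paper's description of emptying the buffer from tail to head, and a faulting instruction is assumed not to write its result.

```python
class HistoryBuffer:
    """Toy history buffer: results go straight to the register file, while
    the overwritten values are kept so a precise state can be restored."""

    def __init__(self, regs):
        self.regs = regs
        self.entries = []  # oldest (head) first

    def issue(self, dest, pc):
        """Record the destination's current value before it can be overwritten."""
        entry = {"dest": dest, "old": self.regs[dest], "pc": pc,
                 "done": False, "exc": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        entry["done"] = True
        entry["exc"] = exception
        if not exception:
            self.regs[entry["dest"]] = value  # written directly, out of order

    def retire(self):
        """Drop finished, exception-free head entries. If the head has an
        exception, undo all recorded writes and return the faulting PC."""
        while self.entries and self.entries[0]["done"]:
            if self.entries[0]["exc"]:
                pc = self.entries[0]["pc"]
                while self.entries:  # newest first, so oldest value wins
                    undone = self.entries.pop()
                    self.regs[undone["dest"]] = undone["old"]
                return pc
            self.entries.pop(0)
        return None
```

Compared with the reorder buffer sketch, operands are always read from the register file here, so no bypass search is needed; the cost moves to the rollback on an exception.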

Future file: Similar to the history buffer method. Keep register precise: two register files. Architecture file: Future file: Keep memory and PC precise: same as before.
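The slide does not spell out the roles of the two register files, so the sketch below assumes the usual reading of the paper: the future file is the working file updated as soon as results complete, and the architectural file is updated in program order through a reorder buffer. The class `FutureFileMachine` and its fields are hypothetical.

```python
class FutureFileMachine:
    """Toy model with two register files (roles assumed, see lead-in)."""

    def __init__(self, regs):
        self.arch = dict(regs)    # architectural file: always precise
        self.future = dict(regs)  # future file: source of operands
        self.rob = []             # reorder buffer, oldest first

    def issue(self, dest, pc):
        entry = {"dest": dest, "pc": pc, "value": None,
                 "done": False, "exc": False}
        self.rob.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        entry.update(value=value, done=True, exc=exception)
        if not exception:
            self.future[entry["dest"]] = value  # updated out of order

    def commit(self):
        """Update the architectural file in order. On an exception at the
        head, restore the future file from it and return the faulting PC."""
        while self.rob and self.rob[0]["done"]:
            head = self.rob.pop(0)
            if head["exc"]:
                self.rob.clear()
                self.future = dict(self.arch)
                return head["pc"]
            self.arch[head["dest"]] = head["value"]
        return None
```

Because operands are read from the future file, this arrangement needs neither the bypass comparators of the reorder buffer nor the rollback walk of the history buffer, at the price of a second register file.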

Performance evaluation: Environment. CRAY-1S simulation system; the first 14 Lawrence Livermore loops are used as the simulation workload. The five methods are classified into three groups: in-order completion; simple reorder buffer; reorder buffer with bypass paths, history buffer, and future file. Two evaluation cases are considered, based on different methods of handling stores.

Performance evaluation (1). Measurement condition: stores are blocked until the results pipeline is empty. In-order completion performance is independent of the number of buffer entries. In-order completion is better when the buffer is small. When the number of entries increases beyond 3, the other two groups are better.

Performance evaluation (2). Measurement condition: stores are issued and held in the memory pipeline. This second method of handling stores offers a clear improvement over the first. Performance degradation for an eight-entry reorder buffer with bypass paths is only 3 percent.

Indications from the methods. Once the number of reorder buffer entries exceeds a certain value, performance no longer improves; in both tables that number is eight. There is a tradeoff between performance degradation and the cost of implementing a method.

Outline: extensions; architectural solutions; summary.

Extensions: additional state information; virtual memory; cache memory.

Other State Values State register Condition codes:

Virtual Memory: Load/store instructions pass through the address translation section in order and reserve time slots in the result pipeline and/or reorder buffer. If an addressing fault occurs, the instruction and all subsequent load/store instructions are cancelled.

Virtual Memory: Using ROB. The page fault is sent to the reorder buffer. The tag guides the load/store to the correct reorder buffer entry. The entry is removed when it reaches the head. An exception causes all further entries to be discarded.

Cache Memory: store-through cache; write-back cache.

Architectural Solutions: freeze and dump; save program counters; save a sequence of instructions.

Summary: in-order instruction completion; reorder buffer; bypass paths, history buffer, and future file; extensions; architectural solutions.
