ILP: Introduction (CSCE430/830). Instruction-level parallelism: Introduction. CSCE430/830 Computer Architecture. Lecturer: Prof. Hong Jiang. Courtesy of Yifeng.

Instruction-level parallelism: Introduction
CSCE430/830 Computer Architecture
Lecturer: Prof. Hong Jiang
Courtesy of Yifeng Zhu (U. Maine)
Fall, 2006
Portions of these slides are derived from: Dave Patterson © UCB

Class Exercise
Consider the following code segment:
    1.    LW  R1, 0(R4)
    2.    LW  R2, 0(R5)
    3.    ADD R3, R1, R2
    4.    BNZ R3, L
    5.    LW  R4, 100(R1)
    6.    LW  R5, 100(R2)
    7.    SUB R3, R4, R5
    8. L: SW  R3, 50(R1)
Assume that there is no forwarding, that zero testing is resolved during ID, and that registers can be written in the first half of the WB cycle and read in the second half of the same WB cycle.
Question: identify the sources of the various hazards in the above code sequence.

Class Exercise
Consider the following code segment:
    1.    LW  R1, 0(R4)
    2.    LW  R2, 0(R5)
    3.    ADD R3, R1, R2
    4.    BNZ R3, L
    5.    LW  R4, 100(R1)
    6.    LW  R5, 100(R2)
    7.    SUB R3, R4, R5
    8. L: SW  R3, 50(R1)
Assume that there is forwarding, and that zero testing and target address calculation are done during ID.
Use compiler techniques to reorder the code (without changing the meaning/semantics of the program) so as to minimize data hazards as much as possible. Assume that no general-purpose registers other than those used in the code are available.

Class Exercise (reordered code)
Assuming that there is forwarding, and that zero testing and target address calculation are done during ID, the code can be reordered as follows to minimize data hazards, using no general-purpose registers other than those already in the code.
Original code:
    1.    LW  R1, 0(R4)
    2.    LW  R2, 0(R5)
    3.    ADD R3, R1, R2
    4.    BNZ R3, L        (Not Taken)
    5.    LW  R4, 100(R1)
    6.    LW  R5, 100(R2)
    7.    SUB R3, R4, R5
    8. L: SW  R3, 50(R1)
Reordered code:
    1.    LW  R1, 0(R4)
    2.    LW  R2, 0(R5)
    3.    LW  R4, 100(R1)
    4.    ADD R3, R1, R2
    5.    LW  R5, 100(R2)
    6.    BNZ R3, L
    7.    SUB R3, R4, R5
    8. L: SW  R3, 50(R1)

Ideas To Reduce Stalls
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
(Techniques for reducing each stall component are covered in Chapter 3 and Chapter 4.)
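
The CPI relation above is a plain sum, so it can be sketched in a few lines of Python. The function name `pipeline_cpi` and the stall rates below are hypothetical values chosen for illustration (stall cycles per instruction); they are not measurements from the slides.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI as the ideal CPI plus the per-instruction stall terms."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Hypothetical stall rates, in stall cycles per instruction (assumed values).
cpi = pipeline_cpi(ideal_cpi=1.0,
                   structural_stalls=0.05,
                   data_hazard_stalls=0.20,
                   control_stalls=0.15)
print(f"Pipeline CPI = {cpi:.2f}")
```

Each technique for reducing stalls attacks one of the three stall terms, so lowering any term moves the pipeline CPI toward the ideal value.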

Forms of parallelism
(ordered from coarse grain to fine grain)
Process-level
–How do we exploit it? What are the challenges?
–Examples?
Thread-level
–How do we exploit it? What are the challenges?
–Examples?
Loop-level
–What is really loop level parallelism? What percentage of a program’s time is spent inside loops?
Instruction-level
–Focus of Chapter 3 & 4
Human intervention?

Instruction Level Parallelism (ILP)
Principle: There are many instructions in code that don’t depend on each other, so it is possible to execute those instructions in parallel.
This is easier said than done. Issues include:
–Building compilers to analyze the code
–Building hardware to be even smarter than that code
This section looks at some of the problems to be solved.

Exploiting Parallelism in Pipeline
Two methods of exploiting the parallelism:
–Increase pipeline depth
–Multiple issue
»Replicate internal components
»Launch multiple instructions in every pipeline stage
Today’s high-end microprocessors issue 3 to 8 instructions every clock cycle.

Pipeline supports multiple outstanding FP operations
    MULTD   IF  ID  M1  M2  M3  M4  M5  M6  M7  Mem  WB
    ADDD    IF  ID  A1  A2  A3  A4  Mem  WB
    LD      IF  ID  EX  Mem  WB
    SD      IF  ID  EX  Mem  WB

Figure: Microarchitecture of Intel Pentium 4

The big picture
Parallelism
–Increase pipeline depth
–Multiple issue
»Static multiple issue: many decisions are made by compiler before execution
»Dynamic multiple issue: many decisions are made by hardware during execution

ILP Challenges
How many instructions can we execute in parallel?
Definition of basic instruction block: the instructions between two consecutive branch instructions.
–Example: Body of a loop.
–Branch frequency in typical MIPS programs: one in every 4-7 instructions is a branch.
»How many of those are likely to be data dependent on each other?
–We need the means to exploit parallelism across basic blocks. What stops us from doing so?
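
To make the basic-block definition concrete, here is a minimal Python sketch that splits a straight-line listing into basic blocks. It uses the usual "leader" rule (the first instruction, every branch target, and every instruction following a branch start a new block), which is an assumption layered on the slide's informal definition. The tuple encoding and the name `basic_blocks` are hypothetical; the sample program is the class-exercise code.

```python
def basic_blocks(instrs):
    """Split a program into basic blocks.

    instrs: list of (label, opcode, target) tuples, where `label` is the label
    defined on the instruction (or None) and `target` is the label the
    instruction may branch to (or None for non-branches).
    Returns a list of blocks, each a list of 0-based instruction indices.
    """
    targets = {t for _, _, t in instrs if t is not None}
    leaders = set()
    for i, (label, _, target) in enumerate(instrs):
        if i == 0 or label in targets:      # program entry or a branch target
            leaders.add(i)
        if target is not None and i + 1 < len(instrs):
            leaders.add(i + 1)              # instruction following a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# The class-exercise code: BNZ R3, L at index 3 and label L at index 7.
program = [
    (None, "LW", None), (None, "LW", None), (None, "ADD", None),
    (None, "BNZ", "L"),
    (None, "LW", None), (None, "LW", None), (None, "SUB", None),
    ("L", "SW", None),
]
blocks = basic_blocks(program)
print(blocks)
```

With blocks this short, little independent work is available inside any one of them, which is why the slide asks for ways to exploit parallelism across basic blocks.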

Dependencies
–Data dependence
–Name dependencies
»Anti-dependence
»Output dependence
–Control dependence

Data Dependence and Hazards
Instr J is data dependent on Instr I if:
–Instr J tries to read an operand before Instr I writes it, or
–Instr J is data dependent on Instr K, which is dependent on Instr I
    I: add r1,r2,r3
    J: sub r4,r1,r3
Caused by a “True Dependence” (compiler term).
If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard.

Data Dependences through registers/memory
Dependences through registers are easy; just compare register names:
    lw  r10,10(r11)
    add r12,r10,r8
Dependences through memory are harder:
    sw  r10,4(r2)
    lw  r6,0(r4)
Is r2+4 = r4+0? If so, they are dependent; if not, they are not.
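
The memory case above comes down to comparing effective addresses, which depend on run-time register values. A minimal sketch, assuming the register contents are known and that both accesses are the same size and aligned (so equal addresses mean the same location); the helper names and register values are hypothetical:

```python
def effective_address(regs, base, offset):
    """Effective address of a base+offset memory operand."""
    return regs[base] + offset


def same_location(regs, store, load):
    """True if the store and the load touch the same address.

    store and load are (base_register, offset) pairs.
    """
    return effective_address(regs, *store) == effective_address(regs, *load)


store = ("r2", 4)   # sw r10,4(r2)
load = ("r4", 0)    # lw r6,0(r4)

# Hypothetical register contents: r2+4 equals r4+0, so the load depends on the store.
dependent = same_location({"r2": 96, "r4": 100}, store, load)
# Different contents: the two accesses are independent.
independent = not same_location({"r2": 96, "r4": 104}, store, load)
print(dependent, independent)
```

A compiler usually cannot evaluate this comparison because the register values are unknown at compile time, which is what makes memory dependences harder than register dependences.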

Name Dependence #1: Anti-dependence
Name dependence: when 2 instructions use the same register or memory location, called a name, but no flow of data between the instructions is associated with that name. There are 2 versions of name dependence.
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Instr J writes an operand before Instr I reads it.
–Called an “anti-dependence” by compiler writers.
–This results from reuse of the name “r1”.
–If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard.

Name Dependence #2: Output dependence
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Instr J writes an operand before Instr I writes it.
–Called an “output dependence” by compiler writers.
–This also results from the reuse of the name “r1”.
–If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard.
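
The three register dependence types can be told apart by comparing destination and source names. The sketch below is an illustration only: it assumes three-operand instructions encoded as hypothetical `(dest, src1, src2)` tuples and looks at a single pair of instructions in program order.

```python
def classify(first, second):
    """Dependence types from `first` to `second` (program order).

    Each instruction is a (dest, src1, src2) tuple of register names.
    """
    dest1, *srcs1 = first
    dest2, *srcs2 = second
    kinds = set()
    if dest1 in srcs2:
        kinds.add("RAW")   # true data dependence: second reads what first writes
    if dest2 in srcs1:
        kinds.add("WAR")   # anti-dependence: second overwrites what first reads
    if dest1 == dest2:
        kinds.add("WAW")   # output dependence: both write the same name
    return kinds


raw = classify(("r1", "r2", "r3"), ("r4", "r1", "r3"))  # add r1,r2,r3 ; sub r4,r1,r3
war = classify(("r4", "r1", "r3"), ("r1", "r2", "r3"))  # sub r4,r1,r3 ; add r1,r2,r3
waw = classify(("r1", "r4", "r3"), ("r1", "r2", "r3"))  # sub r1,r4,r3 ; add r1,r2,r3
print(raw, war, waw)
```

The three calls use the I/J pairs from the data dependence, anti-dependence, and output dependence slides respectively.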

Dependences and hazards
–Dependences are a property of programs.
–If two instructions are data dependent, they cannot execute simultaneously.
–Whether a dependence results in a hazard, and whether that hazard actually causes a stall, are properties of the pipeline organization.
–Data dependences may occur through registers or memory.

Dependences and hazards
The presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline.
A data dependence:
–Indicates that there is a possibility of a hazard,
–Determines the order in which results must be calculated, and
–Sets an upper bound on the amount of parallelism that can be exploited.

Instruction Dependence Example
For the following code, identify all data and name dependences between instructions and give the dependency graph.
    1  L.D   F0, 0(R1)
    2  ADD.D F4, F0, F2
    3  S.D   F4, 0(R1)
    4  L.D   F0, -8(R1)
    5  ADD.D F4, F0, F2
    6  S.D   F4, -8(R1)

Instruction Dependence Example (answer)
True Data Dependence:
–Instruction 2 depends on instruction 1 (instruction 1 result in F0 used by instruction 2). Similarly, instructions (4, 5).
–Instruction 3 depends on instruction 2 (instruction 2 result in F4 used by instruction 3). Similarly, instructions (5, 6).
Name Dependence:
–Output Name Dependence (WAW):
»Instruction 1 has an output name dependence over result register (name) F0 with instruction 4.
»Instruction 2 has an output name dependence over result register (name) F4 with instruction 5.
–Anti-dependence (WAR):
»Instruction 2 has an anti-dependence with instruction 4 over register (name) F0, which is an operand of instruction 2 and the result of instruction 4.
»Instruction 3 has an anti-dependence with instruction 5 over register (name) F4, which is an operand of instruction 3 and the result of instruction 5.
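
The answer above can be reproduced mechanically. The following sketch is one possible way to do it, not the method used in the lecture: each instruction is encoded by hand as a hypothetical `(writes, reads)` pair of register-name sets, memory dependences are ignored (as in the answer), and a pair is reported only if no instruction in between redefines the register involved.

```python
def find_dependences(instrs):
    """Find register dependences in a straight-line instruction sequence.

    instrs: list of (writes, reads) sets of register names, in program order.
    Returns 1-based (earlier, later) instruction pairs grouped by type.
    """
    deps = {"RAW": [], "WAW": [], "WAR": []}
    for j, (writes_j, reads_j) in enumerate(instrs):
        for i in range(j):
            writes_i, reads_i = instrs[i]
            # Registers redefined strictly between i and j break the link i -> j.
            killed = set().union(*(instrs[k][0] for k in range(i + 1, j)))
            if (writes_i & reads_j) - killed:
                deps["RAW"].append((i + 1, j + 1))
            if (writes_i & writes_j) - killed:
                deps["WAW"].append((i + 1, j + 1))
            if (reads_i & writes_j) - killed:
                deps["WAR"].append((i + 1, j + 1))
    return deps


code = [
    ({"F0"}, {"R1"}),          # 1  L.D   F0, 0(R1)
    ({"F4"}, {"F0", "F2"}),    # 2  ADD.D F4, F0, F2
    (set(), {"F4", "R1"}),     # 3  S.D   F4, 0(R1)   (writes memory only)
    ({"F0"}, {"R1"}),          # 4  L.D   F0, -8(R1)
    ({"F4"}, {"F0", "F2"}),    # 5  ADD.D F4, F0, F2
    (set(), {"F4", "R1"}),     # 6  S.D   F4, -8(R1)  (writes memory only)
]
deps = find_dependences(code)
print(deps)
```

The `killed` set is what keeps instruction 5 from being reported as dependent on instruction 1: instruction 4 rewrites F0 in between, so instruction 5 reads the newer value.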

Instruction Dependence Example: Dependency Graph
Example code (graph nodes):
    1  L.D   F0, 0(R1)
    2  ADD.D F4, F0, F2
    3  S.D   F4, 0(R1)
    4  L.D   F0, -8(R1)
    5  ADD.D F4, F0, F2
    6  S.D   F4, -8(R1)
Graph edges:
–Data Dependence: (1, 2) (2, 3) (4, 5) (5, 6)
–Output Dependence: (1, 4) (2, 5)
–Anti-dependence: (2, 4) (3, 5)
Questions:
–Can instruction 4 (second L.D) be moved just after instruction 1 (first L.D)? If not, what dependencies are violated?
–Can instruction 3 (first S.D) be moved just after instruction 4 (second L.D)? How about moving 3 after 5 (the second ADD.D)? If not, what dependencies are violated?

ILP and Data Hazards
–HW/SW must preserve program order: the order in which instructions would execute if executed sequentially, one at a time, as determined by the original source program.
–HW/SW goal: exploit parallelism by preserving program order only where it affects the outcome of the program.
–Instructions involved in a name dependence can execute simultaneously if the name used in the instructions is changed so that the instructions do not conflict.
»Register renaming resolves name dependences for registers.
»Renaming can be done either by the compiler or by HW.
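
A minimal sketch of register renaming, assuming an unlimited supply of fresh names (`p0`, `p1`, ...) and a hypothetical `(op, dest, src1, src2)` tuple encoding. Real compilers and hardware work with a finite register pool; this only illustrates how giving every write a new name removes WAR and WAW dependences while keeping the true (RAW) ones.

```python
from itertools import count


def rename(instrs):
    """Give every register write a fresh name.

    instrs: list of (op, dest, src1, src2) tuples in program order.
    Returns the renamed instruction list.
    """
    fresh = (f"p{n}" for n in count())
    current = {}   # architectural register -> most recent renamed register
    renamed = []
    for op, dest, *srcs in instrs:
        srcs = [current.get(s, s) for s in srcs]   # read the latest version
        current[dest] = next(fresh)                # new name for this write
        renamed.append((op, current[dest], *srcs))
    return renamed


# The anti-dependence example: J reuses the name r1 that I still has to read.
code = [
    ("sub", "r4", "r1", "r3"),   # I
    ("add", "r1", "r2", "r3"),   # J
    ("mul", "r6", "r1", "r7"),   # K
]
renamed = rename(code)
for instr in renamed:
    print(instr)
```

After renaming, J writes `p1` instead of `r1`, so it no longer conflicts with I's read of `r1`, while K still reads J's result through `p1`.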

Control Dependencies
Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order.
    if p1 { S1; };
    if p2 { S2; }
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

Control Dependence Ignored
Control dependence need not be preserved:
–We are willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program.
        DADDU r2,r3,r4
        beqz  r2,l1
        lw    r1,0(r2)
    l1:
Can we move lw before the branch? (Don’t worry, it is OK to violate control dependences as long as we can preserve the program semantics.)
Instead, the 2 properties critical to program correctness are exception behavior and data flow.

Preserving the exception behavior
Corollary I: Any changes in the ordering of instructions should not change how exceptions are raised in a program.
→ Reordering of instruction execution should not cause any new exceptions.
        DADDU r2,r3,r4
        beqz  r2,l1
        lw    r1,0(r2)
    l1:

Preserving the data flow
Consider the following example:
        daddu r1,r2,r3
        beqz  r4,L
        dsubu r1,r5,r6
    L:  …
        or    r7,r1,r8
What can you say about the value of r1 used by the or instruction?

Preserving the data flow
Corollary II: Preserving data dependences alone is not sufficient when changing program order. We must preserve the data flow.
Data flow: the actual flow of data values among instructions that produce results and those that consume them.
These two corollaries together allow us to execute instructions in a different order and still maintain the program semantics.
This is the foundation upon which ILP processors are built.
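
The daddu/beqz/dsubu/or example can be traced in a few lines to show why data dependences alone do not pin down the data flow. This sketch models the registers as plain Python integers (ignoring 64-bit wraparound), and the function name `run` and the input values are hypothetical.

```python
def run(r2, r3, r4, r5, r6, r8):
    """Trace the example and return the final (r1, r7)."""
    r1 = r2 + r3          # daddu r1,r2,r3
    if r4 != 0:           # beqz r4,L falls through only when r4 != 0
        r1 = r5 - r6      # dsubu r1,r5,r6
    r7 = r1 | r8          # L: ... or r7,r1,r8
    return r1, r7


taken = run(r2=1, r3=2, r4=0, r5=10, r6=4, r8=0)       # branch taken: or sees daddu's r1
not_taken = run(r2=1, r3=2, r4=1, r5=10, r6=4, r8=0)   # fall through: or sees dsubu's r1
print(taken, not_taken)
```

The or instruction is data dependent on both daddu and dsubu, but which value it actually consumes is decided by the branch. Moving dsubu above the branch would keep both data dependences intact and still change the result on the taken path.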

Speculation
              DADDU R1, R2, R3
              BEQZ  R12, skipnext
              DSUBU R4, R5, R6
              DADDU R5, R4, R9
    skipnext: OR    R7, R8, R9
Assume R4 is dead (rather than live) after skipnext.
We can execute DSUBU before BEQZ since:
–DSUBU cannot generate an exception.
–The data flow cannot be affected.
This type of code scheduling is called speculation.
–The compiler is betting on the branch outcome. In this case, the bet is that the branch is usually not taken.
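
As a sanity check on the speculation argument, the sketch below runs the original and the speculated schedule for both branch outcomes and compares every register except R4, which the slide assumes is dead after skipnext. Registers are modeled as unbounded Python integers (no 64-bit wraparound), and the function names and initial register values are hypothetical.

```python
def original(regs):
    r = dict(regs)
    r["R1"] = r["R2"] + r["R3"]        # DADDU R1, R2, R3
    if r["R12"] != 0:                  # BEQZ R12, skipnext (falls through if R12 != 0)
        r["R4"] = r["R5"] - r["R6"]    # DSUBU R4, R5, R6
        r["R5"] = r["R4"] + r["R9"]    # DADDU R5, R4, R9
    r["R7"] = r["R8"] | r["R9"]        # skipnext: OR R7, R8, R9
    return r


def speculated(regs):
    r = dict(regs)
    r["R1"] = r["R2"] + r["R3"]        # DADDU R1, R2, R3
    r["R4"] = r["R5"] - r["R6"]        # DSUBU hoisted above the branch
    if r["R12"] != 0:                  # BEQZ R12, skipnext
        r["R5"] = r["R4"] + r["R9"]    # DADDU R5, R4, R9
    r["R7"] = r["R8"] | r["R9"]        # skipnext: OR R7, R8, R9
    return r


def live_state(regs):
    """All registers except R4, which is assumed dead after skipnext."""
    return {name: value for name, value in regs.items() if name != "R4"}


initial = {f"R{i}": i for i in range(1, 13)}      # hypothetical values, R12 != 0
taken = dict(initial, R12=0)                      # same values, branch taken

same_not_taken = live_state(original(initial)) == live_state(speculated(initial))
same_taken = live_state(original(taken)) == live_state(speculated(taken))
r4_differs = original(taken)["R4"] != speculated(taken)["R4"]
print(same_not_taken, same_taken, r4_differs)
```

On the taken path the two schedules disagree only in R4, so the hoist is safe exactly as long as the assumption that R4 is dead holds.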

Summary
Two critical properties to maintain program correctness:
1. Any changes in the ordering of instructions should not change how exceptions are raised in a program.
2. The data flow is preserved.