AKT211 – CAO 06 – More on Advanced Processing Techniques. Ghifar, Parahyangan Catholic University, Oct 17, 2011


1 AKT211 – CAO 06 – More on Advanced Processing Techniques
Ghifar, Parahyangan Catholic University, Oct 17, 2011

2 Outline
• Pipeline
• CISC vs RISC
• Superscalar

3 INSTRUCTION PIPELINE

4 Pipeline
• Problems with the single-cycle design:
– The slowest instruction pulls down the clock frequency
– Resource utilization is poor
– Some instructions are impossible to implement in this manner
• The organization needs a change → pipeline

5 Pipeline
• Similar to the use of an assembly line in a manufacturing plant
– Products at various stages can be worked on simultaneously

6 Two-Stage Instruction Pipeline
Any problem? The ‘fetch’ stage has to wait if:
1. T(exec) > T(fetch)
2. There is a branch instruction

7 Six-Stage Instruction Pipeline
Decomposing the instruction processing into:
• Fetch Instruction (FI)
• Decode Instruction (DI)
• Calculate Operands (CO)
• Fetch Operands (FO)
• Execute Instruction (EI)
• Write Operands (WO)

8 Six-Stage Instruction Pipeline
• Assumes that there are:
– no memory conflicts
– no branches
– no interrupts
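Under these three assumptions the timing is completely regular: instruction i (counting from 1) occupies stage s (counting from 1) during time unit i + s - 1. The Python sketch below prints that ideal timing diagram. The function name `timing_diagram` and the choice of nine instructions are illustrative assumptions, not something taken from the slides.

```python
# Stage names as listed on the six-stage pipeline slide.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; each row has one cell per time unit.

    Assumes an ideal pipeline: no memory conflicts, no branches,
    no interrupts, and every stage takes exactly one time unit.
    """
    k = len(stages)
    total_time = k + n_instructions - 1  # time units until the last WO
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_time
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s at time i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for idx, row in enumerate(timing_diagram(9), start=1):
        print(f"I{idx:<2} " + " ".join(row))
```

Each row is shifted one time unit to the right of the previous one, so once the pipeline is full, one instruction completes in every time unit.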

9 Six-Stage Instruction Pipeline
With branches:
• Penalty: no instructions complete during time units 9 to 12
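The figure for this slide is not in the transcript, so the following Python sketch reconstructs the penalty under explicit assumptions: instruction 3 is a taken branch, its outcome is known at the end of the EI stage (stage 5 of 6), and the branch target is fetched in the next time unit. These parameters and the function names are assumptions chosen to reproduce the stated penalty, not details confirmed by the slides.

```python
def completion_times(branch_index, n_after_target, k=6, resolve_stage=5):
    """Time units in which instructions complete, with one taken branch.

    Instruction i (1-indexed) enters stage 1 at time i and finishes
    stage k at time i + k - 1. Instructions fetched after the branch
    are discarded; the target stream starts once the branch resolves.
    """
    # Instructions up to and including the branch complete normally.
    times = [i + k - 1 for i in range(1, branch_index + 1)]
    # The branch is in stage `resolve_stage` during this time unit.
    resolved_at = branch_index + resolve_stage - 1
    target_fetch = resolved_at + 1
    # Target instructions flow through the pipeline one per time unit.
    times += [target_fetch + j + k - 1 for j in range(n_after_target)]
    return times


def idle_completion_slots(times):
    """Time units between first and last completion with no completion."""
    done = set(times)
    return [t for t in range(min(times), max(times) + 1) if t not in done]


if __name__ == "__main__":
    times = completion_times(branch_index=3, n_after_target=2)
    print(idle_completion_slots(times))  # prints [9, 10, 11, 12]
```

With these assumptions the branch completes in time unit 8 and the first target instruction in time unit 13, which leaves time units 9 to 12 without any completed instruction.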

10 Six-Stage Instruction Pipeline
• Modified algorithm:

11 Pipeline Performance
• The cycle time of an instruction pipeline:

12 Pipeline Performance
• Let T[k,n] be the total time required for a pipeline with k stages to execute n instructions (total execution time):
• Pipeline speedup:
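The formulas on the two Pipeline Performance slides did not survive in this transcript. The Python sketch below assumes the standard textbook definitions: cycle time τ = max(τi) + d, where τi is the delay of stage i and d is the latch delay between stages; total time T[k,n] = [k + (n - 1)]τ; and speedup S = T[1,n] / T[k,n] = nk / (k + n - 1). The function names and the sample delays are illustrative.

```python
def cycle_time(stage_delays, latch_delay):
    """Cycle time: the slowest stage plus the latch delay (assumed form)."""
    return max(stage_delays) + latch_delay


def total_time(k, n, tau):
    """T[k,n]: k cycles to fill the pipeline, then one per remaining instruction."""
    return (k + (n - 1)) * tau


def speedup(k, n):
    """Speedup over a non-pipelined unit that needs k*tau per instruction."""
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    tau = cycle_time([10, 8, 10, 10, 7], latch_delay=1)  # hypothetical delays
    print(tau)                   # 11
    print(total_time(6, 9, 1))   # 14 time units for 9 instructions, 6 stages
    print(round(speedup(6, 9), 3))
    print(round(speedup(6, 10_000), 3))  # approaches k as n grows
```

The last line illustrates the limiting behaviour: as n grows, the speedup approaches the number of stages k.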

13 Speedup Factors with Instruction Pipeline

14 Pipeline Hazards
• A hazard occurs when the pipeline, or some portion of the pipeline, must stall or idle because conditions do not permit continued execution
• 3 types of hazards:
1. Resource hazards
2. Data hazards
3. Control hazards

15 Resource Hazards
• Occurs when two (or more) instructions that are already in the pipeline need the same resource
• Sometimes referred to as a structural hazard

16 Data Hazards
• Occurs when there is a conflict in the access of an operand location
– ADD EAX, EBX /* EAX = EAX + EBX */
– SUB ECX, EAX /* ECX = ECX - EAX */
• 3 types of data hazards:
– Read after write (RAW)
– Write after read (WAR)
– Write after write (WAW)
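The three hazard types can be told apart by comparing the registers each instruction reads and writes. The Python sketch below does this for the ADD/SUB pair on this slide. The `Instr` type and the `data_hazards` function are hypothetical helpers, and the read/write sets are entered by hand; the sketch also ignores the x86 flags register, which both instructions write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset   # registers read by the instruction
    writes: frozenset  # registers written by the instruction


def data_hazards(first, second):
    """Classify hazards when `second` follows `first` in program order."""
    hazards = []
    if first.writes & second.reads:
        hazards.append("RAW")  # second reads what first writes
    if first.reads & second.writes:
        hazards.append("WAR")  # second overwrites what first still reads
    if first.writes & second.writes:
        hazards.append("WAW")  # both write the same location
    return hazards


add = Instr("ADD EAX, EBX", frozenset({"EAX", "EBX"}), frozenset({"EAX"}))
sub = Instr("SUB ECX, EAX", frozenset({"ECX", "EAX"}), frozenset({"ECX"}))

if __name__ == "__main__":
    print(data_hazards(add, sub))  # prints ['RAW']
```

SUB reads EAX, which ADD writes, so the pair has a read-after-write hazard: SUB must not fetch its operand before ADD has written its result.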

17 Control Hazards
• Also known as a branch hazard
• Occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded

18 CISC vs RISC

19 What is CISC?
• CISC is an acronym for Complex Instruction Set Computer. CISC chips are easy to program and make efficient use of memory. Since the earliest machines were programmed in assembly language and memory was slow and expensive, the CISC philosophy made sense, and it was commonly implemented in such large computers as the PDP-11 and the DECsystem 10 and 20 machines.
• Most common microprocessor designs, such as the Intel 80x86 and Motorola 68K series, followed the CISC philosophy.
• CISC was developed to make compiler development simpler. It shifts most of the burden of generating machine instructions to the processor. For example, instead of having to make a compiler write a long sequence of machine instructions to calculate a square root, a CISC processor would have a built-in ability to do this.

20 CISC Attributes
The design constraints that led to the development of CISC (small amounts of slow memory and the fact that most early machines were programmed in assembly language) give CISC instruction sets some common characteristics:
• A 2-operand format, where instructions have a source and a destination. Register-to-register, register-to-memory, and memory-to-register commands. Multiple addressing modes for memory, including specialized modes for indexing through arrays.
• Variable-length instructions, where the length often varies according to the addressing mode.
• Instructions which require multiple clock cycles to execute.
E.g. the Pentium is considered a modern CISC processor.

21 CISC Hardware Architecture
Most CISC hardware architectures have several characteristics in common:
• Complex instruction-decoding logic, driven by the need for a single instruction to support multiple addressing modes.
• A small number of general-purpose registers. This is the direct result of having instructions which can operate directly on memory and of the limited amount of chip space not dedicated to instruction decoding, execution, and microcode storage.
• Several special-purpose registers. Many CISC designs set aside special registers for the stack pointer, interrupt handling, and so on. This can simplify the hardware design somewhat, at the expense of making the instruction set more complex.

22 At the time of their initial development, CISC machines used available technologies to optimize computer performance.
• Microprogramming is as easy as assembly language to implement, and much less expensive than hardwiring a control unit.
• The ease of microcoding new instructions allowed designers to make CISC machines upwardly compatible: a new computer could run the same programs as earlier computers because the new computer would contain a superset of the instructions of the earlier computers.
• As each instruction became more capable, fewer instructions could be used to implement a given task. This made more efficient use of the relatively slow main memory.
• Because microprogram instruction sets can be written to match the constructs of high-level languages, the compiler does not have to be as complicated.

23 CISC Disadvantages
Designers soon realized that the CISC philosophy had its own problems, including:
• Earlier generations of a processor family were generally contained as a subset in every new version, so the instruction set and chip hardware became more complex with each generation of computers.
• So that as many instructions as possible could be stored in memory with the least possible wasted space, individual instructions could be of almost any length. This means that different instructions take different amounts of clock time to execute, slowing down the overall performance of the machine.
• Many specialized instructions aren't used frequently enough to justify their existence: approximately 20% of the available instructions are used in a typical program.
• CISC instructions typically set the condition codes as a side effect of the instruction. Not only does setting the condition codes take time, but programmers have to remember to examine the condition code bits before a subsequent instruction changes them.

24 What is RISC?
• RISC? RISC, or Reduced Instruction Set Computer, is a type of microprocessor architecture that utilizes a small, highly optimized set of instructions, rather than the more specialized set of instructions often found in other types of architectures.
• History: The first RISC projects came from IBM, Stanford, and UC Berkeley in the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy which has become known as RISC. Certain design features have been characteristic of most RISC processors:
– one-cycle execution time: RISC processors have a CPI (clocks per instruction) of one cycle. This is due to the optimization of each instruction on the CPU and a technique called pipelining;
– pipelining: a technique that allows for simultaneous execution of parts, or stages, of instructions to process instructions more efficiently;
– large number of registers: the RISC design philosophy generally incorporates a larger number of registers to avoid large numbers of interactions with memory.

25 RISC Attributes
The main characteristics of CISC microprocessors are:
• Extensive instructions.
• Complex and efficient machine instructions.
• Micro-encoding of the machine instructions.
• Extensive addressing capabilities for memory operations.
• Relatively few registers.
In comparison, RISC processors are more or less the opposite of the above:
• Reduced instruction set.
• Less complex, simple instructions.
• Hardwired control unit and machine instructions.
• Few addressing schemes for memory operands, with only two basic instructions, LOAD and STORE.
• Many symmetric registers which are organized into a register file.

26 RISC Disadvantages
• There is still considerable controversy among experts about the ultimate value of RISC architectures. Their proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future.
• However, by making the hardware simpler, RISC architectures put a greater burden on the software. Is this worth the trouble, given that conventional microprocessors are becoming increasingly fast and cheap anyway?

27 CISC versus RISC
CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock complex instructions | Single-clock, reduced instruction only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register to register: "LOAD" and "STORE" are independent instructions
Small code sizes, high cycles per second | Low cycles per second, large code sizes
Transistors used for storing complex instructions | Spends more transistors on memory registers

28 Summation
• As memory speed increased, and high-level languages displaced assembly language, the major reasons for CISC began to disappear, and computer designers began to look at ways computer performance could be optimized beyond just making faster hardware.
• One of their key realizations was that a sequence of simple instructions produces the same results as a sequence of complex instructions, but can be implemented with a simpler (and faster) hardware design, assuming that memory can keep up. RISC (Reduced Instruction Set Computer) processors were the result.
• CISC and RISC implementations are becoming more and more alike. Many of today's RISC chips support as many instructions as yesterday's CISC chips, and today's CISC chips use many techniques formerly associated with RISC chips.

29 Modern Day Advancement
• CISC and RISC Convergence: State-of-the-art processor technology has changed significantly since RISC chips were first introduced in the early '80s. Because a number of advancements are used by both RISC and CISC processors, the lines between the two architectures have begun to blur. In fact, the two architectures almost seem to have adopted the strategies of the other. Because processor speeds have increased, CISC chips are now able to execute more than one instruction within a single clock. This also allows CISC chips to make use of pipelining. With other technological improvements, it is now possible to fit many more transistors on a single chip.

30 Modern Day Advancement
• This gives RISC processors enough space to incorporate more complicated, CISC-like commands. RISC chips also make use of more complicated hardware, using extra function units for superscalar execution. All of these factors have led some groups to argue that we are now in a "post-RISC" era, in which the two styles have become so similar that distinguishing between them is no longer relevant. However, it should be noted that RISC chips still retain some important traits. RISC chips strictly utilize uniform, single-cycle instructions. They also retain the register-to-register, load/store architecture. And despite their extended instruction sets, RISC chips still have a large number of general-purpose registers.

31 SUPERSCALAR

32 What is Superscalar?
• Refers to a machine that is designed to improve the execution performance of scalar instructions
• A superscalar processor is one in which multiple independent instruction pipelines are used, exploiting what is known as instruction-level parallelism
• Equally applicable to RISC and CISC
• In practice usually RISC

33 General Superscalar Organization

34 Fetching Two Instructions per Cycle

35 Superpipelined
• Many pipeline stages need less than half a clock cycle
• Doubling the internal clock speed gets two tasks done per external clock cycle
• Superscalar allows parallel fetch and execute

36 Superscalar vs Superpipelined

37 Limitations
• Instruction-level parallelism
• Compiler-based optimisation
• Hardware techniques
• Limited by:
– True data dependency
– Procedural dependency
– Resource conflicts
– Output dependency
– Antidependency

38 True Data Dependency
• ADD r1, r2 (r1 := r1 + r2;)
• MOVE r3, r1 (r3 := r1;)
• Can fetch and decode the second instruction in parallel with the first
• Can NOT execute the second instruction until the first is finished

39 Procedural Dependency
• Cannot execute instructions after a branch in parallel with instructions before the branch
• Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
• This prevents simultaneous fetches

40 Resource Conflict
• Two or more instructions requiring access to the same resource at the same time
– e.g. two arithmetic instructions
• Can duplicate resources
– e.g. have two arithmetic units

41 Dependencies

42 Antidependency
• Write-after-read (WAR) dependency
– R3 := R3 + R5; (I1)
– R4 := R3 + 1; (I2)
– R3 := R5 + 1; (I3)
– R7 := R3 + R4; (I4)
• I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3

43 Register Renaming
• Antidependencies occur because register contents may not reflect the correct ordering from the program
• May result in a pipeline stall
• Registers are allocated dynamically
– i.e. registers are not specifically named

44 Register Renaming Example
• R3b := R3a + R5a (I1)
• R4b := R3b + 1 (I2)
• R3c := R5a + 1 (I3)
• R7b := R3c + R4b (I4)
• A name without a subscript refers to the logical register in the instruction
• A name with a subscript is the hardware register allocated
• Note R3a, R3b, R3c
• Disadvantage: need more registers!
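The renaming on this slide can be reproduced mechanically: every source operand reads the newest copy of its logical register, and every destination is given a fresh copy. The Python sketch below applies that rule to I1 through I4. The `rename` function and the tuple representation of instructions are hypothetical, and the letter suffixes limit the sketch to 26 copies per register, unlike real hardware, which draws from a pool of physical registers.

```python
import re
from string import ascii_lowercase


def rename(instructions):
    """Rename registers in a list of (dest, expr) pairs.

    Example input pair: ("R3", "R3 + R5") stands for R3 := R3 + R5.
    """
    version = {}  # logical register -> index of its newest hardware copy

    def current(reg):
        # A register never written so far uses its initial copy, suffix 'a'.
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    renamed = []
    for dest, expr in instructions:
        # Sources first: they read the copies that are live before this write.
        new_expr = re.sub(r"R\d+", lambda m: current(m.group()), expr)
        # Destination: allocate a fresh copy so earlier readers are unaffected.
        version[dest] = version.get(dest, 0) + 1
        renamed.append(f"{current(dest)} := {new_expr}")
    return renamed


program = [
    ("R3", "R3 + R5"),  # I1
    ("R4", "R3 + 1"),   # I2
    ("R3", "R5 + 1"),   # I3
    ("R7", "R3 + R4"),  # I4
]

if __name__ == "__main__":
    for line in rename(program):
        print(line)
```

After renaming, I3 writes R3c while I2 reads R3b, so I3 no longer has to wait for I2; only the true dependencies (I1 to I2, and I2 and I3 to I4) remain.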

45 Superscalar Execution

46 Superscalar Execution Example - With Register Renaming for WAR and WAW dependencies.

47 Conclusion
• A superscalar design allows faster CPU throughput than would otherwise be possible at the same clock rate.
• All general-purpose CPUs developed since about 1998 are superscalar.
• The major problem of executing multiple instructions in a scalar program is the handling of data dependencies. If data dependencies are not effectively handled, it is difficult to achieve an execution rate of more than one instruction per clock cycle.

48 Comparison of processors

49 Any Questions?

50 THANK YOU

