Processor Types and Instruction Sets, Chapter 5

Instruction Set
Instruction set: the set of operations that the hardware recognizes, that is, the operations the processor can execute. Each operation is referred to as an instruction. The instruction set also specifies:
- allowable values
- error conditions
The instruction set specifies the semantics (meaning) of each operation.

Mathematical Power, Convenience and Cost
What operations should a processor offer? Programmers understand that although a minimal set of operations is sufficient for computation, a minimal set is neither convenient nor practical. Example: a quotient can be computed by repeated subtraction, but a divide instruction is far more convenient. Choosing the set of operations that the processor will perform is therefore a tradeoff for an architect, between the cost of the hardware and the convenience for the programmer.
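
A minimal Python sketch of division by repeated subtraction, assuming non-negative integers. The function name `divide_by_subtraction` is hypothetical. It illustrates why a minimal instruction set is impractical: the loop runs once for every unit of the quotient, whereas a divide instruction completes the work in one operation.

```python
def divide_by_subtraction(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder) using only subtraction and comparison."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    quotient, remainder = 0, dividend
    while remainder >= divisor:   # one iteration per unit of the quotient
        remainder -= divisor
        quotient += 1
    return quotient, remainder

print(divide_by_subtraction(17, 5))  # (3, 2)
```

Dividing 1,000,000 by 1 this way takes a million subtractions, which is why many architects accept the hardware cost of a divide instruction.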

Instruction Set and Representation
When an architect designs a processor, the architect must make two key decisions:
- the set of operations the hardware recognizes
- the representation the hardware uses for each operation

Instruction Set and Representation
On each iteration of the fetch-execute cycle, the processor executes one instruction. The instruction set defines:
- how each instruction operates
- the values on which it acts
- the results the instruction produces

Instruction Format
Instruction format: the binary representation that the hardware uses for instructions. It defines the boundary between hardware and software: programs must be encoded in exactly the instruction format that the processor expects. The instruction format specifies the syntactic aspects of an instruction set, while the instruction set itself specifies the semantics.

Opcodes, Operands and Results
An instruction contains three parts:
- Opcode: the exact operation to be performed. It is a unique number assigned to each operation when the instruction set is designed.
- Operands: the values used to execute the instruction.
- Results: one or more operands that specify where to store the results of the instruction.

Each instruction is represented as a binary string. From Essentials of Computer Architecture by Douglas E. Comer. ISBN 0131491792. © 2005 Pearson Education, Inc. All rights reserved.
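
To make "an instruction is a binary string" concrete, here is a small Python sketch that packs an opcode and operand fields into one 32-bit word and unpacks it again. The layout (8-bit opcode, two 8-bit operand fields, an 8-bit result field) and the opcode numbers are hypothetical, chosen for illustration only, and do not describe any real processor.

```python
# Hypothetical 32-bit instruction layout (not any real processor's format):
#   bits 31-24: opcode | 23-16: operand 1 | 15-8: operand 2 | 7-0: result register
OPCODES = {"add": 0x01, "sub": 0x02}

def encode(opcode: int, op1: int, op2: int, result: int) -> int:
    """Pack four 8-bit fields into one 32-bit instruction word."""
    for field in (opcode, op1, op2, result):
        if not 0 <= field <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (opcode << 24) | (op1 << 16) | (op2 << 8) | result

def decode(word: int) -> tuple[int, int, int, int]:
    """Extract the opcode and operand fields from an instruction word."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF

word = encode(OPCODES["add"], 3, 6, 7)   # add register 3 and register 6, result in 7
print(f"{word:032b}")                    # 00000001000000110000011000000111
print(decode(word))                      # (1, 3, 6, 7)
```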

Variable Length vs. Fixed Length Instructions
Should all instructions be the same size, or should the length depend on the number and type of operands? Example: addition operates on two values, while negation operates on one. How does the processor handle operands of different sizes? Example: a processor may have one instruction that adds a pair of 16-bit integers and another that adds a pair of 32-bit integers. Should one of these instructions be shorter than the other?

Variable Length vs. Fixed Length Instructions
Fixed length: every instruction in the instruction set is the same size.
- Requires less complex hardware
- High speed and low cost
- Instructions can be fetched and decoded without first examining the opcode to determine the instruction's length
- Hardware is designed to ignore fields that are unneeded
Variable length: instruction sizes vary depending on the requirement.
- Requires complex hardware to decode
- Optimal use of memory, with no wasted bits
- Appropriate from the programmer's point of view

Variable Length vs. Fixed Length Instructions
Using fixed length instructions makes the processor hardware less complex and faster.

Variable Length vs. Fixed Length Instructions
How does a processor with fixed length instructions handle cases where no operands are needed? How can fixed length instructions accommodate both addition (two operands) and negation (one operand)?

Variable Length vs. Fixed Length Instructions
The hardware is designed to ignore fields that are not needed for a given operation: the instruction set may specify that, in some instructions, specific bits are unused. Unused fields should be viewed as a hardware optimization, not as an indication of poor design.

General Purpose Registers
Register: a high-speed hardware storage device that
- has a fixed size
- supports two basic operations, fetch and store
Registers operate in a variety of roles:
- Program counter: holds the address of the next instruction to be executed.
- General-purpose register: holds an operand while an instruction executes, or a value to be stored.
A processor has a small number of general-purpose registers (fewer than 100), numbered 0 through N-1, and each register can hold an integer. On a processor that provides 32-bit arithmetic, each register holds 32 bits. Registers have the same semantics as memory: a fetch operation returns the register's value, and a store operation replaces the register's contents with a new value.

Floating Point Registers and Register Identification
Floating point registers:
- hold the operands for floating point instructions
- are numbered starting at zero, like general-purpose registers
The instruction determines which set of registers is used: the operands of a floating point instruction come from the floating point registers. Example: if an integer instruction specifies registers 3 and 6, general-purpose registers 3 and 6 are used; if a floating point instruction specifies registers 3 and 6, floating point registers 3 and 6 are used.
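
A small Python sketch of register identification: the same register numbers refer to different physical registers depending on the opcode. The register-set sizes and the mnemonic `fadd` are hypothetical, chosen for illustration.

```python
# Two separate register sets, both numbered from 0 (sizes assumed for illustration).
general = [0] * 16
floating = [0.0] * 16

INTEGER_OPS = {"add"}
FLOAT_OPS = {"fadd"}   # hypothetical mnemonic for a floating point add

def execute(op: str, dst: int, src1: int, src2: int) -> None:
    """The opcode, not the register number, selects which register set is used."""
    if op in INTEGER_OPS:
        regs = general
    elif op in FLOAT_OPS:
        regs = floating
    else:
        raise ValueError(f"unknown opcode {op!r}")
    regs[dst] = regs[src1] + regs[src2]

general[3], general[6] = 2, 5
floating[3], floating[6] = 0.5, 0.25
execute("add", 7, 3, 6)    # general-purpose registers 3 and 6
execute("fadd", 7, 3, 6)   # same numbers, but floating point registers 3 and 6
print(general[7], floating[7])  # 7 0.75
```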

Programming With Registers
Operands must be moved into general-purpose registers before an instruction that uses them is executed, and the result of the instruction is typically placed in a register as well.

Programming With Registers
Example: add two integers X and Y and place the result in Z, assuming general-purpose registers 3, 6 and 7 are available:
- load a copy of X into register 3
- load a copy of Y into register 6
- add the value in register 3 to the value in register 6, and place the result in register 7
- store a copy of the value in register 7 in Z
Because the number of registers is limited, a technique is needed to choose which operands are kept in registers.
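
The four steps above can be traced with a tiny Python model. Memory is simplified to a dictionary keyed by variable name (real memory is addressed by number), and the helper names `load`, `add` and `store` and the count of 32 registers are assumptions for illustration.

```python
# Simplified machine state: memory keyed by variable name, 32 general-purpose registers.
memory = {"X": 12, "Y": 30, "Z": 0}
registers = [0] * 32

def load(reg: int, name: str) -> None:
    registers[reg] = memory[name]                        # copy a value from memory into a register

def add(dst: int, src1: int, src2: int) -> None:
    registers[dst] = registers[src1] + registers[src2]  # operands and result all in registers

def store(reg: int, name: str) -> None:
    memory[name] = registers[reg]                        # copy a register's value back to memory

load(3, "X")
load(6, "Y")
add(7, 3, 6)
store(7, "Z")
print(memory["Z"])  # 42
```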

Programming With Registers: Register Allocation
Moving values from memory into registers is expensive, so a programmer keeps as many values in registers as possible, with the goal of maximizing the execution speed of the program. The process of choosing which values are held in registers is known as register allocation.
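
A deliberately simplistic register-allocation sketch in Python: keep the most frequently used variables in registers and count the resulting memory accesses. This frequency heuristic is an illustration only (production compilers typically use techniques such as graph coloring or linear scan), the access trace is hypothetical, and the cost model ignores writing modified values back to memory.

```python
from collections import Counter

def allocate_by_frequency(uses: list[str], num_registers: int) -> set[str]:
    """Keep the most frequently used variables in registers (a simplistic heuristic)."""
    return {name for name, _ in Counter(uses).most_common(num_registers)}

def memory_accesses(uses: list[str], in_registers: set[str]) -> int:
    """One initial load per register-resident variable, plus one memory
    access for every use of a variable that stays in memory."""
    return len(in_registers) + sum(1 for name in uses if name not in in_registers)

uses = ["i", "sum", "i", "a", "i", "sum", "i", "b"]   # hypothetical access trace
chosen = allocate_by_frequency(uses, num_registers=2)
print(sorted(chosen), memory_accesses(uses, chosen), memory_accesses(uses, set()))
# ['i', 'sum'] 4 8
```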

Programming With Registers: Double Precision
What happens when an instruction generates a large (extended) result? With double precision, extended values are held in consecutive registers. Example: integer multiplication. If a standard integer is 32 bits wide, the product of two integers can require 64 bits, a double precision integer. Example: if an instruction loads a double precision integer into register 4, half of the value is placed in register 4 and the other half in register 5. Thus the value of register 5 can change even though the instruction makes no explicit reference to it.
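
A Python sketch of a double precision result spilling into the next register. It assumes unsigned 32-bit operands and places the high-order half in the named register and the low-order half in the following one; that ordering is a hypothetical choice, since processors differ.

```python
MASK32 = 0xFFFF_FFFF

def multiply_double(registers: list[int], dst: int, a: int, b: int) -> None:
    """Multiply two unsigned 32-bit values; the 64-bit product fills registers dst and dst + 1."""
    product = (a & MASK32) * (b & MASK32)        # may need up to 64 bits
    registers[dst] = (product >> 32) & MASK32    # high-order half (ordering assumed)
    registers[dst + 1] = product & MASK32        # low-order half: register dst + 1 changes implicitly

regs = [0] * 8
multiply_double(regs, 4, 0xFFFF_FFFF, 2)   # the call names only register 4 ...
print(hex(regs[4]), hex(regs[5]))          # ... yet register 5 changes too: 0x1 0xfffffffe
```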

Register Banks
Registers can be divided into multiple banks, with the requirement that each instruction's operands come from separate banks. Each bank has a separate access mechanism, and the mechanisms operate simultaneously, so when a processor executes an instruction that accesses two operands in registers, both operands can be obtained at the same time. This allows the hardware to operate faster. If it is not possible to assign data values to registers so that each instruction's operands are in different banks, a register conflict occurs.


Example: consider three instructions that operate on values X, Y and Z, where X is assigned to bank A and Y and Z are assigned to bank B. The first instruction uses X (bank A) and Y (bank B), and the second uses Z (bank B) and X (bank A), so both are fine. The last instruction uses two operands from the same bank, bank B: this is a register conflict. To overcome it, the programmer must either reassign the values to registers or insert an instruction that copies a value into the other bank.
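
A Python sketch of checking a program for register bank conflicts. It assumes the last instruction reads Y and Z (the only pair both placed in bank B) and models each instruction only by its two operands; the name `Z_copy` for the copied value is hypothetical.

```python
# Bank assignment from the example: X in bank A; Y and Z in bank B.
bank = {"X": "A", "Y": "B", "Z": "B"}

# Operand pairs of the three instructions (assumed: the last one reads Y and Z).
instructions = [("X", "Y"), ("Z", "X"), ("Y", "Z")]

def find_conflicts(instructions: list[tuple[str, str]], bank: dict[str, str]) -> list[int]:
    """Indices of instructions whose two operands sit in the same bank."""
    return [i for i, (a, b) in enumerate(instructions) if bank[a] == bank[b]]

print(find_conflicts(instructions, bank))  # [2]

# Fix by copying: keep a second copy of Z (hypothetical name "Z_copy") in bank A
# and let the last instruction read that copy instead.
bank["Z_copy"] = "A"
instructions[2] = ("Y", "Z_copy")
print(find_conflicts(instructions, bank))  # []
```

Because each of the three values is paired with both of the others, no assignment of X, Y and Z to two banks avoids every conflict, which is why a copy is needed here rather than a simple reassignment.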

CISC & RISC
Instruction sets fall into two broad categories:
- Complex Instruction Set Computer (CISC): includes many instructions, some of which perform complex computations; such an instruction can require a long time to complete.
- Reduced Instruction Set Computer (RISC): contains a minimal set of instructions that perform basic computations; instructions are a fixed size, and each instruction executes in one clock cycle.
For a RISC processor to complete one instruction on each clock cycle, the hardware is pipelined. In short, a processor is classified as CISC if its instruction set contains instructions that perform complex computations that can require long times, and as RISC if it contains a small number of instructions that can each execute in one clock cycle.

Pipelining in the Processor
Pipelining overlaps the execution of multiple instructions to make CPUs faster. Each step in a pipeline is called a stage, and each stage completes one part of an instruction (e.g., fetch the instruction, perform the operation). An instruction passes from one stage to the next. RISC processors contain parallel hardware units, each of which performs one step, and the results from one hardware unit pass to the next. Modern processors use pipelines with multiple stages.

Stages in an Instruction Pipeline
The execution of a single instruction by the CPU involves the following steps:
- fetch the instruction and increment the PC
- decode the instruction
- access the operands
- execute the operation
- store the result

The speed of a pipeline comes from all stages operating in parallel: in the example above, while the fourth stage executes one instruction, the third stage fetches the operands for the next instruction. Once the pipeline is full, one instruction completes every clock cycle.
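
A Python sketch that prints which instruction occupies each of the five stages in every clock cycle, assuming an ideal pipeline with no stalls. Instruction labels I1, I2, ... and the stage names are illustrative.

```python
STAGES = ["fetch", "decode", "operands", "execute", "store"]

def pipeline_schedule(num_instructions: int, stages: list[str] = STAGES) -> list[list[str]]:
    """Row c lists the instruction held by each stage in clock cycle c + 1 (no stalls)."""
    k = len(stages)
    table = []
    for cycle in range(k + num_instructions - 1):   # k cycles to fill, then one per instruction
        row = []
        for s in range(k):
            i = cycle - s   # instruction i enters stage s in cycle i + s
            row.append(f"I{i + 1}" if 0 <= i < num_instructions else "-")
        table.append(row)
    return table

print("cycle  " + " ".join(f"{name:>8}" for name in STAGES))
for cycle, row in enumerate(pipeline_schedule(6), start=1):
    print(f"{cycle:5}  " + " ".join(f"{r:>8}" for r in row))
```

From cycle 5 on, the store stage finishes one instruction per cycle, so six instructions take 10 cycles rather than the 30 they would need if each passed through all five stages before the next one started.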

The Need for Pipelining
Pipelining makes CPUs faster by increasing CPU throughput (the number of instructions completed per unit time). Pipelining does not reduce the latency of an individual instruction; it increases throughput.
- Latency: the time it takes to complete a task, from beginning to end.
- Throughput: the rate of task completions.

Instructions in Pipeline
The pipeline designer's goal is to balance the length of each pipeline stage. If the stages are perfectly balanced, then
$$\text{Time per instruction on the pipelined machine} = \frac{\text{Time per instruction on the non-pipelined machine}}{\text{Number of pipe stages}}$$
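
A worked example of the formula in Python, using hypothetical numbers: 1000 instructions, a 5-stage pipeline, and 10 ns per instruction on the non-pipelined machine. It assumes perfectly balanced stages and no hazards, so the pipeline needs one stage time per instruction after an initial fill.

```python
def pipelined_time(n: int, stages: int, unpipelined_time: float) -> float:
    """Total time for n instructions on an ideal, perfectly balanced pipeline."""
    stage_time = unpipelined_time / stages     # balanced stages
    return (stages + n - 1) * stage_time       # fill the pipe, then one instruction per cycle

n, k, t = 1000, 5, 10.0    # hypothetical: 1000 instructions, 5 stages, 10 ns each
serial = n * t
piped = pipelined_time(n, k, t)
print(serial, piped, round(serial / piped, 2))  # 10000.0 2008.0 4.98
```

The speedup approaches the number of stages (5) as n grows, while each individual instruction still spends 5 × 2 ns = 10 ns in the pipeline: throughput improves, latency does not.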

What Are Pipeline Hazards?
Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. They reduce performance; the idle cycles they cause are also called bubbles or hiccups in the pipeline. A pipeline interlock is a mechanism that detects a hazard and resolves it.
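
A minimal Python sketch of the check an interlock performs for data hazards. It assumes the five-stage pipeline listed earlier (operands read in stage 3, results stored in stage 5), so a result has not yet been stored when either of the next two instructions reads its operands. Instructions are modeled as hypothetical (destination, source1, source2) tuples of register names; real interlock hardware compares register numbers held in pipeline latches.

```python
Instruction = tuple[str, str, str]  # (destination, source1, source2)

def find_data_hazards(program: list[Instruction], window: int = 2) -> list[tuple[int, int]]:
    """Return (producer, consumer) index pairs where the consumer reads a register
    written by one of the `window` instructions immediately before it."""
    hazards = []
    for j, (_, src1, src2) in enumerate(program):
        for i in range(max(0, j - window), j):
            if program[i][0] in (src1, src2):
                hazards.append((i, j))
    return hazards

program = [
    ("C", "A", "B"),   # C <- A + B
    ("D", "E", "C"),   # D <- E - C: reads C before it has been stored
    ("F", "G", "H"),   # independent: no hazard
]
print(find_data_hazards(program))  # [(0, 1)]
```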


Other Causes of Pipeline Stalls
In addition to waiting for operands, a pipeline can stall when an instruction:
- accesses external storage
- invokes a coprocessor
- branches to a new location
- calls a subroutine
Additional hardware (such as multiple pipelines) can be added to avoid stalls.


No-Op Instructions
Programmers can document instruction stalls in two ways:
- insert a comment explaining the reason for a stall
- insert extra instructions, called no-op instructions
No-op instructions do nothing: they do not affect the values in registers or the program's data, and they only occupy time. Most processors include a no-op instruction that does not reference data values, compute a result, or otherwise affect the state of the computer.
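
A Python sketch of padding a program with no-ops so that no instruction reads a result before it has been stored. It assumes the same simplified five-stage model (a consumer must be at least 3 slots after its producer) and represents instructions as hypothetical (destination, source1, source2) tuples.

```python
NOP = ("nop", "", "")

def insert_nops(program: list[tuple[str, str, str]], gap: int = 3) -> list[tuple[str, str, str]]:
    """Pad with no-ops so each instruction is at least `gap` slots after the
    instruction that produces one of its operands (simplified model)."""
    out: list[tuple[str, str, str]] = []
    for ins in program:
        _, src1, src2 = ins
        pad = 0
        start = max(0, len(out) - (gap - 1))   # only the last gap-1 slots can conflict
        for distance, prev in enumerate(reversed(out[start:]), start=1):
            if prev != NOP and prev[0] in (src1, src2):
                pad = max(pad, gap - distance)
        out.extend([NOP] * pad)
        out.append(ins)
    return out

padded = insert_nops([("C", "A", "B"), ("D", "E", "C")])
for ins in padded:
    print(ins)
```

Here two no-ops separate the producer of C from its consumer; if an independent instruction already sat between them, only one no-op would be needed.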

The ALU uses a technique known as forwarding to handle successive arithmetic instructions in which one instruction passes its result to the next.

Forwarding
Stalls caused by waiting for operands can be reduced with a simple hardware technique called forwarding: if a result can be moved directly from where it is produced to where it is needed, the stall can be avoided. Example: the ALU result of instruction k in stage 4 (execute) is fed back directly to the ALU input latches used by instruction k+1 in stage 3. Without forwarding, instruction k+1 would halt at stage 3 because the operand it requires is not ready: it is still in stage 4 with instruction k.
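
A Python sketch comparing stall cycles with and without forwarding, under a simplified in-order model (an assumption for illustration): stage 3 reads operands, stage 4 executes, stage 5 stores results. Without forwarding, a consumer's stage 3 must come after the producer's stage 5; with forwarding, the stage-4 ALU output feeds the next instruction's ALU input directly.

```python
def count_stalls(program: list[tuple[str, str, str]], forwarding: bool) -> int:
    """Count stall cycles for (destination, source1, source2) instructions."""
    operand_cycle: dict[int, int] = {}  # instruction index -> cycle it occupies stage 3
    last_writer: dict[str, int] = {}    # register -> index of its most recent producer
    cycle = 0
    stalls = 0
    for j, (dest, src1, src2) in enumerate(program):
        earliest = cycle + 1            # with no hazard, one cycle behind the previous one
        for src in (src1, src2):
            if src in last_writer:
                producer = operand_cycle[last_writer[src]]
                # Forwarding: one cycle behind the producer is enough.
                # Otherwise: wait until after the producer's store stage (stage 5).
                earliest = max(earliest, producer + (1 if forwarding else 3))
        stalls += earliest - (cycle + 1)
        cycle = earliest
        operand_cycle[j] = cycle
        last_writer[dest] = j
    return stalls

program = [("C", "A", "B"), ("D", "E", "C")]   # D needs C immediately
print(count_stalls(program, forwarding=False), count_stalls(program, forwarding=True))  # 2 0
```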

Types of Operations
- Arithmetic instructions (add, subtract)
- Logic instructions (and, or, not)
- Data access and transfer instructions (move, input, output, load, store)
- Floating point instructions
- Process control instructions (goto, if ... goto, call, return)

Program Counter, Fetch-Execute and Branching
The processor uses a special-purpose internal register called the program counter (PC) to implement the fetch-execute cycle. The PC contains the address of the next instruction to be executed.
Branch instructions can be absolute or relative:
- Absolute: specifies the address of the next instruction to be executed. Example: jump 0x05DE (the next instruction will be fetched from this location).
- Relative: specifies a positive or negative increment for the program counter. Example: br +8 (branches to the address 8 bytes beyond the current value).
A relative branch instruction does not specify the exact memory address. To implement relative branching, the processor adds the operand of the branch instruction to the program counter and places the result in the internal address register A.
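
A toy fetch-execute loop in Python showing absolute and relative branches. Assumptions for illustration: every instruction is 4 bytes, memory is a dictionary from address to (opcode, operand), and a relative offset is added to the address of the branch instruction itself.

```python
def run(program: dict[int, tuple[str, int]], start: int, max_steps: int = 100) -> list[int]:
    """Fetch-execute loop; returns the addresses of the instructions executed."""
    pc = start
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break
        op, arg = program[pc]   # fetch the instruction at the address in the PC
        trace.append(pc)
        if op == "jump":        # absolute: the operand is the target address
            pc = arg
        elif op == "br":        # relative: the operand is added to the PC
            pc = pc + arg
        elif op == "halt":
            break
        else:                   # any other instruction: continue with the next one
            pc = pc + 4
    return trace

program = {
    0x0100: ("add", 0),
    0x0104: ("br", 8),          # relative branch to 0x0104 + 8 = 0x010C
    0x0108: ("add", 0),         # skipped by the relative branch
    0x010C: ("jump", 0x05DE),   # absolute jump
    0x05DE: ("halt", 0),
}
print([hex(a) for a in run(program, 0x0100)])  # ['0x100', '0x104', '0x10c', '0x5de']
```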

Instruction for Subroutine Calls
jsr (jump to subroutine) is the instruction used to call a subroutine. When a subroutine is called, the address of the next instruction is saved before the branch. The difference between a subroutine call instruction and a branch instruction is that the subroutine call saves the value of address register A (the address of the next instruction). When the subroutine finishes executing, it uses the saved address to return to the caller.
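
A Python sketch extending a toy fetch-execute loop with `jsr` and `ret`. Assumptions for illustration: 4-byte instructions, a single hypothetical return-address register, and `next_addr` standing in for address register A.

```python
def run_with_calls(program: dict[int, tuple[str, int]], start: int, max_steps: int = 100) -> list[int]:
    """Fetch-execute loop with jsr/ret; returns the addresses executed."""
    pc = start
    return_reg = None   # hypothetical register where jsr saves the return address
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        next_addr = pc + 4          # plays the role of address register A
        if op == "jsr":
            return_reg = next_addr  # save the return address, then branch
            pc = arg
        elif op == "ret":
            if return_reg is None:
                raise RuntimeError("ret without a matching jsr")
            pc = return_reg         # return to the caller
        elif op == "halt":
            break
        else:
            pc = next_addr
    return trace

program = {
    0x100: ("add", 0),
    0x104: ("jsr", 0x200),   # call the subroutine
    0x108: ("halt", 0),
    0x200: ("add", 0),       # subroutine body
    0x204: ("ret", 0),
}
print([hex(a) for a in run_with_calls(program, 0x100)])
# ['0x100', '0x104', '0x200', '0x204', '0x108']
```

With only one return-address register, a nested call would overwrite the saved address, so a subroutine that itself calls another subroutine must save that register first (for example, in memory).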

Subroutine Calls, Arguments
The calling program passes the arguments used by the subroutine. Before the call, some architectures place the arguments in memory and others place them in registers. Passing arguments in special-purpose or general-purpose registers is faster than using memory. However, general-purpose registers are also used to hold temporary values during computation, so using them for arguments is a tradeoff: passing arguments in general-purpose registers speeds up subroutine calls, while keeping data values in them speeds up general computation. Thus a programmer must choose which arguments to keep in registers and which to store in memory.

Register Window
Different parts of a program compete for the use of registers. An optimization for argument passing included in some processors is called a register window. The window exposes only a subset of the registers to the program. The window moves when a subroutine is invoked and moves back when the subroutine returns. The windows available to the calling program and to the subroutine overlap, so some of the registers visible to the caller are also visible to the subroutine. In one RISC design, only eight registers were visible to a program at a time, out of a total of 64.

Register window example (xi denotes values in the calling program, li values in the subroutine):
- Registers are numbered 0-7 within the window.
- The calling program places arguments A to D in registers 4-7.
- The subroutine finds its arguments in registers 0-3.
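
A Python sketch of overlapping register windows matching the numbers above: 64 physical registers, 8 visible at a time, with the caller's registers 4-7 becoming the subroutine's registers 0-3. The simple linear sliding (no wraparound, no spilling to memory) is an assumption for illustration; real designs handle window overflow in hardware or software.

```python
TOTAL_REGISTERS = 64  # physical registers, as in the example
WINDOW_SIZE = 8       # registers visible at any time (numbered 0-7)
OVERLAP = 4           # assumed: caller's registers 4-7 are the callee's registers 0-3

class RegisterWindows:
    def __init__(self) -> None:
        self.physical = [0] * TOTAL_REGISTERS
        self.base = 0  # physical index of the currently visible register 0

    def _index(self, r: int) -> int:
        if not 0 <= r < WINDOW_SIZE:
            raise IndexError("only registers 0-7 are visible")
        return self.base + r

    def read(self, r: int) -> int:
        return self.physical[self._index(r)]

    def write(self, r: int, value: int) -> None:
        self.physical[self._index(r)] = value

    def call(self) -> None:
        if self.base + OVERLAP + WINDOW_SIZE > TOTAL_REGISTERS:
            raise OverflowError("no free window (spilling to memory is not modeled)")
        self.base += OVERLAP  # slide the window forward on subroutine entry

    def ret(self) -> None:
        if self.base < OVERLAP:
            raise RuntimeError("return without a matching call")
        self.base -= OVERLAP  # slide it back on return

rw = RegisterWindows()
for i, arg in enumerate([10, 20, 30, 40]):  # caller places arguments A-D in registers 4-7
    rw.write(4 + i, arg)
rw.call()
args_seen = [rw.read(r) for r in range(4)]  # subroutine finds them in registers 0-3
rw.write(0, 99)                             # subroutine leaves a result in its register 0
rw.ret()
result = rw.read(4)                         # caller sees it in its register 4
print(args_seen, result)  # [10, 20, 30, 40] 99
```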

MIPS Instruction Set
Register windows enhance performance but require additional hardware. A careful compiler can instead find free registers without requiring more hardware; this was the approach of the group at Stanford University that designed the MIPS architecture. The early MIPS architectures were 32-bit implementations (generally 32-bit wide registers and data paths), while later versions were 64-bit implementations. Most instructions require their operands and results to be in registers. The instruction set is considerably smaller than the Intel instruction set. MIPS, originally an acronym for Microprocessor without Interlocked Pipeline Stages, is a classic example of a RISC instruction set and is popular in embedded systems.

MIPS Instruction Set
The MIPS architecture has a three-address instruction set: arithmetic and logical instructions have three operands. Example: add $8, $9, $10 (register 8 receives the sum of registers 9 and 10). Earlier Intel processors have a two-address instruction set. MIPS is a load-store architecture, and its design follows two principles, speed and minimalism:
- Speed: the processor is designed to finish one instruction every clock cycle.
- Minimalism: it contains fewer instructions, which use fewer bits to encode.
Register 0, which always contains zero, is one feature used to achieve minimalism. MIPS includes neither an instruction to copy the contents of one register to another nor an instruction that adds a value in memory to the contents of a register (the value must be loaded first). To copy a value from one register to another, an add instruction is used in which one of the two operands is register 0.
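
A Python sketch of how a hardwired zero register lets an add instruction act as a register copy. The class and function names are hypothetical, and the model keeps values to 32 bits but does not reproduce the overflow trap of the real MIPS `add` instruction.

```python
class MipsLikeRegisters:
    """32 general-purpose registers; register 0 always reads as zero and ignores writes."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def __getitem__(self, r: int) -> int:
        return 0 if r == 0 else self._regs[r]

    def __setitem__(self, r: int, value: int) -> None:
        if r != 0:                               # writes to register 0 are discarded
            self._regs[r] = value & 0xFFFF_FFFF  # keep 32 bits

def add(regs: MipsLikeRegisters, rd: int, rs: int, rt: int) -> None:
    """Three-address add, rd <- rs + rt, in the spirit of `add $rd, $rs, $rt` (no overflow trap)."""
    regs[rd] = regs[rs] + regs[rt]

regs = MipsLikeRegisters()
regs[9] = 1234
add(regs, 8, 9, 0)        # copy register 9 into register 8 by adding register 0
regs[0] = 5               # attempt to overwrite register 0: ignored
print(regs[8], regs[0])   # 1234 0
```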

MIPS Instruction Set
There are 32 general-purpose registers of 32 bits each (for non-floating point operands). Floating point operands use separate registers that operate in pairs, giving 16 registers of 64 bits for double precision values; the even-numbered register of a pair is specified as the target of a floating point instruction. MIPS also follows the principle of orthogonality, which eliminates unnecessary duplication and overlap among instructions. An instruction set is said to be orthogonal when each instruction performs a unique function.


Key Points
- The instruction set consists of the operations the processor supports.
- CISC and RISC processors
- Pipelining issues
- Optimization techniques using a register window
- The MIPS instruction set