CS510 Computer Architectures
Lecture 4: Instruction Set Architecture

Instruction Set Architecture
What a computer architecture course covered, by era:
– 1950s to 1960s: computer arithmetic
– 1970s to mid-1980s: instruction set design, especially ISAs appropriate for compilers
– 1990s: design of the CPU, memory system, I/O system, and multiprocessors

Languages of Computers
Machine language (ML)
– Programs consist of machine instructions
– Directly executable without preprocessing
– Direct manipulation of machine registers
– Efficient in its use of machine resources
– Difficult to program
Assembly language (AL)
– An improved version of machine language with an emphasis on user-friendliness: a symbolic machine language (symbols for operations and addresses)
– An assembler is needed to translate it into a machine language program
High-level language (HLL)
– Programs consist of statements, each of which can be translated into several machine language instructions
– A compiler is needed to translate it into a machine language program
– Relatively easy to program compared to ML or AL
– Hardware resource utilization may be inefficient

Semantic Gap Between ML and HLL
As hardware cost goes down, software cost goes up
[Figure: system cost versus year, split into HW and SW]
– Shortage of programmers
– Unreliable software => unreliable computers
Response: keep the programming cost down
– Develop powerful, complex, user-friendly HLLs
– HLL programmers are easy to train
Greater semantic gap between HLL and machine language
– Execution inefficiency
– Software complexity
– Compiler complexity
To offset the semantic gap
– Large instruction set
– Variety of addressing modes
– Hardware/firmware implementation of HLL primitives

Instruction Set
Boundary between designers (architects) and programmers
– For designers: a specification of the function of the CPU
– For programmers: a pool of functions from which they choose what to use in a program
“One would expect that human language should directly reflect the characteristics of human intellectual capabilities, that language should be a direct mirror of mind in ways which other systems of knowledge and belief cannot.” - Noam Chomsky
Instruction set
– The language of a machine
– Characterizes the machine’s capability and behavior
Performance issues
– Memory bandwidth (MB) is used 1/2 for instructions and 1/2 for data
– For efficient utilization of MB, the instruction representation must be as compact as possible while still being compatible with data
– The von Neumann bottleneck exists in MB

Memory Bandwidth Issue
– Memory bandwidth is used by the CPU and by I/O
– The memory bandwidth given to the CPU is used for instruction fetches (IF) and operand fetches or operand stores (OF)
– Consider an AC machine executing ADD X or LDA X
[Figure: timeline of memory bandwidth divided between I/O and instruction execution; each instruction execution is drawn with IF, OF, D/IP, and E phases]

Machine Language
– Vocabulary
  - Operations
  - Addressing modes for operand addresses and the next-instruction address
– Syntax: methods of representing the operation (OP-code), operands, and addresses in an instruction
  - Instruction format
  - Encoding of instruction fields
– Grammar: rules for using instructions to make a program

Components of an Instruction
Operation code (OP-code)
– Format specifier
  - Long / short
  - Field definition
– Operation
– Types of operands
Operand address(es)
– Operand itself
– Addresses themselves (including abbreviated ones)
– Address modification specification
  - Automatic indexing
  - Relative address
– Sequencing

Instruction Set and Computer Architecture
Computer architectures are classified into three classes according to the register structure used for operand storage:
– AC computer architecture
– Stack computer architecture
– General purpose register (GPR) computer architecture
[Figure: block diagrams of the three architectures. AC architecture: input bus, AC and other registers, ALU, output bus. Stack architecture: stack, ALU. GPR architecture: input bus, general purpose registers, ALU, output bus.]

Stack Computer Architecture
[Figure: stack S with locations 0 to n-1, stack pointer SP, Full (F) and Empty (E) flags, and the ALU]
Instruction: Operation
– PUSH X: if F=1, then stack overflow; else SP ← SP+1, S[SP] ← M[X], E ← 0, and if SP=(n-1), then F ← 1
– POP X: if E=1, then stack empty; else M[X] ← S[SP], SP ← SP-1, F ← 0, and if SP=(n-1), then E ← 1
– Unary instruction (e.g. Shift Left): if E=1, then stack empty; else ALU ← S[SP], then S[SP] ← ALU
– Binary instruction (e.g. ADD): if E=1, then stack empty; else ALU ← S[SP], SP ← SP-1; if SP=(n-1), then E ← 1 and stack empty; else ALU ← S[SP], then S[SP] ← ALU
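The register-transfer rules on this slide can be tried out with a small simulation. The following is a minimal sketch, not the lecture's own code: the class name `HardwareStack` and its methods are hypothetical, memory is replaced by plain Python values, and SP is assumed to wrap modulo n so that SP = n-1 marks both the empty state (before the first push) and the full state (after the last push), with the F and E flags telling them apart.

```python
class HardwareStack:
    """Sketch of an n-entry register stack with Full (F) and Empty (E) flags."""

    def __init__(self, n):
        self.n = n
        self.s = [0] * n
        self.sp = n - 1        # assumption: SP wraps, so the first push lands in S[0]
        self.full = False      # F flag
        self.empty = True      # E flag

    def push(self, value):
        # PUSH: if F=1 then overflow; else SP <- SP+1, S[SP] <- value, E <- 0
        if self.full:
            raise OverflowError("stack overflow")
        self.sp = (self.sp + 1) % self.n
        self.s[self.sp] = value
        self.empty = False
        if self.sp == self.n - 1:
            self.full = True

    def pop(self):
        # POP: if E=1 then stack empty; else value <- S[SP], SP <- SP-1, F <- 0
        if self.empty:
            raise IndexError("stack empty")
        value = self.s[self.sp]
        self.sp = (self.sp - 1) % self.n
        self.full = False
        if self.sp == self.n - 1:
            self.empty = True
        return value

    def unary(self, f):
        # Unary instruction: the top of stack is replaced by f(top)
        if self.empty:
            raise IndexError("stack empty")
        self.s[self.sp] = f(self.s[self.sp])

    def binary(self, f):
        # Binary instruction: pop one operand, combine it with the new top
        top = self.pop()
        if self.empty:
            raise IndexError("stack empty: binary instruction needs two operands")
        self.s[self.sp] = f(self.s[self.sp], top)


stack = HardwareStack(4)
stack.push(3)
stack.push(4)
stack.binary(lambda a, b: a + b)   # ADD
result = stack.pop()
```

Note that the functional instructions (`unary`, `binary`) carry no operand address at all, which is the point the next slide makes about instruction length.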

Characteristics of the Stack Architecture
– Instruction length is short
  - No need to represent the address(es) of operand(s) in functional instructions
– Instruction execution is fast
  - Operand access is fast because the operands are in the stack (registers)
– Operands must be stored in the stack before they can be operated on
  - Inconvenient to prepare data in the stack
  - Frequent use of PUSH and POP instructions to prepare data in the stack, each requiring a memory access

AC Computer Architecture
[Figure: input bus, AC and other registers, ALU, output bus]
Instruction: Operation
– Unary instruction (CPA): AC ← f(AC)
– Binary instruction (ADD X): AC ← f(AC, M[X])
– Transfer instructions: (LDA X) AC ← M[X]; (STA X) M[X] ← AC
Characteristics:
- Execution of binary instructions is slow
  » One of the operands must be read from memory
- Instruction length is longer than in the stack architecture
  » The memory address of one operand must be specified in the instruction, although the AC (a data register) can be implied
- Frequency of LDA/STA instructions is high
  » There is only one data register

GPR Computer Architecture
[Figure: input bus, registers, ALU, output bus]
Instruction: Operation
– Unary instruction (COMP R1, R2): R1 ← f(R2)
– Binary instruction (ADD R1, R2): R1 ← f(R1, R2), or (ADD R1, R2, R3): R3 ← f(R1, R2)
– Transfer instructions: (LD R1, X) R1 ← M[X]; (ST R1, X) M[X] ← R1
Characteristics:
- Instruction length is short because register addresses are used for operands
- Instruction execution is fast because all the operands are in registers
- Frequency of LD/ST instructions depends on the number of registers
- Opportunities for keeping the results of operations in GPRs are high because there are many registers

Computer Architecture?
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” - Amdahl, Blaauw, and Brooks, 1964

Towards Evaluation of ISA and Organization
[Figure labels: software, instruction set, hardware]

Interface Design
A good interface:
– Lasts through many implementations (portability, compatibility)
– Is used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
[Figure: an interface with uses above it and implementations imp 1, imp 2, imp 3 below it, plotted against time]

Evolution of Instruction Sets
– Single accumulator (EDSAC, 1950)
– Accumulator + index registers (Manchester Mark I, IBM 700 series, 1953)
– Separation of instruction set from implementation
  - High-level language based (B5000, 1963)
  - Concept of a family (IBM S/360, 1964)
– General purpose register machines
  - Complex instruction sets (VAX, Intel 432, 1977-80)
  - Load/store architecture (CDC 6600, Cray 1, 1963-76)
– RISC (MIPS, SPARC, 88000, IBM RS6000, ... 1987)

Evolution of Instruction Sets
Major advances in computer architecture are typically associated with landmark instruction set designs
– Ex: Stack (B1700) vs GPR (System/360)
Design decisions must take into account:
– technology (components)
– machine organization
– programming languages
– compiler technology
– operating systems
And the design decisions in turn influence these

Design Space of ISA
Five primary dimensions
– Number of explicit operands: 0, 1, 2, 3
– Operand storage: where besides memory?
– Effective address: how is a memory location specified?
– Type and size of operands: byte, int, float, vector, ... How is it specified?
– Operations: add, sub, mul, ... How is it specified?
Other aspects
– Successor: how is it specified?
– Conditions: how are they determined?
– Encoding: fixed or variable? Wide?
– Parallelism

Number of Explicit Operands
To reduce the memory bandwidth consumed by fetching instructions from memory, the number of explicitly specified operands in an instruction needs to be reduced. The maximum number of operands to be specified is 3: 2 source operands and 1 result operand.
– 2 operands (GPR machine): 2 source operands; one of them is overwritten after execution to store the result
– 1 operand (AC machine): one of the operands is implied to be in a specific hardware register called the accumulator (AC), which also receives the result of the execution
– 0 operands (stack machine): both operands and the result are implied to be on a stack

Operand Storage
Memory
- Memory addresses are long
- Need to represent the address with a few bits
  » Relative addressing with displacement
  » Page/segment addressing
Register
- General purpose registers
  » Short register addresses
- AC
Stack (registers)
- No need for addresses

Address Space and Storage Space
Address space
– Consists of the addresses that programmers can use
Storage space
– Consists of the physical storage locations
For a simple, low-cost machine, the address space and the storage space are identical
– Programmers program with the actual storage addresses
Modern computers provide storage systems with independent address and storage spaces
– An effective address (EA) needs to be obtained from the address used in the program in order to access the operand in memory
– Usually the address space is much larger than the storage space
– Virtual storage system

Effective Address
– An address and a physical storage location are two different concepts.
– Addresses of operands are represented or implied in the instruction.
– An operand’s address needs to be mapped into an effective address (EA) of the physical storage location.
Basic addressing modes (A or R is the field in the instruction; [R] is the content of register R)

Mode               Algorithm      Advantage             Disadvantage
Immediate          opd = A        no memory reference   limited value
Direct             EA = A         simple                limited address space
Indirect           EA = M[A]      large address space   multiple memory references
Register           EA = R         no memory reference   limited address space
Register indirect  EA = [R]       large address space   extra memory reference
Displacement       EA = A + [R]   flexibility           complexity
Stack              opd = S[TOP]   no memory reference   limited applications
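To make the table concrete, here is a minimal sketch of operand fetch under each basic mode. It is an illustration under stated assumptions rather than any real machine's decoder: the function name `fetch_operand`, the mode strings, and the use of dictionaries for memory and registers are all hypothetical. The function also counts memory references, which is the quantity behind most of the advantage and disadvantage entries.

```python
def fetch_operand(mode, mem, regs, stack, a=None, r=None):
    """Return (operand, number_of_memory_references) for one addressing mode.

    mem   : dict mapping address -> value (stands in for main memory M)
    regs  : dict mapping register number -> content
    stack : list whose last element is the top of stack
    a, r  : the address field A and register field R of the instruction
    """
    if mode == "immediate":            # opd = A
        return a, 0
    if mode == "direct":               # EA = A
        return mem[a], 1
    if mode == "indirect":             # EA = M[A]
        return mem[mem[a]], 2
    if mode == "register":             # operand is held in register R
        return regs[r], 0
    if mode == "register_indirect":    # EA = [R]
        return mem[regs[r]], 1
    if mode == "displacement":         # EA = A + [R]
        return mem[a + regs[r]], 1
    if mode == "stack":                # opd = S[TOP]
        return stack[-1], 0
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state used only for this illustration.
memory = {10: 20, 20: 99, 25: 7}
registers = {1: 20}
top_of_stack = [42]

direct = fetch_operand("direct", memory, registers, top_of_stack, a=10)
indirect = fetch_operand("indirect", memory, registers, top_of_stack, a=10)
displaced = fetch_operand("displacement", memory, registers, top_of_stack, a=5, r=1)
```

With this state, the indirect fetch costs two memory references while the register and stack modes cost none, matching the table.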

Specification of Type and Size of Operand
Specification of the type of the operand
– Usually different op-codes for different types of operands
Specification of the size of the operand
– The op-code represents the resolution of the operand address: bit, byte, half word (upper/lower half), word, ...
– Length of operands
  - Implicit
  - Variable length
    : Specified explicitly in the instruction
    : Specified by a designated register
    : Specified by delimiter marks in the operand: reserved-bit delimiter (field or word mark) or reserved-bit configuration (record or group mark)

Operation
Specification
– Encoded, in order to reduce the instruction length
Types
– Minimal instruction set
– Complex instruction set vs RISC

Four Types of Operations
– Functional: ADD, AND, CPA, CPC, ROL, CLA, CLC, INC, …
– Transfer: LDA, STA (LD, ST), …
– Control: JMP, JNA, JZA, JZC (SMA, SZA, SZC), …
– Input/Output: INP, OUT, …

Minimal Instruction Set
BN instruction: BN X1, X2, X3
– M[X1] ← M[X1] - M[X2]; AC ← M[X1] - M[X2]
– If AC < 0, then PC ← X3
[Figure: datapath with PC, PC+1, X1, X2, M, M[X1], M[X2], Temp, ALU(f), AC]
Move instruction: move the content of Source to Destination
– A 2-address instruction (X1, X2)
– A ← M[PC], PC ← PC+1
– Temp ← M[A]
– B ← M[PC], PC ← PC+1
– M[B], AC ← SUB(Temp, M[B])
– Memory-mapped ALU

Why NOT Use a Minimal Instruction Set?
ADD X, Y
      BN a1, a1, a3    / M[a1] ← 0
      BN a1, Y, a3     / M[a1] ← -M[Y]
  a3: BN X, a1, a3     / M[X] ← M[X] + M[Y]
JMP X
      BN a1, a1, a3    / M[a1] ← 0
      BN a1, 1, X      / AC ← -1, PC ← X
– Inefficient
  - Program size (memory bandwidth)
  - Large IC and CPI
– Programming difficulty
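The cost of a one-instruction machine is easier to see when the BN sequence is actually executed. The sketch below is a hypothetical interpreter (`run_bn` and `add_via_bn` are illustrative names, and memory is a Python dictionary keyed by symbolic addresses). One assumption differs from the slide's listing: every branch target here is simply the next instruction, so the sequence runs straight through whether or not the result is negative.

```python
def run_bn(program, mem, max_steps=1000):
    """Execute a list of BN instructions and return the number executed.

    Each instruction is (x1, x2, x3):
        M[x1] <- M[x1] - M[x2]; if the result is negative, PC <- x3.
    x3 is an index into `program`; execution stops when PC leaves the program.
    """
    pc = 0
    steps = 0
    while 0 <= pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        x1, x2, x3 = program[pc]
        mem[x1] = mem[x1] - mem[x2]
        pc = x3 if mem[x1] < 0 else pc + 1
        steps += 1
    return steps


def add_via_bn(mem, x, y, tmp="a1"):
    """Synthesize ADD X, Y (M[X] <- M[X] + M[Y]) from three BN instructions."""
    program = [
        (tmp, tmp, 1),   # M[tmp] <- 0
        (tmp, y, 2),     # M[tmp] <- -M[Y]
        (x, tmp, 3),     # M[X]   <- M[X] - (-M[Y]) = M[X] + M[Y]
    ]
    return run_bn(program, mem)


memory = {"X": 5, "Y": 7, "a1": 123}
instructions_executed = add_via_bn(memory, "X", "Y")
```

A single ADD costs three instruction fetches and several extra memory references for the temporary, which is the "large IC and CPI" point made on the slide.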

Instruction Set Design: Operations to Include in the Instruction Set
Trade-off among the 3 Es (Elegance, Efficiency, Environment)
Elegance
– Completeness (even the BN instruction alone is complete)
– Symmetry: AC <= f(AC, M[X]) and M[X] <= f(AC, M[X])
– Flexibility, generality
Efficiency
– Space: bit budget
– Efficient specification of addresses
– Fewer instructions require fewer bits to encode the OP-code
– Frequency-of-use arguments
– Bandwidth arguments (a NOP simply wastes memory bandwidth)
– Ratio of overheads: non-functional to functional
Environment
– Multiprogramming (relocation, protection, sharing)
– Code generation by compilers (a compiler favors only a small portion of the instruction set)

ISA Metrics
Aesthetics:
– Orthogonal: no special registers, few special cases, all operand modes available with any data type or instruction type
– Completeness: support for a wide range of operations and target applications
– Regularity: no overloading of the meanings of instruction fields
– Streamlined: resource needs easily determined
Ease of compilation (programming?)
Ease of implementation
Scalability

Powerful Instruction
Rich, powerful instruction:
– An instruction with a longer execution time (E) to balance the overhead penalty (O)
– An instruction which has a large E/O
Conventional instruction cycle: I-F, I-P, O-F, E
– Overhead for execution (O): instruction fetch, decode, operand address decision, and operand fetch
– Execution (operation) (E)
[Figure: instruction cycle IF, IP, OF, E with the overhead portion O and the execution portion E marked]

Powerful Instructions
Extended arithmetic functions
– Multiply, divide, trigonometric functions, etc.
Automatic indexing
– BCT R1, addr (R1 <- R1 - 1; if R1 ≠ 0 then PC <- addr)
– BXLE R1, R3, addr (R1 <- R1 + R3; the comparand is R3 itself if R3 is an odd-numbered register and the next register, R3+1, if R3 is even-numbered; if R1 <= comparand then PC <- addr)
Subroutine linkage
– JMS X (M[X] <- PC, PC <- X+1)

Powerful Instructions
Process state exchange (context switch): instructions required in multiprogramming environments
– LM R1, R5, addr: R1 ← M[addr], R2 ← M[addr+1], …, R5 ← M[addr+4]
– SM R1, R5, addr: M[addr] ← R1, M[addr+1] ← R2, …, M[addr+4] ← R5
– XJ (Exchange Jump of the CDC 6000 series)
Otherwise, a sequence of single transfers is needed: LD R1, addr; LD R2, addr+1; …; LD R5, addr+4

Basic ISA Classes: Type of Internal Storage
Accumulator
– 1-address: ADD A; AC ← AC + M[EA(A)]
– (1+x)-address: ADDX A; AC ← AC + M[EA(A + [X])]
Stack
– 0-address: ADD; S[TOS] ← S[TOS] + S[TOS+1]
General purpose register
– 2-address: ADD A B; S[EA(A)] ← S[EA(A)] + S[EA(B)]
– 3-address: ADD A B C; S[EA(A)] ← S[EA(B)] + S[EA(C)]
Load/store
– 3-address: ADD Ra Rb Rc; Ra ← Rb + Rc
– LD Ra B; Ra ← M[EA(B)]
– ST Ra B; M[EA(B)] ← Ra

Stack Machines
Instruction set: arithmetic operators (+, -, *, /, ...), push A, pop A
Example: a*b - (a+c*b), in postfix form ab*acb*+-
[Figure: expression tree for a*b - (a+c*b)]

Instruction   Stack after execution (top of stack first)
push a        a
push b        b, a
*             a*b
push a        a, a*b
push c        c, a, a*b
push b        b, c, a, a*b
*             c*b, a, a*b
+             a+c*b, a*b
-             a*b-(a+c*b)
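The push/operate sequence for this example can be executed directly. The following is a minimal sketch of a 0-address evaluator; the function name `run_stack_program`, the textual instruction format, and the sample variable values are assumptions made for illustration only.

```python
import operator

# Arithmetic operators of the hypothetical stack machine.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def run_stack_program(program, variables):
    """Run 'push v', 'pop v', and operator instructions; return the final stack."""
    stack = []
    for instruction in program:
        if instruction in OPERATORS:
            right = stack.pop()          # top of stack is the right-hand operand
            left = stack.pop()
            stack.append(OPERATORS[instruction](left, right))
            continue
        opcode, name = instruction.split()
        if opcode == "push":
            stack.append(variables[name])
        elif opcode == "pop":
            variables[name] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack


# a*b - (a + c*b) in postfix order: a b * a c b * + -
program = ["push a", "push b", "*", "push a", "push c", "push b", "*", "+", "-"]
final_stack = run_stack_program(program, {"a": 2, "b": 3, "c": 4})
```

Of the nine instructions, six are pushes that reload `a` and `b` from memory, which illustrates why operands that do not "surface" at the right moment are a cost of the stack organization.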

The Case Against Stacks
– Performance is derived from the existence of several fast registers, not from the way they are organized
– Data does not always “surface” when needed
  - Constants, repeated operands, and common sub-expressions, so TOP and SWAP instructions are required
– Code density is about equal to that of GPR instruction sets
  - Registers have short addresses
  - Keep things in registers and reuse them
– Slightly simpler to write a poor compiler, but not an optimizing compiler

GPR Machines
GPR (general purpose registers)
– Faster than memory
– Easier for a compiler to use
– Used to hold variables and intermediate operands
  - Memory traffic is reduced
  - Code density improves
How many registers?
– Depends on how they are used by the compiler

How Many Registers in RF
[Figure: a register life (RL), running from an assignment to R1 through its uses f(R1) to the next assignment to R1]
– We need to try to keep the live registers in the RF
– Average number of simultaneous register lives: 2 ~ 6
– No program uses more than 15 registers simultaneously
Data: 6 algorithms from CALGO (ACM) written in 4 languages: ALGOL, BASIC, BLISS, FORTRAN

Register life
Life length   No. of lives   Fraction   Cum. fraction
1~1           174,927        0.09       0.09
2~3           728,346        0.38       0.48
4~7           547,072        0.29       0.77
8~15          252,508        0.13       0.90
16~31         116,404        0.06       0.96
32~63          41,673        0.02       0.98
64~127         17,790        0.01       0.99
128~           15,603        0.01       1.00
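The fraction and cumulative-fraction columns follow directly from the life counts, and recomputing them is a quick consistency check on the table. This is a minimal sketch; the names `LIFE_COUNTS` and `life_fractions` are illustrative, and the counts are the ones given on the slide.

```python
# (life length bucket, number of register lives), as given in the table.
LIFE_COUNTS = [
    ("1~1", 174_927),
    ("2~3", 728_346),
    ("4~7", 547_072),
    ("8~15", 252_508),
    ("16~31", 116_404),
    ("32~63", 41_673),
    ("64~127", 17_790),
    ("128~", 15_603),
]


def life_fractions(counts):
    """Return rows of (bucket, fraction, cumulative fraction)."""
    total = sum(n for _, n in counts)
    rows = []
    running = 0
    for bucket, n in counts:
        running += n
        rows.append((bucket, n / total, running / total))
    return rows


rows = life_fractions(LIFE_COUNTS)
cumulative = [round(cum, 2) for _, _, cum in rows]
```

Rounded to two decimals, the recomputed cumulative column agrees with the slide; in particular about 90% of register lives fall in the buckets up to 8~15.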

GPR Machines
– Maximum number of operands (O): two or three
– Number of memory addresses (M): 0, 1, 2, 3

No. of memory addresses   Max. no. of operands allowed   Type (M,O)   Examples
0                         3                              (0,3)        SPARC, MIPS, PowerPC, ALPHA
1                         2                              (1,2)        Intel 80x86, Motorola 68000
2                         2                              (2,2)        VAX
3                         3                              (3,3)        VAX

GPR Machines
Register-register (0,3)
– Advantages: simple, fixed-length instruction encoding; simple code generation model
– Disadvantages: higher instruction count; some instructions are short and bit encoding may be wasteful
– Example: ADD R1, R2, R3; R[R1] ← R[R2] + R[R3]
Register-memory (1,2)
– Advantages: data can be accessed without loading first; instruction format tends to be easy to encode and yields good density
– Disadvantages: a source operand is destroyed; clocks per instruction vary by operand location
– Example: ADD R1, X; R[R1] ← R[R1] + M[X]
Memory-memory (3,3)
– Advantages: program becomes most compact; no waste of registers for temporaries
– Disadvantages: large variation in instruction sizes and in work per instruction; memory accesses create a memory bottleneck
– Example: ADD X1, X2, X3; M[X1] ← M[X2] + M[X3]

R-R vs R-M
Example: A+B+C
RR instructions
    LD  R1, A
    LD  R2, B
    LD  R3, C
    ADD R4, R1, R2
    ADD R5, R4, R3
RM instructions
    LD  R1, A
    ADD R1, B
    ADD R1, C
RM instructions reduce IC (3 instructions instead of 5)

What About Actual Programs
Consider a GPR machine with a large register file.
- It is highly probable that intermediate data can be found in a register
- Thus, LD/ST instructions will be used less frequently
- However, the frequency of LD/ST instructions in computers that use RM instructions will be reduced even further

VAX-11
Variable format, 2- and 3-address instructions
[Figure: variable-length instruction layout, an OpCode byte followed by A/M operand specifier fields; byte positions 0, 1, ..., n, ..., m]
– 32-bit word size, 16 GPRs (4 reserved)
– Rich set of addressing modes (apply to any operand)
– Rich set of operations: bit field, stack, call, case, loop, string, poly, system
– Rich set of data types (B, W, L, Q, O, F, D, G, H)
– Condition codes

Kinds of Addressing Modes
[Figure: instruction fields OP, Ri, Rj, v, selecting from the register file R and memory M]

Addressing mode       Operand
Register direct       [Ri]
Immediate (literal)   v
Direct (absolute)     M[v]
Register indirect     M[[Ri]]
Base+Displacement     M[[Ri] + v]
Base+Index            M[[Ri] + [Rj]]
Scaled Index          M[[Ri] + [Rj]*d + v], e.g. d=8
Autoincrement         M[[Ri]+1]
Autodecrement         M[[Ri] - 1]
Memory Indirect       M[ M[Ri] ] (indirection chains)

Memory Addressing Modes (VAX)
Frequency of use of each memory addressing mode (%)

Addressing mode     TeX   spice   gcc
Memory indirect     1     6       1
Scaled              0     16      6
Register deferred   24    3       11
Immediate           43    17      39
Displacement        32    55      40

Operand Address Bits: Displacement Values
The displacement value determines the size of the operand address field when the address is represented as a displacement from a base address
– Displacement values are widely distributed
– The vast majority are positive
– A majority of the large displacements are negative

Operand Address Bits: Immediate Addressing Mode
[Figure: percentage of operations that use immediates, for loads, compares, ALU operations, and all instructions, with one bar for the integer average and one for the FP average. Bar values in extraction order: 10, 45, 87, 77, 58, 78, 35, 10]

Operand Address Bits: Immediate Addressing Mode
[Figure not recovered]

Operations in the Instr. Set

Operator type            Examples
Arithmetic and logical   Add, Subtract, …
Data transfer            Load, Store, Move, …
Control                  Branch, Jump, Procedure Call, Return, Trap
System                   Operating System Call, VMM instructions
Floating point           Floating Point Add
Decimal                  Decimal Add, Decimal-to-Character Conversion
String                   String Move, String Compare, String Search
Graphics                 Pixel operations, Compress/Decompress operations

Operations in the Instr. Set

Rank    80x86 instruction    Integer average (% total executed)
1       load                 22%
2       conditional branch   20%
3       compare              16%
4       store                12%
5       add                  8%
6       and                  6%
7       sub                  5%
8       move reg-reg         4%
9       call                 1%
10      return               1%
Total                        96%

Control Flow Instructions
[Figure: relative frequency (%) of call/return, jump, and conditional branch instructions, with one bar for the integer average and one for the FP average. Bar values in extraction order: 81 87 13 11 64]

RISC

Instruction Execution Characteristics: Type of Operations
Relative dynamic frequencies (%) of statements in HLL programs

Language     PASCAL       FORTRAN   PASCAL    C         SAL
Workload     Scientific   Student   Systems   Systems   Systems
Assignment   74           67        45        38        42
Loop         4            3         5         3         4
Call         1            3         15        12        12
If           20           11        29        43        36
goto         2            9         -         3         -
others       -            7         6         1         6

What type of statement is most frequent?
– Assignment statements dominate
  » Functional instructions and transfer instructions
  » Movement of data must be made simple, and thus fast
– Conditional statements (if and loop together)
  » Instructions with a control function
  » The sequence control mechanism is important

Instruction Execution Characteristics: Time Consumed by Statements

              Dynamic occurrence   Machine instr. weighted   Memory ref. weighted
              PASCAL   C           PASCAL   C                PASCAL   C
Assignment    45       38          13       13               14       15
Loop          5        3           42       32               33       26
Call/Return   15       12          31       33               44       45
If            29       43          11       21               7        13
goto          -        3           -        -                -        -
others        6        1           3        1                2        1

Machine instruction weighted = [average no. of machine instructions / statement] x [frequency of occurrence]
Memory reference weighted = [average no. of memory references / statement] x [frequency of occurrence]
The most time-consuming statement is the procedure CALL/RETURN

Instruction Execution Characteristics: Type of Operands
Dynamic frequencies of occurrence (%)

                   PASCAL   C    Average
Integer constant   16       23   20
Scalar variable    58       53   55
Array/structure    26       24   25

The majority of references are to scalars
– 80% are local to a procedure
– References to arrays/structures require an index or pointer
Locations of operands (average per instruction)
– 0.5 operands in memory
– 1.4 operands in registers

Instruction Execution Characteristics: Procedure Calls
[Figure: CALL SUB(X1, X2, X3) passing parameters to SUB(A, B, C)]
Two most significant aspects in implementing procedure CALL/RETURN
– Number of parameters
– Depth of nesting
Statistics on the number of parameters
– 98% of dynamically called procedures were passed fewer than 6 parameters
– 92% of them used fewer than 6 local scalar variables

Multiple Register Sets
- Assume that we have several sets of registers, so that each set can be used by a different procedure
- This saves time on procedure CALL/RETURN, because only the register set pointer value needs to change
[Figure: register set pointer selecting among Set 0, Set 1, Set 2, ..., Set n-1]

Instruction Execution Characteristics: Depth of Procedure Nesting
[Figure: nesting depth versus time t, with calls and returns moving inside a register set window]
Procedure nesting and the register set window
– A nesting depth of 5 can be served by a register set window of size 5 without using memory
– When the nesting depth exceeds 5, a movement of more than 5 in either direction (CALL/RETURN) requires shifting the register set window (down/up)
– Shifting the register set window: the information in one register set must be saved in memory so that the register set can be used by the new procedure
Statistics: a window depth of 8 needs to shift on fewer than 1% of calls and returns
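How often the window has to shift depends on how the nesting depth moves relative to the number of register sets. The sketch below counts shifts for a given call/return trace. It is an illustration under assumptions: the function name `count_window_shifts` and the trace encoding are hypothetical, the initial procedure is taken to occupy one register set, and one register set is saved or restored per shift.

```python
def count_window_shifts(trace, num_sets):
    """Count register-set saves and restores for a sequence of calls and returns.

    trace    : iterable of "call" / "return" events
    num_sets : number of register sets that fit in the register file
    """
    depth = 0             # nesting depth of the running procedure
    oldest_resident = 0   # depth of the oldest procedure still held in registers
    shifts = 0
    for event in trace:
        if event == "call":
            depth += 1
            if depth - oldest_resident >= num_sets:
                oldest_resident += 1      # save the oldest register set to memory
                shifts += 1
        elif event == "return":
            if depth == 0:
                raise ValueError("return without a matching call")
            depth -= 1
            if depth < oldest_resident:
                oldest_resident -= 1      # restore a register set from memory
                shifts += 1
        else:
            raise ValueError(f"unknown event: {event}")
    return shifts


shallow = count_window_shifts(["call"] * 4 + ["return"] * 4, num_sets=5)
deep = count_window_shifts(["call"] * 5 + ["return"] * 5, num_sets=5)
```

A trace that stays within the window needs no memory traffic at all, while one extra level of nesting costs one save on the way down and one restore on the way back.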

RISC Philosophy (1): Make the Most Frequent Statements Execute Fast
The most frequent statements are assignment statements, and each of them is translated by the compiler into a set of functional instructions and/or transfer instructions. Thus functional and transfer instructions need to be made to execute fast.
Instruction cycle of a functional or transfer instruction, and how each phase can be shortened:
– I-F(M), read instruction from memory: short instructions
– I-P, decode / effective address: fixed instruction format, simple addressing modes
– O-F(M), read operand from memory: have the operands in registers
– E, perform operation: cannot be improved through the instruction set; needs an improved architecture (pipelined execution)

Assignment Statements
To make the instruction fetch fast
– Short OP-code part: small number of instructions in the instruction set
– Short operand address part: keep the operands in registers instead of memory
To make the instruction preparation fast
– Fixed-length instructions
– Fixed-format instructions
– Simple addressing modes
To make the operand fetch fast
– Make the operands available from registers instead of memory
– Needs a large register file
To make the instruction execution fast
– Multiple register sets; overlapping MRS
– Instruction execution pipeline

RISC Philosophy (2): Make the Most Time-Consuming Statements Execute Fast
Procedure call and return
[Figure: CALL SUB(X1, X2, X3) passing parameters to SUB(A, B, C)]
Methods of passing parameters
Through memory
– Parameters are stored in memory locations that are accessible to both the calling and the called procedure
– Execution of CALL and RETURN instructions is very slow because of the memory accesses, especially when there are many parameters to pass
Through registers
– Parameters are stored in CPU registers
– The calling procedure needs to save in memory the registers that are not used for passing parameters. This results in many memory accesses and makes these instructions slow.

CISC and RISC

                            IBM         VAX      Intel   Berkeley   IBM
                            S/370-168   11-780   8086    RISC I     801
Year developed              73          78       78      81         80
No. of instructions         208         303      133     31         120
Instruction length (bits)   16-48       16-456   8-32    32         32
Addressing modes            4           22       6       3          5
No. of GPR                  16          16       4       138        32
CM capacity                 420Kb       480Kb    -       0          0

RISC
– A limited and simple instruction set
– A large number of GPRs (register file)
– An emphasis on optimizing the instruction pipeline

Large Register File
If the number of registers is small, a strategy is needed to keep the most frequently accessed operands in registers and so minimize register-memory traffic
- Software approach: maximize register usage by the compiler (requires sophisticated program analysis)
- Hardware approach: more registers in the register file
Quick access to operands is desirable
- Assignment statements rely on functional and transfer instructions
- Functional instructions rely heavily on registers
- The frequency of transfer instructions depends on the number of registers in the register file

Register Window
Facts
– Statistically, most operand references are to local scalars (80%)
– Variables local to a procedure cannot be accessed by other procedures
Problem
– The set of local variables changes with each procedure CALL/RETURN
– CALL/RETURN occurs frequently
– Parameters need to be passed around
Observations
– Statistically, there are few parameters (<6) and few local variables (<6)
– Statistically, the depth of procedure activation fluctuates within a relatively narrow range (<8)
Solution
– Multiple small sets of registers
– Each set is assigned to a different procedure
– Windows for adjacent procedures overlap to allow parameter passing

Multiple Register Set
[Figure: register set pointer selecting among Set 1, Set 2, Set 3, ..., Set m]
Each register set is assigned to a different procedure
- The size of a register set is equal to the size of a window
- Parameters need to be copied into the called/calling procedure’s register set; however, there is no need to copy all the registers from the switched-off register set
- Requires register move instructions

Overlapping Register Window
When multiple register sets are implemented in a large register file, each register set is called a register window. Multiple register sets still require copying the parameter values between register sets.
Overlapping register windows
- Portions of adjacent register windows overlap for passing parameters
- At any time only one window is visible
- No need to move information for parameter passing
[Figure: window i (procedure i) and window i+1 (procedure i+1), each made of parameter registers, local registers, and temporary registers; the overlapping part of the two windows is the same physical registers, used to exchange parameters on CALL and RETURN]
How about global variables?

Global Variables
Global variables are accessible by all procedures
Assign them to memory locations by the compiler
– Straightforward, but inefficient for frequently accessed global variables because of the frequent memory accesses
Set aside a set of global variable registers
– Available to all procedures
– Unified register numbering system to simplify the instruction format
– e.g. R0 ~ R7: global; R8 ~ R13: current window
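The unified numbering scheme can be illustrated by mapping a register number seen by a procedure onto a physical register. The following is a sketch under several assumptions that are not stated in the slides: the function name `physical_register` is hypothetical, the six window registers R8 ~ R13 are split into 2 parameter, 2 local, and 2 temporary registers, the temporary registers of window i are taken to be the parameter registers of window i+1, and the windows are arranged circularly.

```python
def physical_register(window, reg, *, n_windows, n_globals=8, n_params=2, n_locals=2):
    """Map register number `reg`, as seen from `window`, to a physical register.

    Assumed window layout: parameters | locals | temporaries, where the number
    of temporaries equals the number of parameters so that adjacent windows can
    overlap by exactly that many registers.
    """
    stride = n_params + n_locals          # registers owned by each window
    window_size = stride + n_params       # registers visible through a window
    if reg < 0:
        raise ValueError("register numbers are non-negative")
    if reg < n_globals:
        return reg                        # globals are shared by every window
    offset = reg - n_globals
    if offset >= window_size:
        raise ValueError("register number outside the current window")
    windowed_total = n_windows * stride
    return n_globals + (window * stride + offset) % windowed_total


# Temporaries of window 0 (R12, R13) are the parameters of window 1 (R8, R9).
caller_temp = physical_register(0, 12, n_windows=4)
callee_param = physical_register(1, 8, n_windows=4)
```

Because both names resolve to the same physical register, a value written by the caller is visible to the callee without any move instruction, while R0 ~ R7 resolve to the same physical registers from every window.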

Linear Organization of Register Windows
[Figure: physical register file, numbered 0 to n-1, holding the global registers (0 to p-1) followed by Set 1, Set 2, Set 3, ...; within each set the registers are numbered 0 to p-1 and p to m-1]

Code Size
Smaller programs
– A program takes less memory space
– A smaller program improves performance
  - Fewer instructions
  - Fewer bytes to fetch
  - In a paging environment, it occupies fewer pages and reduces page faults
CISC
– Smaller number of instructions in the program (the program may be shorter, but it does not necessarily occupy less space)

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 76 Example
A ← B + C
CISC (a single memory-to-memory instruction)
Instruction format: 8 | 4+12 | 4+12 | 4+12
Memory Traffic
Instruction: 56 bits
Data: 32 x 3 = 96 bits
Total MB used: 56 + 96 = 152 bits
RISC
LD Rb B
LD Rc C
ADD Ra Rb Rc
ST Ra A
Instruction format: 8 | 4 | 4 | 12
Memory Traffic
Instruction: 28 x 4 = 112 bits
Data: 96 bits
Total MB used: 112 + 96 = 208 bits
Compare total MB
CISC: More instructions in the instruction set, longer OP-code
RISC: More chances of storing intermediate results in registers, less use of LD/ST
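The memory-traffic arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch that takes the field widths from the instruction formats on the slide (8-bit OP-code, 4-bit register address, 12-bit memory address, 32-bit data); the function and constant names are made up for the illustration. Summing the slide's own figures of 112 instruction bits and 96 data bits for the RISC sequence gives 208 bits.

```python
OPCODE_BITS = 8     # OP-code field
REG_BITS = 4        # register address field
MEM_ADDR_BITS = 12  # memory address field
DATA_BITS = 32      # one operand


def cisc_traffic():
    """One memory-to-memory instruction: format 8 | 4+12 | 4+12 | 4+12."""
    instr = OPCODE_BITS + 3 * (REG_BITS + MEM_ADDR_BITS)
    data = 3 * DATA_BITS  # read B, read C, write A
    return instr, data, instr + data


def risc_traffic():
    """LD, LD, ADD, ST: four instructions in the 8 | 4 | 4 | 12 format."""
    instr_len = OPCODE_BITS + REG_BITS + REG_BITS + MEM_ADDR_BITS
    instr = 4 * instr_len
    data = 3 * DATA_BITS  # two loads and one store
    return instr, data, instr + data


cisc = cisc_traffic()
risc = risc_traffic()
```

For a single statement the load/store sequence moves more bits than the memory-to-memory instruction; the advantage of the register-based approach only shows up when intermediate results stay in registers across several statements.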

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 77 Characteristic of RISC(1, 2)
(1) 1 Instruction per cycle (machine cycle)
–Machine cycle: IF + IP + Time to fetch the operands from registers + Perform operation + Store the result in a register
–RISC instruction ≈ CISC micro-instruction => No need to microprogram (Hardwired control)
(2) Register-to-Register operation
–With only simple Load and Store operations for accessing memory (Load/Store Arch.)
–Simplifies the instruction set and the control unit
Example: B ← B + C, A ← A + B, D ← D - A
Field sizes: Data: 32 bits; OP-code: 8 bits; Reg Address: 4 bits; M address: MM instr 12 + 4 bits, RISC 12 bits
CISC-I
ADD B, C, B
ADD A, B, A
SUB D, A, D
I: 56 x 3 = 168 bits
D: 96 x 3 = 288 bits
Total MB: 456 bits
Cycles: 3 x 4 = 12
RISC
ADD Rb, Rc, Rb
ADD Ra, Rb, Ra
SUB Rd, Ra, Rd
I: 28 x 3 = 84 bits
D: 0 bits
Total MB: 84 bits
Cycles: 3 x 1 = 3

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 78 Characteristic of RISC(3, 4)
(3) Simple Addressing Modes - Shorten EA generation time
–Almost all instructions use register addressing
–Relative addressing using PC, BAR, and Index address
–Other complex modes may be synthesized by software
Instruction format: OP-code ... Rs1 S2
Addressing Mode     Effective Address   Synthesis   Used by
Immediate           Operand = A         S2          R-to-R
Direct              EA = A              0 + S2      LD/ST
Register            EA = R              Rs1, S2     R-to-R
Register Indirect   EA = [R]            Rs1 + 0     LD/ST
Displacement        EA = [R] + A        Rs1 + S2    LD/ST
(4) Simple Instruction Format - Shorten instruction Decoding Time
–Usually one format
–Fixed length/align on word boundary
–Fixed field length
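The Synthesis column says that the memory addressing modes are all special cases of a single Rs1 + S2 computation. A minimal Python sketch of that idea follows; it assumes a register that always reads as zero (written here as register 0) to obtain the "0 + S2" case, and the register contents and the name `effective_address` are hypothetical.

```python
def effective_address(regs, rs1, s2):
    """The only address computation the hardware needs: [Rs1] + S2."""
    return regs[rs1] + s2


regs = [0] * 16
regs[0] = 0        # assumption: register 0 always reads as zero
regs[5] = 0x1000   # hypothetical base address held in R5

direct = effective_address(regs, 0, 0x200)        # Direct: 0 + S2
reg_indirect = effective_address(regs, 5, 0)      # Register Indirect: Rs1 + 0
displacement = effective_address(regs, 5, 0x20)   # Displacement: Rs1 + S2
```

Because every load and store goes through the same adder, the hardware needs no mode-specific address logic; the compiler selects the mode by choosing Rs1 and S2.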

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 79 Characteristic of RISC(5)
(5) Pipelining (We will learn this later in detail)
At this time, you just need to know that
- Instruction execution hardware can be made of a few inter-connected independent sub-modules, called pipeline STAGEs: S0 S1 S2 S3
- An instruction's execution progresses through each pipeline stage in sequence
- When an instruction completes its execution at the i-th stage, the next instruction commences its execution at the i-th stage
- Thus, in the ideal situation, throughput increases nearly n times, where n is the number of pipeline stages
- Branch instructions make pipelined execution inefficient

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 80 Pipelined Execution
[Figure: space-time diagram of instructions I0 to I4 moving through pipeline stages S0, S1, S2, S3]
1 instruction execution: I0 passes through S0, S1, S2, S3 and completes at (t x 4)
Execution of a Sequence of Instructions
At 4t: I0
At 5t: I1
At 6t: I2
At 7t: I3
At 8t: I4
n instructions complete at (n+3)t
When n is large it becomes nt
Thus, 1 instruction in every t
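The completion times on this slide follow from a simple formula, sketched below in Python for an ideal pipeline with no branches or stalls. The 4 stages and the stage time t come from the slide; the function names and the `speedup` helper are illustrative additions.

```python
def completion_time(i, stages=4, t=1):
    """Time at which instruction I_i (numbered from 0) leaves the last stage."""
    return (i + stages) * t


def total_time(n, stages=4, t=1):
    """Time for n instructions to complete: (n + stages - 1) * t."""
    return (n + stages - 1) * t


def speedup(n, stages=4):
    """Unpipelined time (n * stages * t) divided by pipelined time."""
    return (n * stages) / (n + stages - 1)


times = [completion_time(i) for i in range(5)]  # I0..I4, in units of t
```

For large n the speedup approaches the number of stages, which is the "throughput increases nearly n times" claim made for the ideal case.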

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 81 A "Typical" RISC
32-bit fixed format instruction (3 formats)
32 32-bit GPR (R0 contains zero, DP takes a pair)
3-address, R-R functional instruction
Single address mode for load/store: base + displacement
»no indirection
Simple branch conditions
Delayed branch
see: SPARC, MIPS, MC88100, AMD 29000, i960, i860, PA-RISC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 82 Branch Displacement

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 83 Implementation of Conditional Branch Instructions
Evaluating branch conditions
Condition Code (CC)
–How condition is tested: Test special bits set by ALU operations, possibly under program control.
–Advantages: Sometimes condition is set for free; if not, 2 instr’s for a branch.
–Disadvantages: CC is an extra state. CCs constrain the ordering of instrs since they pass info from one instr to a branch.
Condition Register
–How condition is tested: Test arbitrary registers set by the result of a comparison.
–Advantages: Simple. 2 instr’s for a branch.
–Disadvantages: Uses up a register.
Compare and Branch
–How condition is tested: Compare is part of the branch. Often compare is limited to subset.
–Advantages: 1 instr. rather than 2 for a branch.
–Disadvantages: May be too much work per instruction.

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 84 Putting It All Together: DLX Architecture Read Section 2.8 ---- MUST DLX emphasizes A simple load-store instruction set Design for pipelining efficiency A fixed instr. set encoding Efficiency as a compiler target

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 85 Example: MIPS
Register-Register: Op (31-26) | Rs1 (25-21) | Rs2 (20-16) | Rd (15-11) | (10-6) | Opx (5-0)
Register-Immediate: Op (31-26) | Rs1 (25-21) | Rd (20-16) | immediate (15-0)
Branch: Op (31-26) | Rs1 (25-21) | Rs2/Opx (20-16) | immediate (15-0)
Jump / Call: Op (31-26) | target (25-0)
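Fixed field positions are what make these formats cheap to decode: every field is a shift and a mask. The Python sketch below packs and unpacks the Register-Register and Register-Immediate formats. It assumes the layout Op in bits 31-26, Rs1 in 25-21, Rs2 (or Rd) in 20-16, Rd in 15-11, Opx in 5-0, and a 16-bit immediate in 15-0; the function names are hypothetical, and the opcode values in the usage lines are sample numbers rather than a definition of any instruction.

```python
def encode_rr(op, rs1, rs2, rd, opx):
    """Register-Register: Op | Rs1 | Rs2 | Rd | (unused 10-6) | Opx."""
    return ((op & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | \
           ((rs2 & 0x1F) << 16) | ((rd & 0x1F) << 11) | (opx & 0x3F)


def encode_ri(op, rs1, rd, imm):
    """Register-Immediate: Op | Rs1 | Rd | 16-bit immediate."""
    return ((op & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | \
           ((rd & 0x1F) << 16) | (imm & 0xFFFF)


def field(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


rr_word = encode_rr(op=0, rs1=1, rs2=2, rd=3, opx=0x20)
ri_word = encode_ri(op=8, rs1=1, rd=2, imm=-1)  # immediate stored as 0xFFFF
```

Decoding never depends on the opcode to find the register fields: `field(rr_word, 25, 21)` yields Rs1 in every format that has one, which is the property the "fixed field length" bullet of the RISC characteristics refers to.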

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 86 The Different Goals for VAX and MIPS
VAX - simple compilers and code density
–powerful addressing modes
–powerful instructions
–efficient instruction encoding
–few registers
MIPS - high performance via pipelining, ease of HW implementation, compatibility with highly optimizing compilers
–simple instructions
–simple addressing modes
–fixed-length instruction formats
–a large number of registers

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 87 VAX vs. MIPS

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 88 Fallacies and Pitfalls
Pitfall: Designing a “high-level” instruction set feature specifically oriented to supporting a high-level language structure.
Fallacy: There is such a thing as a typical program.
Fallacy: An architecture with flaws cannot be successful.
–80x86: supports segmentation, while others support paging; uses an extended accumulator for integers, while others use GPRs; uses a stack for FP operations, while others abandoned the stack
Fallacy: You can design a flawless architecture.
–All architecture design involves trade-offs made in the context of a set of HW and SW technologies.

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 89 Most Popular ISA of All Time: Intel 80x86
1971: Intel invents microprocessor 4004/8008, 8080 in 1974
1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades
»Hired CS people in Oregon
»Weren’t ready in 1977 (CS people did 432 in 1980)
»Started crash effort for 16-bit microcomputer
1978: 8086: dedicated registers, segmented address, 16-bit
8088: 8-bit external bus version of 8086; added as an afterthought

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 90 Most Popular ISA of All Time: Intel 80x86
1980: IBM selects 8088 as basis for IBM PC
1980: 8087 floating point coprocessor: adds 60 instructions using hybrid stack/register scheme
1982: 80286: 24-bit address, protection, memory mapping
1985: 80386: 32-bit address, 32-bit GP registers, paging
1989: 80486, and Pentium in 1992: faster, plus a few MP instructions

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 91 80x86 Addressing/Protection

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 92 80x86 Instruction Format
8086 in blue; 80386 extensions in red
Prefixes: Repeat, Lock, Seg. Override, Addr. Override, Size Override
OP-code: OP-code, OP-code Ext.
Address: Mod, reg, r/m; SC, index, base (Base reg + 2^Scale x Index reg)
Displacement: Disp8, Disp16, Disp24, Disp32
Immediate: Imm8, Imm16, Imm24, Imm32

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 93 80x86 Instruction Encoding: Address Specifier Field: Mod, Reg, R/M
Address Specifier: Reg=3 bits, R/M=3 bits, Mod=2 bits
Reg field (w from OP-code)
r    w=0   w=1 (16b)   w=1 (32b)
0    AL    AX          EAX
1    CL    CX          ECX
2    DL    DX          EDX
3    BL    BX          EBX
4    AH    SP          ESP
5    CH    BP          EBP
6    DH    SI          ESI
7    BH    DI          EDI
R/M field (depends on mod and machine mode)
r/m  mod=0 (16b)   mod=0 (32b)   mod=1 (16b)   mod=1 (32b)   mod=2 (16b)   mod=2 (32b)   mod=3
0    addr=BX+SI    =EAX          mod=0 + d8    mod=0 + d8    mod=0 + d16   mod=0 + d32   same as reg field
1    addr=BX+DI    =ECX          mod=0 + d8    mod=0 + d8    mod=0 + d16   mod=0 + d32   same as reg field
2    addr=BP+SI    =EDX          mod=0 + d8    mod=0 + d8    mod=0 + d16   mod=0 + d32   same as reg field
3    addr=BP+DI    =EBX          mod=0 + d8    mod=0 + d8    mod=0 + d16   mod=0 + d32   same as reg field
4    addr=SI       =(sib)        SI+d8         (sib)+d8      SI+d16        (sib)+d32     same as reg field
5    addr=DI       =d32          DI+d8         EBP+d8        DI+d16        EBP+d32       same as reg field
6    addr=d16      =ESI          BP+d8         ESI+d8        BP+d16        ESI+d32       same as reg field
7    addr=BX       =EDI          BX+d8         EDI+d8        BX+d16        EDI+d32       same as reg field
Data Registers: AX: AC; BX: Base Address; CX: Counter; DX: Data
Pointers/Index Reg: SP: Stack Pointer; BP: Base Pointer; SI: Source Index; DI: Destin. Index

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 94 80x86 Instruction Encoding: Sc/Index/Base field
Base + Scaled Index Mode
Used when: mod = 0,1,2 in 32-bit mode and r/m = 4
Fields: 2-bit Scale Field, 3-bit Index Field, 3-bit Base Field
r    Index      Base
0    EAX        EAX
1    ECX        ECX
2    EDX        EDX
3    EBX        EBX
4    no index   ESP
5    EBP        if mod=0, d32; if mod<>0, EBP
6    ESI        ESI
7    EDI        EDI
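As an illustration of how the Sc/Index/Base byte is interpreted, the Python sketch below splits it into its 2-bit scale, 3-bit index, and 3-bit base fields and applies the two special cases from the table (index 4 means no index; base 5 with mod=0 means a 32-bit displacement instead of EBP). The function name `decode_sib` and the string output format are choices made for this example, and the extra d8/d32 displacement selected by mod=1 or mod=2 is deliberately left out.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_sib(sib, mod):
    """Describe the base + scaled-index part of a 32-bit mode address."""
    scale = 1 << ((sib >> 6) & 0b11)   # 2-bit scale field: factor 1, 2, 4, 8
    index = (sib >> 3) & 0b111         # 3-bit index field
    base = sib & 0b111                 # 3-bit base field
    # Base 5 is special: with mod=0 a 32-bit displacement replaces EBP.
    base_part = "d32" if (base == 5 and mod == 0) else REG32[base]
    if index == 4:                     # index 4 encodes "no index"
        return base_part
    return f"{base_part} + {scale}*{REG32[index]}"


example = decode_sib(0x88, mod=0)      # scale=4, index=ECX, base=EAX
```

The value 0x88 is 10 001 000 in binary, so `example` describes the address EAX + 4*ECX.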

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 95 80x86 Addressing Mode Usage for 32-bit Mode
Addressing Mode                   Gcc    Espr.   NASA7   Spice   Avg.
Register indirect                 10%    10%     6%      2%      7%
Base + 8-bit disp                 46%    43%     32%     4%      31%
Base + 32-bit disp                2%     0%      24%     10%     9%
Indexed                           1%     0%      1%      0%      1%
Based indexed + 8b disp           0%     0%      4%      0%      1%
Based indexed + 32b disp          0%     0%      0%      0%      0%
Base + Scaled Indexed             12%    31%     9%      0%      13%
Base + Scaled Index + 8b disp     2%     1%      2%      0%      1%
Base + Scaled Index + 32b disp    6%     2%      2%      33%     11%
32-bit Direct                     19%    12%     20%     51%     26%

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 96 80x86 Length Distribution

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 97 Instruction Counts: 80x86 vs. DLX
SPEC pgm    x86               DLX               DLX/86
gcc         3,771,327,742     3,892,063,460     1.03
espresso    2,216,423,413     2,801,294,286     1.26
spice       15,257,026,309    16,965,928,788    1.11
nasa7       15,603,040,963    6,118,740,321     0.39
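The DLX/86 column is simply the DLX instruction count divided by the 80x86 count. A short Python sketch that recomputes it from the counts on this slide (the dictionary and variable names are illustrative):

```python
# (x86 instruction count, DLX instruction count) per SPEC program
counts = {
    "gcc": (3_771_327_742, 3_892_063_460),
    "espresso": (2_216_423_413, 2_801_294_286),
    "spice": (15_257_026_309, 16_965_928_788),
    "nasa7": (15_603_040_963, 6_118_740_321),
}

# DLX/86 ratio, rounded to two decimals as on the slide
ratios = {name: round(dlx / x86, 2) for name, (x86, dlx) in counts.items()}
```

A ratio above 1 means DLX executes more instructions than the 80x86 for that program; nasa7 is the one program in the table where DLX executes fewer.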

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 98 Intel Compiler vs. Compilers YOU Can Buy
66 MHz Pentium Comparison                                   SpecInt92   SpecFP92
Intel Internal Optimizing Compiler                          64.6        59.7
Best 486 Compiler (June 1993)                               57.6        39.9
Typical 486 Compiler in 1990, when Intel started project    41.0        32.5
Integer: Intel 1.1X faster, FP: 1.5X faster
486 Comparison                                              SpecInt92   SpecFP92
Intel Internal Optimizing Compiler                          35.5        17.5
Best 486 Compiler (June 1993)                               32.2        16.0
Typical 486 Compiler in 1990, when Intel started project    23.0        12.8
Integer: Intel 1.1X faster, FP: 1.1X faster

Instruction Set ArchitectureCS510 Computer ArchitecturesLecture 4 - 99 Intel Summary
Archeology: history of instruction design in a single product
–Address size: 16-bit vs. 32-bit
–Protection: segmentation vs. paging
–Temp. storage: accumulator vs. stack vs. registers
“Golden Handcuffs” of binary compatibility affect design 20 years later, as Moore predicted
Not too difficult to make faster, as Intel has shown
HP/Intel announcement of common future instruction set by 2000 means end of 80x86???
“Beauty is in the eye of the beholder”
–At 50M/year sold, it is a beautiful business
