CS311 Computer Organization, Lecture 8: Reduced Instruction Set Computer (RISC)


1 Lecture 8: Reduced Instruction Set Computer

2 Lecture 8: RISC
In this lecture, we will study:
- Program execution characteristics
- RISC philosophy
  – Make the most frequently executed statements fast
    » Functional and Transfer instructions
    » Simple, small number of fixed-format instructions
    » Large register file
  – Make the most time-consuming statements fast
    » Procedure Call and Return instructions
    » Large register file
- Large register file
- Overlapping Register Windows (ORWs)
  – Linear and circular organization of ORWs
- Ultimate RISC

3 Instruction Execution Characteristics: Type of Operations
Relative dynamic frequencies (%) of statements in HLL programs

Language     PASCAL      FORTRAN   PASCAL    C         SAL
Workload     Scientific  Student   Systems   Systems   Systems
Assignment   74          67        45        38        42
Loop         4           3         5         3         4
Call         1           3         15        12        12
If           20          11        29        43        36
Goto         2           9         -         3         -
Others       -           7         6         1         6

What type of statement is most frequent?
- Assignment statements dominate
  » Functional instructions and Transfer instructions
  » Data movement must be kept simple, and therefore fast
- Conditional statements (If and Loop together)
  » Instructions with a control function
  » The sequence control mechanism is important

4 Instruction Execution Characteristics: Time Consumed by Statements
Time consumed (number of machine instructions)

             Dynamic occurrence   Machine-instr. weighted   Memory-ref. weighted
             PASCAL   C           PASCAL   C                PASCAL   C
Assignment   45       38          13       13               14       15
Loop         5        3           42       32               33       26
Call         15       12          31       33               44       45
If           29       43          11       21               7        13
Goto         -        3           -        -                -        -
Others       6        1           3        1                2        1

Machine-instruction weighted = [Average no. of machine instructions per statement] x [Frequency of occurrence]
Memory-reference weighted = [Average no. of memory references per statement] x [Frequency of occurrence]
The most time-consuming statement is procedure CALL/RETURN

5 Instruction Execution Characteristics: Type of Operands
Dynamic frequencies of occurrence (%)

                   PASCAL   C    Average
Integer constant   16       23   20
Scalar variable    58       53   55
Array/Structure    26       24   25

The majority of references are to scalars
– 80% are local to a procedure
– References to arrays/structures require an index or pointer
Locations of operands (average per instruction)
– 0.5 operands in memory
– 1.4 operands in registers

6 Instruction Execution Characteristics: Procedure Calls
  CALL SUB(X1, X2, X3)   (parameters)
  SUB(A, B, C)
Two most significant aspects in implementing this operation
– Number of parameters
– Depth of nesting
Statistics on the number of parameters
– 98% of dynamically called procedures were passed fewer than 6 parameters
– 92% of them used fewer than 6 local scalar variables

7 Multiple Register Sets
- Assume that we have several sets of registers, each of which can be used by a different procedure
- This saves time in procedure CALL/RETURN, because switching procedures only requires changing the register set pointer value
Figure: a register set pointer selecting one of Set 0, Set 1, Set 2, ..., Set n-1

8 Instruction Execution Characteristics: Depth of Procedure Nesting
Procedure nesting and register set window
- A nesting depth of 5 can be served by a register set window of size 5 without using memory
- When the nesting depth exceeds 5, a movement of more than 5 in either direction (CALL/RETURN) requires shifting the register set window (down/up)
- Shifting the register set window: the information in one register set must be saved in memory so that the register set can be used by the new procedure
- Statistics: a window depth of 8 needs to shift on fewer than 1% of calls and returns
Figure: nesting depth plotted against time t, with calls and returns moving within the register set window

9 Complex Instruction Set Computer (CISC)
Design philosophy of CISC
- Distinction between architecture and implementation via a microprogrammed control unit
- Richer instruction set
  – Instruction performance: more powerful instructions
  – Reduce the semantic gap to make programming easier
  – Simplify compiler functions
- Larger microprogram
  – Moving hardware functions to microcode
  – Moving software functions to microcode
- Parallelism
  – Pipelining
  – Multiple function units, processors, computers
NO ATTENTION TO INSTRUCTION FREQUENCY, TIME-CONSUMING INSTRUCTIONS, etc.

10 RISC Philosophy (1): Make the Most Frequent Statements Execute Fast
The most frequent statements are assignment-type statements, and the compiler translates each of them into a set of Functional and/or Transfer instructions. Thus Functional and Transfer instructions need to execute fast.
Instruction cycle of a Functional or Transfer instruction, and how to speed up each phase:
- I-F(M), read instruction from memory: short instructions
- I-P, decode / compute effective address: fixed instruction format, simple addressing modes
- O-F(M), read operand from memory: keep operands in registers
- E, perform operation: nothing can be done about it through the instruction set
Improved architecture: pipelined execution

11 Assignment Statements
- To make the Instruction Fetch fast
  – Short op-code part: small number of instructions in the instruction set
  – Short operand address part: keep the operands in registers instead of memory
- To make the Instruction Preparation fast
  – Fixed-length instructions
  – Fixed-format instructions
  – Simple addressing modes
- To make the Operand Fetch fast
  – Make the operands available from registers instead of memory
  – Needs a large register file
- To make the Instruction Execution fast
  – Multiple register sets; overlapping multiple register sets
  – Instruction execution pipeline

12 RISC Philosophy (2): Make the Most Time-Consuming Statements Execute Fast
Procedure Call and Return
  CALL SUB(X1, X2, X3)
  SUB(A, B, C)
Methods of passing parameters
- Through memory
  – Parameters are stored in memory locations that are accessible to both the calling and the called procedure
  – CALL and RETURN instructions execute very slowly because of the memory accesses, especially when there are many parameters to pass
- Through registers
  – Parameters are stored in CPU registers
  – The calling procedure needs to save in memory the registers that are not used for passing parameters. This results in many memory accesses and slows the execution of these instructions.

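13 Time Out
An old woman was sitting with her cat, polishing a dusty lamp. Just then a little genie popped out of the lamp and told her to make three wishes. The old woman quickly said, "I want to be rich, I want to be young and beautiful, and I wish my cat would turn into a handsome prince." Smoke rose, and with a bang the old woman became young and beautiful, with heaps of gold, silver, and jewels piled around her. The cat vanished, and in its place stood a dashing prince with open arms. The rejuvenated woman rushed into his embrace. The prince whispered softly in her ear, "Don't you regret having had me neutered back when I was a cat?"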

14 CISC and RISC

                      IBM         VAX      Intel   Berkeley   IBM
                      S/360-168   11-780   8086    RISC I     801
Year developed        73          78       78      81         80
No. of instructions   208         303      133     31         120
Instruction length    16-48       16-456   8-32    32         32
Addressing modes      4           22       6       3          5
No. of GPR            16          16       4       138        32
CM capacity           420Kb       480Kb    -       0          0
Cache                 64Kb        64Kb     -       0          0

RISC
- A limited and simple instruction set
- A large number of GPRs (register file)
- An emphasis on optimizing the instruction pipeline

15 Large Register File
Quick access to operands is desirable
- Assignment statements rely on Functional and Transfer instructions
- Functional instructions rely heavily on registers
- The frequency of Transfer instructions depends on the number of registers in the register file
If the number of registers is small, a strategy is needed to keep the most frequently accessed operands in registers and minimize register-memory traffic
- Software approach: maximize register usage by the compiler (requires sophisticated program analysis)
- Hardware approach: more registers in the register file

16 Register Window
Fact
– Statistically, most operand references (80%) are to local scalars
– Variables local to a procedure cannot be accessed by other procedures
Problem
– The set of local variables changes with each procedure CALL/RETURN
– CALL/RETURN occurs frequently
– Parameters need to be passed around
Observations
– Statistically, procedures use few parameters (fewer than 6) and few local variables (fewer than 6)
– Statistically, the depth of procedure activation fluctuates within a relatively narrow range (less than 8)
Solution
– Multiple small sets of registers
– Each set is assigned to a different procedure
– Windows for adjacent procedures overlap to allow parameter passing

17 Multiple Register Set
Figure: a Register Set Pointer selecting one of Set 1, Set 2, Set 3, ..., Set m
Each register set is assigned to a different procedure
- The size of a register set is equal to the size of a window
- Parameters need to be copied into the called/calling procedure's register set
- This requires register move instructions

18 Overlapping Register Window
When the register sets are implemented in a large register file, each register set is called a register window.
Overlapping register windows
- Portions of adjacent register windows overlap for passing parameters
- At any time only one window is visible
- No information needs to be moved for parameter passing
Figure: Window i (Procedure i) and Window i+1 (Procedure i+1) each consist of parameter, local, and temporary registers. The temporary registers of Window i and the parameter registers of Window i+1 are the same physical registers; parameters are exchanged through them on CALL and RETURN.
How about global variables?
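The overlap can be made concrete with a small register-numbering sketch. It borrows the Berkeley RISC figures quoted later in this deck (138 registers; 10 global, 6 parameter, 10 local, 6 temporary). The count of 8 windows is derived from 138 = 10 + 8 x 16, and the logical numbering (R0 to R9 global, R10 to R15 parameter, R16 to R25 local, R26 to R31 temporary) as well as the function name are assumptions made for illustration, not the numbering of any real machine.

```python
N_GLOBAL = 10   # registers shared by every procedure
N_PARAM = 6     # incoming parameters (shared with the caller's temporaries)
N_LOCAL = 10    # private to the window
N_TEMP = 6      # outgoing parameters (shared with the callee's parameters)
N_WINDOWS = 8   # assumption: derived from 138 = 10 + 8 * 16

WINDOW_STRIDE = N_PARAM + N_LOCAL        # 16 registers owned by each window
N_WINDOWED = N_WINDOWS * WINDOW_STRIDE   # 128 windowed physical registers
N_PHYSICAL = N_GLOBAL + N_WINDOWED       # 138 registers in total


def physical_register(window: int, logical: int) -> int:
    """Map a logical register number seen through `window` to a physical index."""
    visible = N_GLOBAL + N_PARAM + N_LOCAL + N_TEMP  # 32 visible registers
    if not 0 <= logical < visible:
        raise ValueError("logical register out of range")
    if logical < N_GLOBAL:
        return logical  # globals are the same for every window
    offset = logical - N_GLOBAL
    # Windows are laid out 16 registers apart and wrap around circularly.
    return N_GLOBAL + (window * WINDOW_STRIDE + offset) % N_WINDOWED


# The first temporary of window 0 and the first parameter of window 1
# land on the same physical register, so no copy is needed on CALL.
print(physical_register(0, 26), physical_register(1, 10))
```

Because each window advances by only 16 registers while exposing 22 windowed ones, the last 6 registers of one window coincide with the first 6 of the next, which is the overlap the slide describes.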

19 Global Variables
Global variables are accessible to all procedures
Assign them to memory locations by the compiler
– Straightforward, but inefficient for frequently accessed global variables because of the frequent memory accesses
Set aside a set of global variable registers
– Available to all procedures
– Unified register numbering system to simplify the instruction format
– e.g. R0 ~ R7: Global; R8 ~ R13: Current window

20 Linear Organization of Register Windows
Figure: a physical register file (registers 0 to n-1) whose first p registers (0 to p-1) are the global registers. Each of Set 1, Set 2, Set 3, ... sees logical registers 0 to p-1 (the globals) and p to m-1 (its own window).

21 Circular Organization of Register Windows
Procedure call:
- CWP ← CWP + 1 (CWP: current window pointer)
- If CWP = SWP (save window pointer), then interrupt: save Window(SWP), SWP ← SWP + 1
- Load the temporary registers with the parameters that must be passed down
- The call proceeds
Return:
- CWP ← CWP - 1
- If CWP = SWP, then interrupt: restore the called procedure's Window(SWP), SWP ← SWP - 1
An n-window register file accommodates n-1 procedure calls
Figure: windows W0 to W5 arranged in a circle, with CWP moving on Call and Return and SWP moving on Save and Restore
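The save/restore behaviour can be explored with a simplified counter model. This is a sketch under stated assumptions, not the exact CWP/SWP hardware rule: it tracks only the current nesting depth and how many of the oldest windows have been spilled to memory, and it takes the slide's statement that n windows hold n-1 activations as the capacity. The class and method names are hypothetical.

```python
class WindowFile:
    """Counter model of a circular register-window file (illustrative only)."""

    def __init__(self, n_windows: int) -> None:
        self.capacity = n_windows - 1  # activations that fit without spilling
        self.depth = 0                 # nesting depth; 0 is the outermost procedure
        self.saved = 0                 # oldest windows currently spilled to memory
        self.overflows = 0             # save interrupts taken on CALL
        self.underflows = 0            # restore interrupts taken on RETURN

    def call(self) -> None:
        self.depth += 1
        resident = self.depth + 1 - self.saved
        if resident > self.capacity:
            # No free window: spill the oldest resident window to memory.
            self.saved += 1
            self.overflows += 1

    def ret(self) -> None:
        if self.depth == 0:
            raise RuntimeError("return from the outermost procedure")
        self.depth -= 1
        if self.depth < self.saved:
            # The caller's window was spilled earlier: bring it back.
            self.saved -= 1
            self.underflows += 1


wf = WindowFile(6)       # W0 to W5, as in the figure
for _ in range(5):
    wf.call()
for _ in range(5):
    wf.ret()
print(wf.overflows, wf.underflows)  # 1 1
```

With 6 windows, the first four nested calls fit in registers; only the fifth forces a save, and the matching restore happens on the final return.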

22 Code Size
Smaller programs
– A program takes less memory space
– A smaller program improves performance
  » Fewer instructions
  » Fewer bytes to fetch
  » In a paging environment, it occupies fewer pages and reduces page faults
CISC
– Fewer instructions in the program (the program may be shorter, but it does not necessarily take less space)

23 Example
CISC: one memory-to-memory instruction
  Instruction format (bits): op-code 8 | address 4+12 | address 4+12 | address 4+12
  Memory traffic
    Instruction: 56 bits
    Data: 32 x 3 = 96 bits
    Total MB used: 56 + 96 = 152 bits
RISC:
  LD Rb B
  LD Rc C
  ADD Ra Rb Rc
  ST Ra A
  Instruction format (bits): 8 | 4 | 4 | 12
  Memory traffic
    Instruction: 112 bits
    Data: 96 bits
    Total MB used: 112 + 96 = 208 bits
CISC: more instructions in the instruction set, hence a longer op-code
RISC: more chances to keep intermediate results in registers, hence less use of LD/ST

24 Characteristics of RISC (1, 2)
(1) One instruction per cycle (memory cycle)
– Machine cycle: IF + IP + time to fetch the operands from registers + perform the operation + store the result in a register
– A RISC instruction is comparable to a CISC micro-instruction => no need to microprogram (hardwired control)
(2) Register-to-register operation
– Only simple Load and Store operations access memory (Load/Store architecture)
– Simplifies the instruction set and the control unit
Example: B ← B + C, A ← A + B, D ← D - A
Field sizes: data 32 bits; op-code 8 bits; register address 4 bits; memory address 12 + 4 bits in a memory-to-memory instruction, 12 bits in RISC

             CISC                RISC
             ADD B, C, B         ADD Rb, Rc, Rb
             ADD A, B, A         ADD Ra, Rb, Ra
             ADD D, A, D         ADD Rd, Ra, Rd
I:           56 x 3 = 168 bits   28 x 3 = 84 bits
D:           96 x 3 = 288 bits   0 bits
Total MB:    456 bits            84 bits
Cycles:      3 x 4 = 12          3 x 1 = 3
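The bit counts on this slide can be reproduced with a short calculation. The field widths (8-bit op-code, 12 + 4-bit memory address, 32-bit data) and the 28-bit RISC instruction length are taken from the slide; the function names and the idea of passing the number of memory operands as a parameter are illustrative assumptions.

```python
OPCODE_BITS = 8
MEM_ADDR_BITS = 12 + 4    # operand address in a memory-to-memory instruction
DATA_BITS = 32
RISC_INSTR_BITS = 28      # fixed RISC instruction length quoted on the slide


def cisc_traffic(n_instr: int, operands_per_instr: int = 3) -> tuple:
    """Return (instruction bits, data bits, total) for memory-to-memory code."""
    instr = n_instr * (OPCODE_BITS + operands_per_instr * MEM_ADDR_BITS)
    data = n_instr * operands_per_instr * DATA_BITS
    return instr, data, instr + data


def risc_traffic(n_instr: int, memory_operands: int = 0) -> tuple:
    """Return (instruction bits, data bits, total) for load/store code."""
    instr = n_instr * RISC_INSTR_BITS
    data = memory_operands * DATA_BITS  # only LD/ST instructions move data
    return instr, data, instr + data


print(cisc_traffic(3))   # (168, 288, 456)
print(risc_traffic(3))   # (84, 0, 84)
```

The RISC figure of 0 data bits holds only because all operands are assumed to be in registers already; calling `risc_traffic(4, memory_operands=3)` models a sequence that also has to load two operands and store one result.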

25 Characteristics of RISC (3, 4)
(3) Simple addressing modes: shorten EA generation time
– Almost all instructions use register addressing
– Relative addressing using PC, BAR, and index address
– Other complex modes may be synthesized by software

Addressing mode     Effective address   Synthesis   Used by
Immediate           Operand = A         S2          R-to-R
Direct              EA = A              0 + S2      LD/ST
Register            EA = R              Rs1, S2     R-to-R
Register indirect   EA = [R]            Rs1 + 0     LD/ST
Displacement        EA = [R] + A        Rs1 + S2    LD/ST

(4) Simple instruction format: shorten instruction decoding time
– Usually one format
– Fixed length, aligned on word boundaries
– Fixed field lengths
Format: OP-code | ... | Rs1 | S2

26 Characteristics of RISC (5)
(5) Pipelining (we will learn this later in detail)
At this time, you just need to know that:
- Instruction execution hardware can be built from a few interconnected, independent sub-modules called pipeline STAGEs (S0, S1, S2, S3)
- An instruction's execution progresses through the pipeline stages in sequence
- When an instruction completes its execution at the i-th stage, the next instruction commences its execution at the i-th stage
- Thus, in the ideal situation, throughput increases nearly n times, where n is the number of pipeline stages
- Branch instructions make pipelined execution inefficient

27 Laundry Example
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
We have 3 different work stages
- Washer takes 30 minutes
- Dryer takes 40 minutes
- "Folder" takes 20 minutes

28 Sequential Laundry
Figure: timeline from 6 PM to midnight with loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one after another
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?

29 Pipelined Laundry
Figure: timeline starting at 6 PM with loads A, B, C, D in task order, overlapped so that the washer (30 minutes), dryer (40 minutes), and folder (20 minutes) work on different loads at the same time
Pipelined laundry takes 3.5 hours for 4 loads
A maximum of 3 tasks can be carried out concurrently
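Both laundry totals can be checked with a small scheduling sketch. It assumes the stage times given in the example (30, 40, and 20 minutes) and that a finished load can wait between stages without blocking the stage it just left; the function names are illustrative.

```python
def sequential_minutes(stage_minutes, n_loads):
    """Each load runs through every stage before the next load starts."""
    return n_loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, n_loads):
    """Each stage starts a load as soon as both the load and the stage are free."""
    free_at = [0] * len(stage_minutes)   # time at which each stage becomes free
    for _ in range(n_loads):
        ready = 0                        # time at which this load left the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[s])
            ready = start + minutes
            free_at[s] = ready
    return free_at[-1]                   # time at which the last load leaves the last stage


stages = [30, 40, 20]                    # washer, dryer, folder
print(sequential_minutes(stages, 4) / 60)   # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)    # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 = 210 minutes: once the pipeline is full, the 40-minute dryer sets the pace.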

30 Pipelined Execution
Execution of one instruction: I0 passes through stages S0, S1, S2, S3 in sequence, taking t x 4
Execution of a sequence of instructions: I0, I1, I2, I3, I4, ... enter S0 one after another, each one stage behind its predecessor
Completion times: I0 at 4t, I1 at 5t, I2 at 6t, I3 at 7t, I4 at 8t
n instructions complete at (n+3)t; when n is large this approaches nt
Thus, 1 instruction completes every t
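The (n+3)t completion time and the resulting speedup can be tabulated with a few lines of code. This sketch assumes an ideal pipeline with equal stage times and no stalls; the function names and the default of 4 stages are illustrative.

```python
def pipelined_time(n: int, stages: int = 4, t: float = 1.0) -> float:
    """Time for n instructions: fill the pipeline once, then one result per t."""
    return (n + stages - 1) * t


def sequential_time(n: int, stages: int = 4, t: float = 1.0) -> float:
    """Time for n instructions when each one uses all stages before the next starts."""
    return n * stages * t


def speedup(n: int, stages: int = 4) -> float:
    return sequential_time(n, stages) / pipelined_time(n, stages)


for n in (1, 5, 100, 10_000):
    print(n, pipelined_time(n), round(speedup(n), 3))
```

For n = 5 the pipelined time is 8t, matching the completion time of I4 above, and the speedup approaches the number of stages as n grows.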

31 Pipeline Characteristics
- Multiple tasks operate simultaneously
- Pipelining does not help the latency of a single task, but it helps the throughput of the entire workload
- The pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipeline stages reduce speedup
- Potential speedup = number of pipeline stages
- The time to fill the pipeline and the time to drain it reduce speedup

32 Time Out
A male crab met a female crab and proposed to her. The female crab noticed that he walked straight ahead instead of sideways. "What a remarkable fellow. I must not let him get away," she thought, and married him at once. The next day, however, she saw her husband walking sideways like every other crab and angrily demanded, "What on earth happened? Didn't you walk straight before we got married?" The male crab replied, "Oh dear, honey, I can't drink that much every day."

33 Berkeley RISC Instruction Format
RISC-I and RISC-II
- 32-bit processors
- 31 and 39 instructions, respectively
- ORW, 138 registers; window: 10 global, 6 temporary, 10 local, 6 parameter
Instruction format (32 bits)

  Field   OP-code   SCC   Rd      Rs1     S2
  Bits    31-25     24    23-19   18-14   13-0
  Width   7         1     5       5       14

  S2 is either 1 | imm13 or 0 | Rs2 (Rs2 in bits 4-0)
  The figure also labels a Cond field and a 19-bit immediate (imm19)
- Cond (flags): C, Z, O, N
- Rd: destination register
- Rs1: source register
- S2:
  Functional instructions: if MSB = 0, S2 = Rs2 (another source register); if MSB = 1, S2 = imm13 (13-bit immediate data)
  Transfer or sequencing instructions: if MSB = 0, EA = [Rs1] + [Rs2] (index register); if MSB = 1, EA = [Rs1] + imm13
  RISC-II: EA = [PC] + S2
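Because the format is fixed, decoding reduces to a handful of shifts and masks. The sketch below assumes the field positions listed on this slide (7-bit op-code in bits 31-25, SCC in bit 24, Rd in bits 23-19, Rs1 in bits 18-14, S2 in bits 13-0 with its MSB selecting the immediate form). The names `Decoded` and `decode`, the example op-code value, and the choice to leave imm13 as a raw unsigned field are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Decoded(NamedTuple):
    opcode: int
    scc: int
    rd: int
    rs1: int
    imm13: Optional[int]   # set when the MSB of S2 is 1
    rs2: Optional[int]     # set when the MSB of S2 is 0


def decode(word: int) -> Decoded:
    """Split a 32-bit instruction word into its fixed-position fields."""
    opcode = (word >> 25) & 0x7F
    scc = (word >> 24) & 0x1
    rd = (word >> 19) & 0x1F
    rs1 = (word >> 14) & 0x1F
    s2 = word & 0x3FFF
    if s2 & 0x2000:
        # MSB of S2 is 1: the low 13 bits are immediate data (kept unsigned here).
        return Decoded(opcode, scc, rd, rs1, s2 & 0x1FFF, None)
    # MSB of S2 is 0: the low 5 bits name a second source register.
    return Decoded(opcode, scc, rd, rs1, None, s2 & 0x1F)


# Hypothetical op-code 0x12 with Rd = 3, Rs1 = 4, and immediate 100.
word = (0x12 << 25) | (1 << 24) | (3 << 19) | (4 << 14) | (1 << 13) | 100
print(decode(word))
```

Every field sits at the same position in every instruction, so the decoder needs no table lookup or length calculation, which is the point of characteristic (4) earlier in the lecture.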

34 RISC-II Instruction Set
Functional (C: carry, R: reverse)
– ADD, ADDC, SUB, SUBC, SUBR, SUBCR, AND, OR, XOR, SLL, SRL, SRA
Transfer (X: index, W: word, H: half, B: byte, R: relative, U/S: unsigned/signed)
  (Index: EA = Rs1 + S2 (Rs2); Relative: EA = PC + S2 (Rs2))
– LDXW, LDXHU(S), LDXBU(S), LDRW, LDRHU(S), LDRBU(S)
– STXW, STXHU(S), STXBU(S), STRW, STRHU(S), STRBU(S)
Sequence control
– JMP, JMPR, CALL, CALLR, RET, CALLINT, RETINT, ...

35 Ultimate RISC Instruction Set
BN instruction
– Conditional branch phase in each instruction cycle
– Does not conform to the RISC philosophy, that is, it makes inefficient use of the instruction pipeline
Ultimate RISC instruction set
– Move the content of the SOURCE (read) to the DESTINATION (write), both within memory
– 2-address instruction
  » 1 address fits in an M word
  » 4-cycle instruction:
      addr ← M[PC], PC ← PC + 1
      temp ← M[addr]
      addr ← M[PC], PC ← PC + 1
      M[addr] ← temp
Figure: registers temp, PC, and addr
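The four-step move cycle is small enough to simulate directly. The sketch below models memory as a Python list and keeps the PC in memory word 0, as the next slide of this deck describes; the function names and the sample memory image are hypothetical.

```python
def step(mem: list) -> None:
    """Execute one move instruction: M[destination] <- M[source]."""
    addr = mem[mem[0]]    # fetch the source address from M[PC]
    mem[0] += 1           # PC <- PC + 1
    temp = mem[addr]      # read the source data
    addr = mem[mem[0]]    # fetch the destination address from M[PC]
    mem[0] += 1           # PC <- PC + 1
    mem[addr] = temp      # write the data to the destination


def run(mem: list, n_instructions: int) -> None:
    for _ in range(n_instructions):
        step(mem)


# Word 0 is the PC. Two instructions follow, two words each:
#   words 1-2: move M[6] to M[7]
#   words 3-4: move M[6] to M[8]
mem = [1, 6, 7, 6, 8, 0, 42, 0, 0]
run(mem, 2)
print(mem[7], mem[8], mem[0])   # 42 42 5
```

Since the PC is just another memory word in this model, a jump is an ordinary move whose destination address is 0.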

36 Ultimate RISC Architecture
Figure: IEU, ALU, M, and I/O connected by a BUS
- Memory-mapped I/O
- Memory-mapped ALU
- PC: 1 special word (address = 0)
- The ALU contains an accumulator (AC) and flags
Memory-mapped ALU: arithmetic operations use special addresses
When the ALU is used as a destination
- Store a value in AC
- Operate on AC
When the ALU is used as the source
- One address gets the value of AC
- Other addresses test the condition codes and set the destination address (branch to either one of 2 consecutive addresses)

37 Memory Mapped ALU
An operation is performed by writing an operand to the address associated with that operation and reading the result from another address

Address   Write (used as the destination)   Read (used as the source)
8         AC ← data                         data ← AC
9         AC ← AC - data                    data ← N
10        AC ← data - AC                    data ← Z
11        AC ← data + AC                    data ← V
12        AC ← data + AC                    data ← C
13        AC ← data v AC                    data ← N + 0
14        AC ← AC ^ data                    data ← ((N + 0) v Z)
15        AC ← data / 2                     data ← C ^ ~Z

38 Condition Codes and Branching
Condition codes
- 2 (binary 10): True; 0 (binary 00): False
- Upon testing a CC, it sets the LSB of the destination address (set to 10 when True, set to 00 when False)
- This allows a branch to either one of two consecutive instructions
Branch
- Move a target address to location 0 (PC)

39 Instruction Cycle
Instruction layout in memory
- 2 adjoining words per instruction (S, D)
- Contiguous storage of instructions: S D S D S D ... S D
Instruction cycle: 4 clean cycles for pipelining
[1] IS (read): fetch source address and increment PC
[2] RS (read): read source data
[3] ID (read): fetch destination address
[4] WD (write): write data to destination
Pipelining with a 4-port memory (3 reads and 1 write)

Instruction 1: IS1 RS1 ID1 WD1
Instruction 2:     IS2 RS2 ID2 WD2
Instruction 3:         IS3 RS3 ID3 WD3
Instruction 4:             IS4 RS4 ID4 WD4

Completion of 1 instr./cycle

40 Improvement: 3-Cycle Design
Instruction layout in memory: S S S ... S and D D D ... D
Instruction cycle: 3 clean cycles
[1] ISD (read): fetch source and destination addresses and increment PC
[2] RS (read): read source data
[3] WD (write): write data to destination
3-way pipelining using a 3-port memory (2 read ports and 1 write port)

Instruction 1: ISD1 RS1 WD1
Instruction 2:      ISD2 RS2 WD2
Instruction 3:           ISD3 RS3 WD3
Instruction 4:                ISD4 RS4 WD4

Completion of 1 instr./cycle

41 Improvement: 2-Cycle Design
Figure: IEU connected to an Instruction Memory and to Data Memory, ALU, and I/O
Instruction cycle (2 dedicated memory units: 1 instruction, 1 data)
[1] RS (read): read data from source
[2] WD (write): write data to destination; RI (read): read instruction; increment PC
2-way pipelining

Instruction p:     WDp
Instruction p+1:   RSp+1 WDp+1
Instruction p+2:         RSp+2 WDp+2
Instruction p+3:               RSp+3 WDp+3
Instruction p+4:                     RSp+4 WDp+4

Completion of 1 instr./cycle

