SE 292 (3:0) High Performance Computing L2: Basic Computer Organization R. Govindarajan


1 SE 292 (3:0) High Performance Computing L2: Basic Computer Organization R. Govindarajan govind@serc

2 2 Basic Computer Organization Main parts of a computer system: Processor: Executes programs Main memory: Holds program and data I/O devices: For communication with outside Machine instruction: Description of primitive operation that machine hardware is able to execute Instruction Set: Complete specification of all the kinds of instructions that the processor hardware was built to execute e.g. ADD these two integers

3 3 Basic Computer Organization [Figure: block diagram showing the CPU (control, ALU, registers), memory and I/O devices connected by a bus]

4 4 Inside the Processor… Hardware to manage instruction execution Arithmetic, logic hardware Registers: small units of memory to hold data/instructions temporarily during execution Two kinds of registers 1. Special purpose registers 2. General purpose registers

5 5 Special Purpose Registers Program Counter (PC): specifies location in memory of instruction being executed Instruction Register (IR): holds that instruction Processor Status Register: holds status information about current state of processor, such as whether an arithmetic overflow has occurred, etc

6 6 General Purpose Registers Available for use by programmer, possibly for keeping frequently used data Why? Since there is a large speed disparity between processor and main memory 1 GHz Processor: 1 nanosecond time scale Memory: ~ 50 - 100 nsec time scale What do these numbers mean? Instruction operands can come from registers or from main memory

7 7 Basic Computer Organization [Figure: block diagram showing the CPU (control, ALU, registers), MMU, cache, memory and I/O devices connected by a bus] General Purpose Registers: Integer Registers, FP Registers. Special Purpose Registers: Program Counter, Instruction Register.

8 8 Main Memory Holds instructions and data View as sequence of locations, each referred to by a unique memory address If size of each memory location is 1 Byte, we call the memory byte addressable This is quite typical, as smallest data (character) is represented in 1 Byte Larger data items are stored in contiguous memory locations, e.g., a 4Byte integer would occupy 4 consecutive memory locations

9 9 Terms: Byte ordering
What is the integer (4-byte data) at address 400?
[Figure: memory contents in hexadecimal (0, 1, 2, …, A, B, C, D, E, F) at addresses 400 to 406]
Big Endian byte ordering: 1AC8B246 = 0001 1010 1100 1000 1011 0010 0100 0110 (decimal 449,360,454)
Little Endian byte ordering: 46B2C81A = 0100 0110 1011 0010 1100 1000 0001 1010 (decimal 1,186,121,754)
Some machines use big endian byte ordering and others use little endian byte ordering.

10 10 Terms: Word Size, Word Alignment
Word Size: normal size of an integer or pointer; 32b (4B) on many machines.
Word Alignment: a data item is word aligned if it is located at a word boundary. Word boundaries: addresses 0, 4, 8, 12, …
[Figure: integer variable X that is not word aligned, i.e., the data item is not located at a word boundary]
HW: Write a C program to identify whether a machine is Little Endian or Big Endian. Write a C program to convert a sequence of 4-byte values from Little Endian to Big Endian.

11 11 Instruction Set Architecture (ISA) View of the computer visible to the programmer (or compiler) Two kinds of ISAs 1. Complex Instruction Set Computer (CISC): a single instruction can perform a complex operation involving several actions 2. Reduced Instruction Set Computer (RISC): each instruction performs only a simple operation

12 12 Instruction Set Architecture Description of machine from view of the programmer/compiler Example: Intel x86 ISA Includes specification of 1. The different kinds of instructions available (instruction set) 2. How operands are specified (addressing modes) 3. What each instruction looks like (instruction format)

13 13 Kinds of Instructions
1. Arithmetic/logical instructions: add, subtract, multiply, divide, compare (int/fp); or, and, not, xor; shift (left/right, arithmetic/logical), rotate
2. Data transfer instructions: load (to register from memory), store (to memory location from register), move
3. Control transfer instructions: jump, conditional branch, function call, return
4. Other instructions. Example: halt

14 14 Operand Addressing Modes Operands to an instruction: Source: input value to instruction. Destination: where result is to go. Addressing Mode: how the location of an operand is specified. An operand can be either in a memory location or in a register.

15 15 Addressing Modes: Operand in Register
1. Register Direct Addressing Mode: operand is in the specified general purpose register. Example (suppose that the General Purpose Registers are numbered 0, 1, 2, etc.): ADD R1, R2, R3 / R1 ← R2 + R3
[Figure: source operands R2 = 24 and R3 = 35; destination operand R1 changes from 17 to 59]
2. Immediate Addressing Mode: operand is included in the instruction. ADD R1, R2, 1 / R1 ← R2 + 1

16 16 Addressing Modes: Operand in Memory
3. Register Indirect Addressing Mode: memory address of operand is in the specified general purpose register. ADD R1, R1, (R2)
4. Base-Displacement Addressing Mode: memory address of operand is calculated as the sum of the value in the specified register and the specified displacement. ADD R1, R1, 4(R2)
Example for both modes: R1 = 32, R2 = 100, and MAIN MEMORY holds
Address: 96 100 104 108
Value: 0 10 35 -17
Result in R1: 42 for ADD R1, R1, (R2); 67 for ADD R1, R1, 4(R2)

17 17 Addressing Modes: Operand in Memory 5. Absolute Addressing Mode Memory address of operand is specified directly in the instruction ADD R1, R2, #100 6. Indexed Addressing Mode Memory address of operand is calculated as sum of contents of 2 registers ADD R1, R2, (R3+R4) Others Auto-increment/decrement (pre/post) PC relative

18 18 Case Study: MIPS I Integer Instruction Set Registers 32 32b general purpose registers, R0..R31 R0 hardwired to value 0 R31 implicitly used by instructions JAL, JALR HI, LO: 2 other 32b registers Used implicitly by multiply and divide instructions Addressing Modes Immediate, Register direct (arithmetic) Absolute (jumps) Base-displacement (loads, stores) PC relative (branches)

19 19 MIPS I ISA: General Comments All instructions and registers are 32b in size Load-store architecture: the only instructions that have memory operands are loads and stores Terminology Word: 32b Halfword: 16b Byte: 8b Displacements and immediates are signed 16-bit quantities

20 20 A RISC Instruction Set

21 21 RISC Instruction Set (contd) HW: Write a simple C program and generate the corresponding assembly language program for the MIPS architecture. Understand the instructions, function call mechanism, formats of branch and jump instructions, etc.

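22 22 An Example Program
#include <stdio.h>   /* printf */
#include <math.h>    /* sqrt */
double a[100];
int main(void) {
    int i;
    double sum;
    /* Replace each element by its square root and accumulate the total. */
    for (i = 0, sum = 0.0; i < 100; i++) {
        a[i] = sqrt(a[i]);
        sum += a[i];
    }
    printf("sum = %4.2f\n", sum);
    return 0;
}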

23 23 Assembly Representation
.section .bss, 8, 0x00000003, 0, 8
.bss:
.section .lit8, 1, 0x30000002, 8, 8
.lit8:
.section .rodata, 1, 0x00000002, 0, 8
.rodata:
.section .bss
.origin 0x0
.align 0
.globl a
.type a, stt_object
.size a, 800

24 24 Assembly Representation (contd.)
a: # 0x0
.dynsym a sto_default
.space 800
.section .text
# Program Unit: main
.ent main
.globl main
main: # 0x0
.dynsym main sto_default
.frame $sp, 16, $31
.mask 0x80000000, -8
# gra_spill_temp_0 = 0
# gra_spill_temp_1 = 8
.loc 1 4 8

25 25 Assembly Representation (contd.)
# 1 #include
# 2 #include
# 3 double a[100];
# 4 main() {
.BB1.main: # 0x0
.type main, stt_func
lui $1, %hi(%neg(%gp_rel(main))) # [0] main
addiu $sp, $sp, -16 # [0]
addiu $1, $1, %lo(%neg(%gp_rel(main))) # [1] main
sd $gp, 0($sp) # [1] gra_spill_temp_0
addu $gp, $25, $1 # [2]
lw $5, %got_disp(a)($gp) # [3] a
.loc 1 7 5

26 26 Some Interesting Notes Load instructions: the value will not be available in the destination register for use by the instruction following the load LOAD DELAY SLOT Control transfer instructions: the transfer of control takes place only following the instruction immediately after the control transfer instruction BRANCH DELAY SLOT

27 27 CISC vs RISC -- ISA Comparison
C code: a[i++] = a[i] + b[i]; b[i] = b[--i] - 1;
RISC Code:
lw R1, 0(R3)
lw R2, 0(R4)
add R5, R1, R2
subi R2, R2, 1
sw 0(R3), R5
sw 0(R4), R2
CISC Code:
add (R3)+, (R3), (R4)
sub (R4), -(R4), 1

28 28 MIPS Instruction Encoding
R-Format: Opcode (6 bits) | Src1 rs (5 bits) | Src2 rt (5 bits) | Dst rd (5 bits) | sh amt (5 bits) | Func. code (6 bits)
Example: add R1, R2, R3

29 29 MIPS Instruction Encoding
I-Format: Opcode (6 bits) | Src1 rs (5 bits) | Dst rt (5 bits) | constant (16 bits)
Examples: addi R1, R2, 8; lw R1, 24(R2); bltz R1, loop

30 30 MIPS Instruction Encoding
J-Format: Opcode (6 bits) | Jump address (26 bits)
Example: jal fact

31 31 Some More Homework 1. Read B&O Chapter 2 2. B&O 2.48 3. B&O 2.50 4. B&O 2.55 5. B&O 2.62 6. Write a C program which checks whether the machine it is running on supports IEEE FP representation and arithmetic on NaN, denormals and infinity.

32 32 On Instruction Processing Fetch Get instruction whose address is in PC from memory into IR Increment PC Decode Understand instruction, addressing modes, etc Calculate effective addresses and fetch operands Execute Do required operation Write back the result of the instruction

33 33 Timeline of events: PC to memory → instruction in IR → PC++; decode → Op1 effective address calculation → Op1 fetched → Op2 effective address calculation → Op2 fetched → operation done → write result. Processor/memory speed disparity: ~2 orders of magnitude

34 34 Instruction Execution
Instruction Fetch (IF): fetch from program memory to the instruction register, then increment the PC.
IR ← Mem[PC]
NPC ← PC + 4
[Figure: instruction fetch datapath with PC, the +4 adder, NPC, Mem and IR]

35 35 Instruction Execution…
Instruction Decode & Operand Fetch (ID):
A ← RegisterFile[rs]
B ← RegisterFile[rt]
Imm ← sign extend(IR[15:0])
[Figure: instruction fetch and instruction decode datapath with PC, NPC, Inst Mem, IR, Reg File, sign extend, A, B and Imm]

36 36 Instruction Execution…
Execution (EX):
Arithmetic Inst: ALU-Out ← A op B, or ALU-Out ← A op Imm
Load/Store Inst: ALU-Out ← A + Imm
Branch Inst: ALU-Out ← NPC + Imm
Jump Inst: PC ← NPC[31:28] || IR[25:0] || 00
[Figure: execution stage with A, B, Imm and NPC feeding the ALU, which produces ALU-out, and a Zero? test that produces Cond]

37 37 Instruction Execution…
Memory (MEM):
Store Instr: Mem[ALUOut] ← B
Load Instr: LMD ← Mem[ALUOut]
[Figure: execution and memory stages with Imm, NPC, A, B, ALU, Zero?, Cond, ALU out, Mem and LMD]

38 38 Instruction Execution…
Write Back (WB):
ALU Inst: RegisterFile[rd] ← ALUout
Load Inst: RegisterFile[rt] ← LMD
Conditional Branch Inst: PC ← ALU-out if Cond; PC ← NPC otherwise

39 39 Processor Datapath [Figure: complete datapath across the stages Inst Fetch (IF), Inst Decode (ID), Execution (EX), Memory (MEM) and WB, showing PC, the +4 adder, NPC, Mem, IR, Reg File, sign extend, A, B, Imm, ALU, Zero?, Cond, ALU out, Mem and LMD]

40 40 Our Assumptions
1. Disparity in processor vs memory speed. Time for performing addition, register access, etc. vs memory fetch? Which stages require memory access?
2. Main memory delays are not typically seen by the instruction processor; otherwise the timeline is dominated by them. There is some hardware mechanism through which most memory access requests can be satisfied at processor speeds (cache memory).
3. It is preferable that the time required for each stage of instruction processing be the same: the cycle time.

41 41 Processor cycle time: time required to do
Cache memory access
Register access + some logic (like decode)
ALU operation
[Figure: datapath across the stages Inst Fetch (IF), Inst Decode (ID), Execution (EX), Memory (MEM) and WriteBack (WB)]

42 42 CISC vs RISC -- ISA Comparison
C code: a[i++] = a[i] + b[i]; b[i] = b[--i] - 1;
RISC Code:
lw R1, 0(R3)
lw R2, 0(R4)
add R5, R1, R2
subi R2, R2, 1
sw 0(R3), R5
sw 0(R4), R2
CISC Code:
add (R3)+, (R3), (R4)
sub (R4), -(R4), 1
# of Data Memory Accesses: RISC - 4, CISC - 5

43 43 Performance of Processor Which is more important? execution time of a single instruction throughput of instruction execution i.e., number of instructions executed per unit time Cycles Per Instruction (CPI) Current ideas: CPI between 3 and 5

44 CPI Calculation
Cycles per instruction class: ALU Ins. – 4; Load – 5; Store – 4; Conditional – 4; Jump – 3
% of instructions in a program: ALU Ins. – 45%; Load – 15%; Store – 10%; Conditional – 20%; Jump – 10%
CPI = ?
CPI = 0.45*4 + 0.15*5 + 0.1*4 + 0.2*4 + 0.1*3 = 4.05
How to improve CPI? Pipelining: fetch the next instruction while the previous one is being decoded.

45 45 Assignments (so far) Find out details of a.out format for any C program. Write a C program to identify in which region the following types of variables are stored: (a) global (b) local; (c) static, and (d) dynamically allocated generate the ASCII code) Write a C program which checks whether the machine it is running on supports IEEE FP representation and arithmetic on NaN, denormals and infinity Write a C program to find the machine epsilon. Problem of your choice Deadline Sept. 16, 08

46 46 Assignments (so far – Aug. 31) Write a C program to identify whether a machine is Little Endian or Big Endian. Write a C program to convert a sequence of 4-byte values from Little Endian to Big Endian. Write a simple C program and generate the corresponding assembly language program for the MIPS architecture. Understand the instructions, function call mechanism, formats of branch and jump instructions, etc.

