SE 292 (3:0) High Performance Computing L2: Basic Computer Organization R. Govindarajan


2 Basic Computer Organization Main parts of a computer system: Processor: Executes programs Main memory: Holds program and data I/O devices: For communication with outside Machine instruction: Description of primitive operation that machine hardware is able to execute Instruction Set: Complete specification of all the kinds of instructions that the processor hardware was built to execute e.g. ADD these two integers

3 Basic Computer Organization [Figure: block diagram with the labels CPU, Control, ALU, Registers, Bus, Memory, I/O]

4 Inside the Processor… Hardware to manage instruction execution Arithmetic, logic hardware Registers: small units of memory to hold data/instructions temporarily during execution Two kinds of registers 1. Special purpose registers 2. General purpose registers

5 Special Purpose Registers Program Counter (PC): specifies location in memory of instruction being executed Instruction Register (IR): holds that instruction Processor Status Register: holds status information about current state of processor, such as whether an arithmetic overflow has occurred, etc

6 General Purpose Registers Available for use by programmer, possibly for keeping frequently used data Why? Since there is a large speed disparity between processor and main memory 1 GHz Processor: 1 nanosecond time scale Memory: ~ nsec time scale What do these numbers mean? Instruction operands can come from registers or from main memory

7 Basic Computer Organization [Figure: block diagram with the labels CPU, Control, ALU, Registers, Cache, MMU, Bus, Memory, I/O]
General Purpose Registers: Integer Registers, FP Registers
Special Purpose Registers: Program Counter, Instruction Register

8 Main Memory Holds instructions and data View as sequence of locations, each referred to by a unique memory address If size of each memory location is 1 Byte, we call the memory byte addressable This is quite typical, as smallest data (character) is represented in 1 Byte Larger data items are stored in contiguous memory locations, e.g., a 4Byte integer would occupy 4 consecutive memory locations

9 Terms: Byte ordering
What is the integer (4-byte data) at address 400?
Memory contents, in hexadecimal (digits 0,1,2,…,A,B,C,D,E,F):
  Address: 400 401 402 403
  Data:    1A  C8  B2  46
The diagram also shows the bytes F0, 8C, DF and 1E in the locations that follow.
Big Endian byte ordering: 1AC8B246 (decimal 449,360,454)
Little Endian byte ordering: 46B2C81A (decimal 1,186,121,754)
Some machines use big endian byte ordering and others use little endian byte ordering.

10 Terms: Word Size, Word Alignment
Word Size: normal size of an integer or pointer; 32b (4B) on many machines.
Word Alignment: an integer variable X is not word aligned if the data item is not located at a word boundary. Word boundaries: addresses 0, 4, 8, 12, …
HW: Write a C program to identify whether a machine is Little Endian or Big Endian. Write a C program to convert a sequence of 4-byte values from Little Endian to Big Endian.

11 Instruction Set Architecture (ISA) View of the computer visible to the programmer (or compiler) Two kinds of ISAs 1. Complex Instruction Set Computer (CISC) A single instruction can perform a complex operation involving several actions 2. Reduced Instruction Set Computer (RISC) Each instruction performs only a simple operation

12 Instruction Set Architecture Description of machine from view of the programmer/compiler Example: Intel x86 ISA Includes specification of 1. The different kinds of instructions available (instruction set) 2. How operands are specified (addressing modes) 3. What each instruction looks like (instruction format)

13 Kinds of Instructions 1.Arithmetic/logical instructions Add, subtract, multiply, divide, compare (int/fp) Or, and, not, xor Shift (left/right, arithmetic/logical), rotate 2.Data transfer instructions Load (to register from memory) Store (to memory location from register) Move 3.Control transfer instructions Jump, conditional branch, function call, return 4.Other instructions Example: halt

14 Operand Addressing Modes Operands to an instruction Source: input value to instruction Destination: where result is to go Addressing Mode How the location of operand is specified An operand can be either in a memory location in a register

15 Addressing Modes: Operand in Register
1. Register Direct Addressing Mode: operand is in the specified general purpose register. Example: suppose that the general purpose registers are numbered 0, 1, 2, etc.
   ADD R1, R2, R3    / R1 ← R2 + R3 (R1 is the destination operand; R2 and R3 are the source operands)
2. Immediate Addressing Mode: operand is included in the instruction.
   ADD R1, R2, 1     / R1 ← R2 + 1

16 Addressing Modes: Operand in Memory
3. Register Indirect Addressing Mode: memory address of operand is in the specified general purpose register.
   ADD R1, R1, (R2)
4. Base-Displacement Addressing Mode: memory address of operand is calculated as the sum of the value in the specified register and the specified displacement.
   ADD R1, R1, 4(R2)
[Figures: registers R1 and R2 alongside main memory (address and value columns), with the example values 42 and 67]

17 Addressing Modes: Operand in Memory
5. Absolute Addressing Mode: memory address of operand is specified directly in the instruction.
   ADD R1, R2, #
6. Indexed Addressing Mode: memory address of operand is calculated as the sum of the contents of 2 registers.
   ADD R1, R2, (R3+R4)
Others: auto-increment/decrement (pre/post), PC relative

18 Case Study: MIPS I Integer Instruction Set Registers 32 32b general purpose registers, R0..R31 R0 hardwired to value 0 R31 implicitly used by instructions JAL, JALR HI, LO: 2 other 32b registers Used implicitly by multiply and divide instructions Addressing Modes Immediate, Register direct (arithmetic) Absolute (jumps) Base-displacement (loads, stores) PC relative (branches)

19 MIPS I ISA: General Comments All instructions, registers are 32b in size Load-store architecture: the only instructions that have memory operands are loads&stores Terminology Word: 32b Halfword: 16b Byte: 8b Displacements and immediates are signed 16 bit quantities

20 A RISC Instruction Set

21 RISC Instruction Set (contd) HW: Write a simple C program and generate the corresponding assembly language program for the MIPS architecture. Understand the instructions, function call mechanism, formats of branch and jump instructions, etc.

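22 An Example Program
#include <stdio.h>
#include <math.h>
double a[100];
int main(void) {
    int i;
    double sum;
    /* Replace each element by its square root and accumulate the total. */
    for (i = 0, sum = 0.0; i < 100; i++) {
        a[i] = sqrt(a[i]);
        sum += a[i];
    }
    printf("sum = %4.2f\n", sum);
    return 0;
}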

23 Assembly Representation
        .section .bss, 8, 0x , 0, 8
.bss:
        .section .lit8, 1, 0x , 8, 8
.lit8:
        .section .rodata, 1, 0x , 0, 8
.rodata:
        .section .bss
        .origin 0x0
        .align 0
        .globl a
        .type a, stt_object
        .size a, 800

24 Assembly Representation (contd.)
a:      # 0x0
        .dynsym a sto_default
        .space 800
        .section .text
        # Program Unit: main
        .ent main
        .globl main
main:   # 0x0
        .dynsym main sto_default
        .frame $sp, 16, $31
        .mask 0x , -8
        # gra_spill_temp_0 = 0
        # gra_spill_temp_1 = 8
        .loc 1 4 8

25 Assembly Representation (contd.)
 # 1 #include
 # 2 #include
 # 3 double a[100];
 # 4 main() {
.BB1.main:      # 0x0
        .type main, stt_func
        lui $1, %hi(%neg(%gp_rel(main)))        # [0] main
        addiu $sp, $sp, -16                     # [0]
        addiu $1, $1, %lo(%neg(%gp_rel(main)))  # [1] main
        sd $gp, 0($sp)                          # [1] gra_spill_temp_0
        addu $gp, $25, $1                       # [2]
        lw $5, %got_disp(a)($gp)                # [3] a
        .loc 1 7 5

26 Some Interesting Notes Load instructions: the value will not be available in the destination register for use by the instruction following the load LOAD DELAY SLOT Control transfer instructions: the transfer of control takes place only following the instruction immediately after the control transfer instruction BRANCH DELAY SLOT

27 CISC vs RISC -- ISA Comparison
C code:
    a[i++] = a[i] + b[i];
    b[i] = b[--i] - 1;
RISC Code:
    lw   R1, 0(R3)
    lw   R2, 0(R4)
    add  R5, R1, R2
    subi R2, R2, 1
    sw   0(R3), R5
    sw   0(R4), R2
CISC Code:
    add (R3)+, (R3), (R4)
    sub (R4), -(R4), 1

28 MIPS Instruction Encoding
R-Format: Opcode (6 bits) | Src1 (rs) (5 bits) | Src2 (rt) (5 bits) | Dst (rd) (5 bits) | shamt (5 bits) | Func. code (6 bits)
Example: add R1, R2, R3

29 MIPS Instruction Encoding
I-Format: Opcode (6 bits) | Src1 (rs) (5 bits) | Dst (rt) (5 bits) | constant (16 bits)
Examples: addi R1, R2, 8; lw R1, 24(R2); bltz R1, loop

30 MIPS Instruction Encoding
J-Format: Opcode (6 bits) | Jump address (26 bits)
Example: jal fact

31 Some More Homework 1. Read B&O Chapter 2 2. B&O B&O B&O B&O Write a C program which checks whether the machine it is running on supports IEEE FP representation and arithmetic on NaN, denormals and infinity.

32 On Instruction Processing Fetch Get instruction whose address is in PC from memory into IR Increment PC Decode Understand instruction, addressing modes, etc Calculate effective addresses and fetch operands Execute Do required operation Write back the result of the instruction

33 Timeline of events PC to memory Instruction in IR PC++; Decode Op1 eff add calc Op1 fetched Op2 eff add calc Op2 fetched Op done Write result Processor/Memory Speed disparity ~2 orders of magnitude

34 Instruction Execution: Instruction Fetch (IF)
Fetch the instruction from program memory into the instruction register: IR ← Mem[PC]
Increment PC: NPC ← PC + 4
[Figure: Instr Fetch stage of the datapath: PC, Mem, IR, +4 adder, NPC]

35 Instruction Execution… Instruction Decode & Operand Fetch (ID)
A ← RegisterFile[rs]
B ← RegisterFile[rt]
Imm ← sign-extend(IR[15:0])
[Figure: datapath through the Instr Fetch and Instr Decode stages: PC, Inst Mem, IR, +4 adder, NPC, Reg File, sign extend, A, B, Imm]

36 Instruction Execution… Execution (EX)
Arithmetic Inst: ALU-Out ← A op B, or ALU-Out ← A op Imm
Load/Store Inst: ALU-Out ← A + Imm
Branch Inst: ALU-Out ← NPC + Imm
Jump Inst: PC ← NPC[31:28] || IR[25:0] || 00
[Figure: Execution stage of the datapath: inputs NPC, A, B, Imm; ALU; Zero? test producing Cond; ALU-out]

37 Instruction Execution… Memory (MEM)
Store Instr: Mem[ALUout] ← B
Load Instr: LMD ← Mem[ALUout]
[Figure: Execution and Memory stages of the datapath: NPC, A, B, Imm, ALU, Zero?, Cond, ALU out, Mem, LMD]

38 Instruction Execution… Write Back (WB)
ALU Inst: RegisterFile[rd] ← ALUout
Load Inst: RegisterFile[rt] ← LMD
Conditional Branch Inst: PC ← ALU-out if Cond; PC ← NPC otherwise

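39 Processor Datapath [Figure: complete datapath across the five stages Inst Fetch (IF), Inst Decode (ID), Execution (EX), Memory (MEM) and WB, with the components PC, +4 adder, NPC, Mem, IR, Reg File, sign extend, A, B, Imm, ALU, Zero?, Cond, ALU out, Mem, LMD]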

40 Our Assumptions
1. Disparity in processor vs memory speed: time for performing addition, register access, etc. vs memory fetch? Which stages require memory access?
2. Main memory delays are not typically seen by the instruction processor; otherwise the timeline is dominated by them. There is some hardware mechanism through which most memory access requests can be satisfied at processor speeds (cache memory).
3. It is preferable that the time required for each stage of instruction processing be the same – the cycle time.

41 Processor cycle time: time required to do one of: cache memory access; register access + some logic (like decode); ALU operation
[Figure: processor datapath with the stages Inst Fetch (IF), Inst Decode (ID), Execution (EX), Memory (MEM) and WriteBack (WB)]

42 CISC vs RISC -- ISA Comparison
C code:
    a[i++] = a[i] + b[i];
    b[i] = b[--i] - 1;
RISC Code:
    lw   R1, 0(R3)
    lw   R2, 0(R4)
    add  R5, R1, R2
    subi R2, R2, 1
    sw   0(R3), R5
    sw   0(R4), R2
CISC Code:
    add (R3)+, (R3), (R4)
    sub (R4), -(R4), 1
# of Data Memory Accesses: RISC - 4, CISC - 5

43 Performance of Processor Which is more important? execution time of a single instruction throughput of instruction execution i.e., number of instructions executed per unit time Cycles Per Instruction (CPI) Current ideas: CPI between 3 and 5

44 CPI Calculation
Cycles per instruction type: ALU Ins. – 4; Load – 5; Store – 4; Conditional – 4; Jump – 3
% of instructions in a program: ALU Ins. – 45%; Load – 15%; Store – 10%; Conditional – 20%; Jump – 10%
CPI = ?
CPI = 0.45*4 + 0.15*5 + 0.10*4 + 0.20*4 + 0.10*3 = 4.05
How to improve CPI? Pipelining: fetch the next instruction while the previous one is being decoded.

45 Assignments (so far) Find out details of a.out format for any C program. Write a C program to identify in which region the following types of variables are stored: (a) global (b) local; (c) static, and (d) dynamically allocated generate the ASCII code) Write a C program which checks whether the machine it is running on supports IEEE FP representation and arithmetic on NaN, denormals and infinity Write a C program to find the machine epsilon. Problem of your choice Deadline Sept. 16, 08

46 Assignments (so far – Aug. 31) Write a C program to identify whether a machine is Little Endian or Big Endian. Write a C program to convert a sequence of 4-byte values from Little Endian to Big Endian. Write a simple C program and generate the corresponding assembly language program for the MIPS architecture. Understand the instructions, function call mechanism, formats of branch and jump instructions, etc.