Presentation is loading. Please wait.

Presentation is loading. Please wait.

Architecture (I) Processor Architecture. – 2 – Processor Goal Understand basic computer organization Instruction set architecture Deeply explore the CPU.

Similar presentations


Presentation on theme: "Architecture (I) Processor Architecture. – 2 – Processor Goal Understand basic computer organization Instruction set architecture Deeply explore the CPU."— Presentation transcript:

1 Architecture (I) Processor Architecture

2 – 2 – Processor Goal Understand basic computer organization Instruction set architecture Deeply explore the CPU working mechanism How the instruction is executed: sequential and pipeline version Help you programming Fully understand how computer is organized and works will help you write more stable and efficient code.

3 – 3 – Processor CPU Design (Why?) It is interesting. Aid in understanding how the overall computer system works. Many design hardware systems containing processors. Maybe you will work on a processor design.

4 – 4 – Processor CPU Design Instruction set architecture Logic design Sequential implementation Pipelining and initial pipelined implementation Making the pipeline work Modern processor design

5 – 5 – Processor Suggested Reading - Chap 4.1, 4.2

6 – 6 – Processor Instruction Set Architecture #1 What is it ? Assemble Language Abstraction Machine Language Abstraction What does it provide? An abstraction of the real computer, hide the details of implementation The syntax of computer instructions The semantics of instructions The execution model Programmer-visible computer status Instruction Set Architecture (ISA)

7 – 7 – Processor Instruction Set Architecture #2 Assembly Language View Processor state Registers, memory, … Instructions addl, movl, leal, … How instructions are encoded as bytes Layer of Abstraction Above: how to program machine Processor executes instructions in a sequence Below: what needs to be built Use tricks to make it run fast E.g., execute multiple instructions simultaneously ISA CompilerOS CPU Design Circuit Design Chip Layout Application Program

8 – 8 – Processor Instruction Set Architecture #3 ISA define the processor family Two main kind: RISC and CISC RISC:SPARC, MIPS, PowerPC CISC:X86 (or called IA32) Another divide: Superscalar, VLIW and EPIC Superscalar: all the above VLIW: Philips TriMedia EPIC: IA64 Under same ISA, there are many different processors From different manufacturers: X86 from Intel and AMD and VIA Different models 8086, 80386, Pentium, Pentium 4

9 – 9 – Processor %eax %ecx %edx %ebx %esi %edi %esp %ebp Y86 Processor State Program Registers Same 8 as with IA32. Each 32 bits Condition Codes Single-bit flags set by arithmetic or logical instructions »OF: OverflowZF: ZeroSF:Negative Program Counter Indicates address of instruction Memory Byte-addressable storage array Words stored in little-endian byte order Program registers Condition codes PC Memory OFZFSF P258 Figure 4.1

10 – 10 – Processor Y86 Instructions Format (P259) 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state Errata: JXX and call are 5 bytes long. P259 Figure 4.2

11 – 11 – Processor Encoding Registers Each register has 4-bit ID Same encoding as in IA32, but IA32 using only 3-bit ID Register ID 8 indicates “no register” Will use this in our hardware design in multiple places %eax %ecx %edx %ebx %esi %edi %esp %ebp 0 1 2 3 6 7 4 5 P261 Figure 4.4

12 – 12 – Processor Instruction Example Addition Instruction Add value in register rA to that in register rB Store result in register rB Note that Y86 only allows addition to be applied to register data Set condition codes based on result e.g., addl %eax,%esi Encoding: 60 06 Two-byte encoding First indicates instruction type Second gives source and destination registers addl rA, rB 60 rArB Encoded Representation Generic Form P261 Figure 4.3

13 – 13 – Processor Arithmetic and Logical Operations Refer to generically as “ OPl ” Encodings differ only by “function code” Low-order 4 bytes in first instruction word Set condition codes as side effect Notice: no multiply or divide operation addl rA, rB 60 rArB subl rA, rB 61 rArB andl rA, rB 62 rArB xorl rA, rB 63 rArB Add Subtract (rA from rB) And Exclusive-Or Instruction CodeFunction Code

14 – 14 – Processor Move Operations Like the IA32 movl instruction Simpler format for memory addresses Give different names to keep them distinct rrmovl rA, rB 20 rArB Register --> Register Immediate --> Register irmovl V, rB 308 rB V Register --> Memory rmmovl rA, D ( rB) 40 rA rB D Memory --> Register mrmovl D ( rB), rA 50 rA rB D P259 Figure 4.2 Distinct: 清楚的

15 – 15 – Processor Move Instruction Examples irmovl $0xabcd, %edxmovl $0xabcd, %edx30 82 cd ab 00 00 IA32Y86Encoding rrmovl %esp, %ebxmovl %esp, %ebx20 43 mrmovl -12(%ebp),%ecxmovl -12(%ebp),%ecx50 15 f4 ff ff ff rmmovl %esi,0x41c(%esp)movl %esi,0x41c(%esp) — movl $0xabcd, (%eax) — movl %eax, 12(%eax,%edx) — movl (%ebp,%eax,4),%ecx 40 64 1c 04 00 00

16 – 16 – Processor Jump Instructions Refer to generically as “ jXX ” Encodings differ only by “function code” Based on values of condition codes Same as IA32 counterparts Encode full destination address Unlike PC-relative addressing seen in IA32 jmp Dest 70 Jump Unconditionally Dest jle Dest 71 Jump When Less or Equal Dest jl Dest 72 Jump When Less Dest je Dest 73 Jump When Equal Dest jne Dest 74 Jump When Not Equal Dest jge Dest 75 Jump When Greater or Equal Dest jg Dest 76 Jump When Greater Dest P259 Figure 4.2

17 – 17 – Processor Y86 Program Stack Region of memory holding program data Used in Y86 (and IA32) for supporting procedure calls Stack top indicated by %esp Address of top stack element Stack grows toward lower addresses Top element is at lowest address in the stack When pushing, must first decrement stack pointer When popping, increment stack pointer %esp Increasing Addresses Stack “Top” Stack “Bottom”

18 – 18 – Processor Stack Operations Decrement %esp by 4 Store word from rA to memory at %esp Like IA32 Read word from memory at %esp Save in rA Increment %esp by 4 Like IA32 pushl rA a0 rA 8 popl rA b0 rA 8 P259 Figure 4.2

19 – 19 – Processor Subroutine Call and Return Push address of next instruction onto stack Start executing instructions at Dest Like IA32 Pop value from stack Use as address for next instruction Like IA32 call Dest 80 Dest ret 90 P259 Figure 4.2

20 – 20 – Processor Miscellaneous Instructions Don’t do anything Stop executing instructions IA32 has comparable instruction, but can’t execute it in user mode We will use it to stop the simulator nop 00 halt 10 P259 Figure 4.2

21 – 21 – Processor Y86 Program Structure Program starts at address 0 Must set up stack Make sure don’t overwrite code! Must initialize data Can use symbolic names irmovl Stack,%esp# Set up stack rrmovl %esp,%ebp# Set up frame irmovl List,%edx pushl %edx# Push argument call len2# Call Function halt# Halt.align 4 List:# List of elements.long 5043.long 6125.long 7395.long 0 # Function len2:... # Allocate space for stack.pos 0x100 Stack:

22 – 22 – Processor Assembling Y86 Program Generates “object code” file eg.yo Actually looks like disassembler output unix> yas eg.ys 0x000: 308400010000 | irmovl Stack,%esp# Set up stack 0x006: 2045 | rrmovl %esp,%ebp # Set up frame 0x008: 308218000000 | irmovl List,%edx 0x00e: a028 | pushl %edx # Push argument 0x010: 8028000000 | call len2 # Call Function 0x015: 10 | halt # Halt 0x018: |.align 4 0x018: | List: # List of elements 0x018: b3130000 |.long 5043 0x01c: ed170000 |.long 6125 0x020: e31c0000 |.long 7395 0x024: 00000000 |.long 0

23 – 23 – Processor Simulating Y86 Program Instruction set simulator Computes effect of each instruction on processor state Prints changes in state from original unix> yis eg.yo Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0 Changes to registers: %eax:0x000000000x00000003 %ecx:0x000000000x00000003 %edx:0x000000000x00000028 %esp:0x000000000x000000fc %ebp:0x000000000x00000100 %esi:0x000000000x00000004 Changes to memory: 0x00f4:0x000000000x00000100 0x00f8:0x000000000x00000015 0x00fc:0x000000000x00000018

24 – 24 – Processor Summary Y86 Instruction Set Architecture Similar state and instructions as IA32 Simpler encodings Somewhere between CISC and RISC How Important is ISA Design? Less now than before With enough hardware, can make almost anything go fast Intel is moving away from IA32 Does not allow enough parallel execution Introduced IA64 »64-bit word sizes (overcome address space limitations) »Radically different style of instruction set with explicit parallelism »Requires sophisticated compilers Radically: 根本上

25 – 25 – Processor Logic Design Digital circuit What is digital circuit? Know what a CPU will base on? Hardware Control Language (HCL) A simple and functional language to describe our CPU implementation Syntax like C

26 – 26 – Processor Category of Circuit Analog Circuit Use all the range of Signal Most part is amplifier Hard to model and automatic design Use transistor and capacitance as basis We will not discuss it here Digital Circuit Has only two values, 0 and 1 Easy to model and design Use true table and other tools to analyze Use gate as the basis The voltage of 1 is differ in different kind circuit. E.g. TTL circuit using 5 voltage as 1 Amplifier: 放大器 Transistor :晶体管 Capacitance: 电容 Amplifier: 放大器 Transistor :晶体管 Capacitance: 电容

27 – 27 – Processor Digital Signals Use voltage thresholds to extract discrete values from continuous signal Simplest version: 1-bit signal Either high range (1) or low range (0) With guard range between them Not strongly affected by noise or low quality circuit elements Can make circuits simple, small, and fast Voltage Time 0 1 0

28 – 28 – Processor Overview of Logic Design Fundamental Hardware Requirements Communication How to get values from one place to another Computation Storage (Memory) Clock Signal Bits are Our Friends Everything expressed in terms of values 0 and 1 Communication Low or high voltage on wire Computation Compute Boolean functions Storage Clock Signal

29 – 29 – Processor Category of Digital Circuit Combinational Circuit Without memory. So the circuit can’t have state. Any same input will get the same output at any time. Needn’t clock signal Typical application: ALU Sequential Circuit = Combinational circuit + memory and clock signal Have state. Two same inputs may not generate the same output. Use clock signal to control the run of circuit. Typical application: CPU

30 – 30 – Processor Computing with Logic Gates P272 Figure 4.8 Outputs are Boolean functions of inputs Not an assignment operation, just give the circuit a name Respond continuously to changes in inputs With some, small delay Voltage Time a b a && b Rising Delay Falling Delay

31 – 31 – Processor Combinational Circuits Acyclic Network of Logic Gates Continuously responds to changes on inputs Outputs become (after some delay) Boolean functions of inputs Acyclic Network InputsOutputs

32 – 32 – Processor Bit Equality P272 Figure 4.9 Generate 1 if a and b are equal Hardware Control Language (HCL) Very simple hardware description language Boolean operations have syntax similar to C logical operations We’ll use it to describe control logic for processors Bit equal a b eq bool eq = (a&&b)||(!a&&!b) HCL Expression

33 – 33 – Processor Bit-Level Multiplexor P273 Figure 4.10 Control signal s Data signals a and b Output a when s=1, b when s=0 Its name: MUX Usage: Select one signal from a couple of signals Bit MUX b s a out bool out = (s&&a)||(!s&&b) HCL Expression

34 – 34 – Processor Word Equality P274 Figure 4.11 32-bit word size HCL representation Equality operation Generates Boolean value b 31 Bit equal a 31 eq 31 b 30 Bit equal a 30 eq 30 b1b1 Bit equal a1a1 eq 1 b0b0 Bit equal a0a0 eq 0 Eq = = B A Word-Level Representation bool Eq = (A == B) HCL Representation

35 – 35 – Processor Word Multiplexor P275 Figure 4.12 Select input word A or B depending on control signal s HCL representation Case expression Series of test : value pairs Output value for first successful test Word-Level Representation HCL Representation b 31 s a 31 out 31 b 30 a 30 out 30 b0b0 a0a0 out 0 int Out = [ s : A; 1 : B; ]; s B A Out MUX

36 – 36 – Processor HCL Word-Level Examples P277, P266 Find minimum of three input words HCL case expression Final case guarantees match A Min3 MIN3 B C int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ]; D0 D3 Out4 s0 s1 MUX4 D2 D1 Select one of 4 inputs based on two control bits HCL case expression Simplify tests by assuming sequential matching int Out4 = [ !s1&&!s0: D0; !s1 : D1; !s0 : D2; 1 : D3; ]; Minimum of 3 Words 4-Way Multiplexor

37 – 37 – Processor OF ZF SFSF OF ZF SFSF OF ZF SFSF OF ZF SFSF Arithmetic Logic Unit P278 Figure 4.13 Combinational logic Continuously responding to inputs Control signal selects function computed Corresponding to 4 arithmetic/logical operations in Y86 Also computes values for condition codes We will use it as a basic component for our CPU ALUALU Y X X + Y 0 ALUALU Y X X - Y 1 ALUALU Y X X & Y 2 ALUALU Y X X ^ Y 3 A B A B A B A B

38 – 38 – Processor Storage Registers Hold single words or bits Loaded as clock rises Not only program registers Random-access memories Hold multiple words E.g. register file, memory Possible multiple read or write ports Read word when address input changes Write word as clock rises IO Clock

39 – 39 – Processor Register Operation P279 Figure 4.14 Stores data bits For most of time acts as barrier between input and output As clock rises, loads input State = x Rising clock  Output = xInput = y x  State = y Output = y y Barrier: 屏障

40 – 40 – Processor State Machine Example Accumulator circuit Load or accumulate on each cycle Comb. Logic ALUALU 0 Out MUX 0 1 Clock In Load x0x0 x1x1 x2x2 x3x3 x4x4 x5x5 x0x0 x 0 +x 1 x 0 +x 1 +x 2 x3x3 x 3 +x 4 x 3 +x 4 +x 5 Clock Load In Out

41 – 41 – Processor Random-Access Memory P280 Stores multiple words of memory Address input specifies which word to read or write Register file Holds values of program registers %eax, %esp, etc. Register identifier serves as address »ID 8 implies no read or write performed Multiple Ports Can read and/or write multiple words in one cycle »Each has separate address and data input/output Register file Register file A B W dstW srcA valA srcB valB valW Read portsWrite port Clock

42 – 42 – Processor Register File Timing Reading Like combinational logic Output data generated based on input address After some delayWriting Like register Update only as clock rises Register file Register file A B srcA valA srcB valB y 2 Register file Register file W dstW valW Clock x 2 Rising clock   Register file Register file W dstW valW Clock y 2 x 2 x 2

43 – 43 – Processor Hardware Control Language Very simple hardware description language Can only express limited aspects of hardware operation Parts we want to explore and modify Data Types bool : Boolean a, b, c, … int : words A, B, C, … Does not specify word size---bytes, 32-bit words, …Statements bool a = bool-expr ; int A = int-expr ;

44 – 44 – Processor HCL Operations Classify by type of value returned Boolean Expressions Logic Operations a && b, a || b, !a Word Comparisons A == B, A != B, A = B, A > B Set Membership A in { B, C, D } »Same as A == B || A == C || A == D Word Expressions Case expressions [ a : A; b : B; c : C ] Evaluate test expressions a, b, c, … in sequence Return word expression A, B, C, … for first successful test

45 – 45 – Processor Summary Computation Performed by combinational logic Computes Boolean functions Continuously reacts to input changesStorage Registers Hold single words Loaded as clock rises Random-access memories Hold multiple words Possible multiple read or write ports Read word when address input changes Write word as clock rises


Download ppt "Architecture (I) Processor Architecture. – 2 – Processor Goal Understand basic computer organization Instruction set architecture Deeply explore the CPU."

Similar presentations


Ads by Google