© 2000 Morgan Kaufmann, Overheads for Computers as Components
CPUs
- Input and output.
- Supervisor mode, exceptions, traps.
- Co-processors.

3 I/O devices
- Usually includes some non-digital component.
- Typical digital interface to CPU:
[Figure: the CPU connects to the device's status register and data register, which sit in front of the device mechanism.]

4 Application: 8251 UART
- Universal asynchronous receiver transmitter (UART): provides serial communication.
- 8251 functions are integrated into standard PC interface chip.
- Allows many communication parameters to be programmed.

5 Serial communication
- Characters are transmitted separately:
[Figure: signal versus time for one character: no char (idle line), start bit, bit 0, bit 1, ..., bit n-1, stop bit.]

6 Serial communication parameters
- Baud (bit) rate.
- Number of bits per character.
- Parity/no parity.
- Even/odd parity.
- Length of stop bit (1, 1.5, 2 bits).
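
The parameters above fix how long one character occupies the line. The following is a minimal sketch in C, assuming one start bit per character as in the serial communication figure; the `serial_config` structure and the function names are hypothetical, not part of the 8251 interface. Stop length is stored in half bits so that 1.5 stop bits can be represented with integers.

```c
#include <stdbool.h>

/* Hypothetical description of one serial configuration. */
struct serial_config {
    unsigned baud;            /* bits per second */
    unsigned data_bits;       /* bits per character */
    bool     parity_enabled;  /* true if a parity bit is sent */
    unsigned stop_half_bits;  /* 2, 3 or 4 for 1, 1.5 or 2 stop bits */
};

/* Frame length in half-bit units: start + data + optional parity + stop. */
unsigned frame_half_bits(const struct serial_config *c)
{
    unsigned bits = 1u + c->data_bits + (c->parity_enabled ? 1u : 0u);
    return 2u * bits + c->stop_half_bits;
}

/* Whole characters per second when frames are sent back to back. */
unsigned chars_per_second(const struct serial_config *c)
{
    return (2u * c->baud) / frame_half_bits(c);
}

/* Even-parity bit: 1 when the data bits hold an odd number of ones. */
unsigned even_parity_bit(unsigned ch, unsigned data_bits)
{
    unsigned ones = 0;
    for (unsigned i = 0; i < data_bits; i++)
        ones += (ch >> i) & 1u;
    return ones & 1u;
}
```

With 8 data bits, no parity, and 1 stop bit, each character takes 10 bit times, so a 9600 baud line carries at most 960 characters per second.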

7 8251 CPU interface
[Figure: the CPU exchanges an 8-bit status value and an 8-bit data value with the 8251, which drives the serial port (xmit/rcv).]

8 Programming I/O
- Two types of instructions can support I/O:
  - special-purpose I/O instructions;
  - memory-mapped load/store instructions.
- Intel x86 provides in, out instructions. Most other CPUs use memory-mapped I/O.
- I/O instructions do not preclude memory-mapped I/O.

9 ARM memory-mapped I/O
- Define location for device:
    DEV1 EQU 0x1000
- Read/write code:
    LDR r1,=DEV1 ; set up device adrs
    LDR r0,[r1]  ; read DEV1
    MOV r0,#8    ; set up value to write
    STR r0,[r1]  ; write value to device
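
The same load/store pattern can be sketched in C. This is an illustration rather than code from the deck: the helper names `dev_read` and `dev_write` are hypothetical, and the register is passed in as a pointer so the functions can be exercised on a host as well as on a target.

```c
#include <stdint.h>

/* Read a device register. The volatile qualifier makes the compiler
   perform the load on every call, which is the role of LDR r0,[r1]. */
uint32_t dev_read(volatile uint32_t *reg)
{
    return *reg;
}

/* Write a value to a device register, which is the role of STR r0,[r1]. */
void dev_write(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}
```

On a target where the device really sits at the address used above, the pointer would be formed as `(volatile uint32_t *)0x1000`; the register width of 32 bits is an assumption.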

10 SHARC memory-mapped I/O
- Device must be in external memory space (above 0x400000).
- Use DM to control access:
    I0 = 0x400000;
    M0 = 0;
    R1 = DM(I0,M0);

11 Interrupt I/O
- Busy/wait is very inefficient.
  - CPU can’t do other work while testing device.
  - Hard to do simultaneous I/O.
- Interrupts allow a device to change the flow of control in the CPU.
  - Causes subroutine call to handle device.
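
For contrast with interrupts, busy/wait I/O can be sketched as a polling loop. This is a minimal illustration, assuming a device with a status register whose low bit means "data ready"; the `STATUS_READY` mask and the function name are hypothetical.

```c
#include <stdint.h>

#define STATUS_READY 0x01u  /* hypothetical "data ready" bit */

/* Spin until the device reports data, then return the data register.
   The CPU does no other work while the loop runs. */
uint8_t busy_wait_read(volatile uint8_t *status, volatile uint8_t *data)
{
    while ((*status & STATUS_READY) == 0u) {
        /* nothing to do but test the device again */
    }
    return *data;
}
```

Every cycle spent in the `while` loop is unavailable to the foreground program, which is the inefficiency that interrupt-driven I/O avoids.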

12 Interrupt interface
[Figure: the CPU (with PC and IR) connects to the device's status register, data register, and mechanism through data/address lines plus intr request and intr ack signals.]

13 Interrupt behavior
- Based on subroutine call mechanism.
- Interrupt forces next instruction to be a subroutine call to a predetermined location.
  - Return address is saved to resume executing foreground program.

14 Interrupt physical interface
- CPU and device are connected by CPU bus.
- CPU and device handshake:
  - device asserts interrupt request;
  - CPU asserts interrupt acknowledge when it can handle the interrupt.

15 Example: interrupt I/O with buffers
- Queue for characters:
[Figure: a queue with head and tail pointers, shown empty and then holding the character a.]

16 I/O sequence diagram
[Figure: sequence diagram with lifelines :foreground, :input, :output, :queue; the queue contents go from empty through a, b, bc, c.]
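
The character queue with head and tail pointers can be sketched as a circular buffer. This is one common way to build such a queue, not necessarily the deck's implementation; the capacity `QUEUE_SIZE` and all names are hypothetical. One slot is left unused so that a full queue can be told apart from an empty one.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8u  /* hypothetical capacity; one slot stays empty */

struct char_queue {
    char   buf[QUEUE_SIZE];
    size_t head;  /* next slot to read */
    size_t tail;  /* next slot to write */
};

void queue_init(struct char_queue *q)
{
    q->head = 0;
    q->tail = 0;
}

bool queue_empty(const struct char_queue *q)
{
    return q->head == q->tail;
}

bool queue_full(const struct char_queue *q)
{
    return (q->tail + 1u) % QUEUE_SIZE == q->head;
}

/* Input side: add a character at the tail. */
bool queue_put(struct char_queue *q, char c)
{
    if (queue_full(q))
        return false;  /* the character is dropped */
    q->buf[q->tail] = c;
    q->tail = (q->tail + 1u) % QUEUE_SIZE;
    return true;
}

/* Output side: remove a character from the head. */
bool queue_get(struct char_queue *q, char *out)
{
    if (queue_empty(q))
        return false;
    *out = q->buf[q->head];
    q->head = (q->head + 1u) % QUEUE_SIZE;
    return true;
}
```

In a real system the input and output handlers share `head` and `tail`, so the fields would typically need `volatile` access or a short interrupt-disabled section; that protection is omitted here.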

17 Debugging interrupt code
- What if the handler forgets to save and restore the registers it changes?
  - Foreground program can exhibit mysterious bugs.
  - Bugs will be hard to repeat: they depend on interrupt timing.

18 Priorities and vectors
- Two mechanisms allow us to make interrupts more specific:
  - Priorities determine what interrupt gets CPU first.
  - Vectors determine what code is called for each type of interrupt.
- Mechanisms are orthogonal: most CPUs provide both.

19 Prioritized interrupts
[Figure: devices 1 through n raise interrupt request lines L1, L2, ..., Ln to the CPU, which returns a shared interrupt acknowledge.]

20 Interrupt prioritization
- Masking: interrupt with priority lower than current priority is not recognized until pending interrupt is complete.
- Non-maskable interrupt (NMI): highest-priority, never masked.
  - Often used for power-down.

21 Example: Prioritized I/O
[Figure: sequence diagram with lifelines :interrupts, :foreground, :A, :B, :C; the interrupt events are labeled B; A,B; C; A.]

22 Interrupt vectors
- Allow different devices to be handled by different code.
- Interrupt vector table:
[Figure: the interrupt vector table head points to a table whose entries are handler 0, handler 1, handler 2, handler 3.]

23 Interrupt vector acquisition
[Figure: sequence diagram between :CPU and :device: receive request, receive ack, receive vector.]

24 Generic interrupt mechanism
[Flowchart; assume priority selection is handled before this point.]
- intr? If no, continue execution.
- If yes: intr priority > current priority? If no, ignore.
- If yes: ack.
- vector? If yes, call table[vector].
- If no vector arrives: timeout? If yes, bus error.

25 Interrupt sequence
- CPU acknowledges request.
- Device sends vector.
- CPU calls handler.
- Software processes request.
- CPU restores state to foreground program.
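
The priority check and vector lookup can be modeled in software. The following is a simplified sketch of the decision flow, not a description of any particular CPU: the table size, the names, and the example handler `count_handler` are all hypothetical, and the hardware handshake (request, acknowledge, vector transfer) is reduced to function arguments.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_VECTORS 4u  /* hypothetical table size */

typedef void (*handler_fn)(void);

/* Interrupt vector table: one handler per vector number. */
handler_fn vector_table[NUM_VECTORS];
unsigned current_priority;  /* priority of the code now running */

/* Example handler used for illustration: counts how often it ran. */
unsigned handled_count;
void count_handler(void) { handled_count++; }

void set_handler(unsigned vector, handler_fn h)
{
    if (vector < NUM_VECTORS)
        vector_table[vector] = h;
}

/* Returns true if the request was accepted and its handler ran. */
bool dispatch_interrupt(unsigned priority, unsigned vector)
{
    if (priority <= current_priority)
        return false;  /* masked: not recognized yet */
    if (vector >= NUM_VECTORS || vector_table[vector] == NULL)
        return false;  /* no usable vector: the error path */

    unsigned saved = current_priority;
    current_priority = priority;  /* handler runs at the new priority */
    vector_table[vector]();       /* call table[vector] */
    current_priority = saved;     /* restore foreground state */
    return true;
}
```

Raising `current_priority` while the handler runs is what keeps lower-priority requests masked until the handler returns.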

26 Sources of interrupt overhead
- Handler execution time.
- Interrupt mechanism overhead.
- Register save/restore.
- Pipeline-related penalties.
- Cache-related penalties.

27 ARM interrupts
- ARM7 supports two types of interrupts:
  - Fast interrupt requests (FIQs).
  - Interrupt requests (IRQs).
- Interrupt table starts at location 0.

28 ARM interrupt procedure
- CPU actions:
  - Save PC. Copy CPSR to SPSR.
  - Force bits in CPSR to record interrupt.
  - Force PC to vector.
- Handler responsibilities:
  - Restore proper PC.
  - Restore CPSR from SPSR.
  - Clear interrupt disable flags.

29 ARM interrupt latency
- Worst-case latency to respond to interrupt is 27 cycles:
  - Two cycles to synchronize external request.
  - Up to 20 cycles to complete current instruction.
  - Three cycles for data abort.
  - Two cycles to enter interrupt handling state.

30 SHARC interrupt structure
- Interrupts are vectored and prioritized.
- Priorities are fixed: reset highest, user SW interrupt 3 lowest.
- Vectors are also fixed. Vector is offset in vector table. Table starts at 0x20000 in internal memory, 0x400000 in external memory.

31 SHARC interrupt sequence
Start: must be executing or IDLE/IDLE16.
1. Output appropriate interrupt vector address.
2. Push PC value onto PC stack.
3. Set bit in interrupt latch register.
4. Set IMASKP to current nesting state.

32 SHARC interrupt return
Initiated by RTI instruction.
1. Return to address at top of PC stack.
2. Pop PC stack.
3. Pop status stack if appropriate.
4. Clear bits in interrupt latch register and IMASKP.

33 SHARC interrupt performance
Three stages of response:
  - 1 cycle: synchronization and latching;
  - 1 cycle: recognition;
  - 2 cycles: branching to vector.
Total latency: 3 cycles.
Multiprocessor vector interrupts have 6 cycle latency.

34 Supervisor mode
- May want to provide protective barriers between programs.
  - Avoid memory corruption.
- Need supervisor mode to manage the various programs.
- SHARC does not have a supervisor mode.

35 ARM supervisor mode
- Use SWI instruction to enter supervisor mode, similar to subroutine:
    SWI CODE_1
- Sets PC to 0x08.
- Argument to SWI is passed to supervisor mode code.
- Saves CPSR in SPSR.

36 Exception
- Exception: internally detected error.
- Exceptions are synchronous with instructions but unpredictable.
- Build exception mechanism on top of interrupt mechanism.
- Exceptions are usually prioritized and vectorized.

37 Trap
- Trap (software interrupt): an exception generated by an instruction.
  - Call supervisor mode.
- ARM uses SWI instruction for traps.
- SHARC offers three levels of software interrupts.
  - Called by setting bits in IRPTL register.

38 Co-processor
- Co-processor: added function unit that is called by instruction.
  - Floating-point units are often structured as co-processors.
- ARM allows up to 16 designer-selected co-processors.
  - Floating-point co-processor uses units 1 and 2.

39 CPUs
- Caches.
- Memory management.

40 Caches and CPUs
[Figure: the CPU exchanges addresses and data with the cache controller, which connects to the cache and to main memory.]

41 Cache operation
- Many main memory locations are mapped onto one cache entry.
- May have caches for:
  - instructions;
  - data;
  - data + instructions (unified).
- Memory access time is no longer deterministic.

42 Terms
- Cache hit: required location is in cache.
- Cache miss: required location is not in cache.
- Working set: set of locations used by program in a time interval.

43 Types of misses
- Compulsory (cold): location has never been accessed.
- Capacity: working set is too large.
- Conflict: multiple locations in working set map to same cache entry.

44 Memory system performance
- $h$ = cache hit rate.
- $t_{cache}$ = cache access time, $t_{main}$ = main memory access time.
- Average memory access time:
  - $t_{av} = h\,t_{cache} + (1-h)\,t_{main}$
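
The average access time formula translates directly into a small function. This is a sketch for experimenting with numbers; the function name is hypothetical and the time unit is whatever the caller uses for both arguments.

```c
/* Average memory access time: t_av = h * t_cache + (1 - h) * t_main.
   h is the cache hit rate, between 0.0 and 1.0. */
double avg_access_time(double h, double t_cache, double t_main)
{
    return h * t_cache + (1.0 - h) * t_main;
}
```

As an illustration with made-up numbers, a hit rate of 0.9 with a 1-cycle cache and a 10-cycle main memory gives 0.9 x 1 + 0.1 x 10 = 1.9 cycles on average.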

45 Multiple levels of cache
[Figure: CPU, L1 cache, and L2 cache connected in series.]

46 Multi-level cache access time
- $h_1$ = L1 cache hit rate.
- $h_2$ = rate for miss on L1, hit on L2.
- Average memory access time:
  - $t_{av} = h_1 t_{L1} + (1-h_1)\,h_2\,t_{L2} + (1-h_1)(1-h_2)\,t_{main}$
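
The two-level formula can be evaluated the same way. In this sketch, which follows the formula as written, `h2` is treated as the fraction of L1 misses that hit in L2; the function name is hypothetical.

```c
/* Two-level average access time:
   t_av = h1*t_L1 + (1-h1)*h2*t_L2 + (1-h1)*(1-h2)*t_main. */
double avg_access_time_2level(double h1, double h2,
                              double t_l1, double t_l2, double t_main)
{
    double l1_miss = 1.0 - h1;
    return h1 * t_l1
         + l1_miss * h2 * t_l2
         + l1_miss * (1.0 - h2) * t_main;
}
```

When `h1` is 1.0 the result is just the L1 access time, and when both hit rates are 0.0 it is the main memory access time, which is a quick sanity check on the three terms.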

47 Replacement policies
- Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location.
- Two popular strategies:
  - Random.
  - Least-recently used (LRU).

48 Cache organizations
- Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented).
- Direct-mapped: each memory location maps onto exactly one cache entry.
- N-way set-associative: each memory location can go into one of n sets.

49 Cache performance benefits
- Keep frequently-accessed locations in fast cache.
- Cache retrieves more than one word at a time.
  - Sequential accesses are faster after first access.

50 Direct-mapped cache
[Figure: the address is split into tag, index, and offset fields. The index selects a cache block holding a valid bit (1), a tag (0xabcd), and data bytes; the stored tag is compared (=) with the address tag to produce hit, and the offset selects the byte output as value.]
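
The tag, index, and offset fields can be extracted with shifts and masks. This is a minimal sketch assuming a hypothetical geometry of 16-byte blocks and 256 blocks; the field widths, structure, and function names are illustrative choices, not values from the deck.

```c
#include <stdint.h>

/* Hypothetical geometry: 16-byte blocks, 256 blocks. */
#define OFFSET_BITS 4u
#define INDEX_BITS  8u

struct cache_addr {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

/* Split an address into its tag, index, and offset fields. */
struct cache_addr split_address(uint32_t addr)
{
    struct cache_addr a;
    a.offset = addr & ((1u << OFFSET_BITS) - 1u);
    a.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    a.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return a;
}

/* Hit if the indexed entry is valid and its stored tag matches. */
int is_hit(int valid, uint32_t stored_tag, uint32_t addr)
{
    return valid && stored_tag == split_address(addr).tag;
}
```

With this geometry, address 0x0ABCD123 splits into tag 0xABCD, index 0x12, and offset 0x3.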

51 Write operations
- Write-through: immediately copy write to main memory.
- Write-back: write to main memory only when location is removed from cache.

52 Direct-mapped cache locations
- Many locations map onto the same cache block.
- Conflict misses are easy to generate:
  - Array a[] uses locations 0, 1, 2, …
  - Array b[] uses locations 1024, 1025, 1026, …
  - Operation a[i] + b[i] generates conflict misses.

53 Set-associative cache
- A set of direct-mapped caches:
[Figure: Set 1, Set 2, ..., Set n are accessed together and combine to produce hit and data.]

54 Example: direct-mapped vs. set-associative

55 Direct-mapped cache behavior
- After 001 access:
    block  tag  data
    00     -    -
    01     0    1111
    10     -    -
    11     -    -
- After 010 access:
    block  tag  data
    00     -    -
    01     0    1111
    10     0    0000
    11     -    -

56 Direct-mapped cache behavior, cont’d.
- After 011 access:
    block  tag  data
    00     -    -
    01     0    1111
    10     0    0000
    11     0    0110
- After 100 access:
    block  tag  data
    00     1    1000
    01     0    1111
    10     0    0000
    11     0    0110

57 Direct-mapped cache behavior, cont’d.
- After 101 access:
    block  tag  data
    00     1    1000
    01     1    0001
    10     0    0000
    11     0    0110
- After 111 access:
    block  tag  data
    00     1    1000
    01     1    0001
    10     0    0000
    11     1    0100
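
The direct-mapped trace above can be reproduced with a few lines of C. This is a sketch under stated assumptions: the data values for the accessed addresses are taken from the tables above, while the contents of addresses 000 and 110 (never accessed in the trace) are placeholders set to 0. All names are hypothetical.

```c
#define NUM_BLOCKS 4u

struct dm_entry {
    int      valid;
    unsigned tag;
    unsigned data;
};

/* Main memory for the 3-bit address space. Entries 000 and 110 are
   placeholders; the trace never reads them. */
const unsigned memory[8] = {
    0x0, /* 000: placeholder */
    0xF, /* 001: 1111 */
    0x0, /* 010: 0000 */
    0x6, /* 011: 0110 */
    0x8, /* 100: 1000 */
    0x1, /* 101: 0001 */
    0x0, /* 110: placeholder */
    0x4  /* 111: 0100 */
};

/* Access one address. Returns 1 on a hit; on a miss the block is
   filled from memory and 0 is returned. */
int dm_access(struct dm_entry cache[NUM_BLOCKS], unsigned addr)
{
    unsigned index = addr & 0x3u;  /* low two bits select the block */
    unsigned tag   = addr >> 2;    /* remaining high bit is the tag */
    struct dm_entry *e = &cache[index];

    if (e->valid && e->tag == tag)
        return 1;
    e->valid = 1;
    e->tag   = tag;
    e->data  = memory[addr];
    return 0;
}
```

Running the accesses 001, 010, 011, 100, 101, 111 in order against an empty cache produces six misses, and the accesses to 101 and 111 overwrite the blocks filled by 001 and 011 because they share index bits.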

58 2-way set-associative cache behavior
- Final state of cache (twice as big as direct-mapped):
    set  blk 0 tag  blk 0 data  blk 1 tag  blk 1 data
    00   1          1000        -          -
    01   0          1111        1          0001
    10   0          0000        -          -
    11   0          0110        1          0100

59 2-way set-associative cache behavior
- Final state of cache (same size as direct-mapped):
    set  blk 0 tag  blk 0 data  blk 1 tag  blk 1 data
    0    01         0000        10         1000
    1    10         0001        11         0100

60 Example caches
- StrongARM:
  - 16 Kbyte, 32-way, 32-byte block instruction cache.
  - 16 Kbyte, 32-way, 32-byte block data cache (write-back).
- SHARC:
  - 32-instruction, 2-way instruction cache.

61 Memory management units
- Memory management unit (MMU) translates addresses:
[Figure: the CPU sends a logical address to the memory management unit, which sends a physical address to main memory.]

62 Memory management tasks
- Allows programs to move in physical memory during execution.
- Allows virtual memory:
  - memory images kept in secondary storage;
  - images returned to main memory on demand during execution.
- Page fault: request for location not resident in memory.

63 Address translation
- Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
- Two basic schemes:
  - segmented;
  - paged.
- Segmentation and paging can be combined (x86).

64 Segments and pages
[Figure: a memory map showing segment 1, segment 2, page 1, and page 2.]

65 Segment address translation
[Figure: the segment base address is added (+) to the logical address to form the physical address; a range check against the segment lower bound and segment upper bound signals a range error.]
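
Segment translation (base plus logical address, followed by a range check) can be sketched as follows. The structure layout and names are hypothetical, and the sketch ignores the possibility of the addition wrapping around at 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

struct segment {
    uint32_t base;   /* segment base address */
    uint32_t lower;  /* lowest valid physical address */
    uint32_t upper;  /* highest valid physical address */
};

/* Adds the base to the logical address and range-checks the result.
   Returns false on a range error; otherwise stores the physical
   address in *physical and returns true. */
bool segment_translate(const struct segment *s, uint32_t logical,
                       uint32_t *physical)
{
    uint32_t addr = s->base + logical;
    if (addr < s->lower || addr > s->upper)
        return false;  /* range error */
    *physical = addr;
    return true;
}
```

For a segment with base 0x1000 and bounds 0x1000 to 0x1FFF, logical address 0x10 translates to 0x1010, while logical address 0x1000 falls outside the upper bound and is rejected.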

66 Page address translation
[Figure: the logical address is split into page and offset; the page number selects the page i base, which is concatenated with the offset to form the physical address.]
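
Page translation replaces the addition with a table lookup and a concatenation. This is a minimal sketch of a flat page table, assuming hypothetical 4-Kbyte pages and a 16-entry table; a real MMU would raise a fault for an out-of-range or non-resident page, whereas this sketch simply wraps the page number to stay inside the table.

```c
#include <stdint.h>

#define PAGE_BITS 12u  /* hypothetical 4-Kbyte pages */
#define NUM_PAGES 16u  /* hypothetical flat page table size */

/* page_table[i] holds the physical base of page i, aligned to the
   page size so its low PAGE_BITS bits are zero. */
uint32_t page_translate(const uint32_t page_table[NUM_PAGES],
                        uint32_t logical)
{
    /* Wrapping with % is a simplification; hardware would fault. */
    uint32_t page   = (logical >> PAGE_BITS) % NUM_PAGES;
    uint32_t offset = logical & ((1u << PAGE_BITS) - 1u);
    return page_table[page] | offset;  /* concatenate base and offset */
}
```

Because page bases are aligned, the bitwise OR acts as the concatenation: no adder is needed, which is one reason paging hardware is simpler than segment hardware.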

67 Page table organizations
[Figure: two page table organizations, flat and tree, each leading to a page descriptor.]

68 Caching address translations
- Large translation tables require main memory access.
- TLB: cache for address translation.
  - Typically small.

69 ARM memory management
- Memory region types:
  - section: 1 Mbyte block;
  - large page: 64 kbytes;
  - small page: 4 kbytes.
- An address is marked as section-mapped or page-mapped.
- Two-level translation scheme.

70 ARM address translation
[Figure: the logical address is split into 1st index, 2nd index, and offset. The translation table base register and the 1st index select a descriptor in the 1st level table; that descriptor and the 2nd index select a descriptor in the 2nd level table; the result is concatenated with the offset to form the physical address.]

71 CPUs
- CPU performance.
- CPU power consumption.

72 Elements of CPU performance
- Cycle time.
- CPU pipeline.
- Memory system.

73 Pipelining
- Several instructions are executed simultaneously at different stages of completion.
- Various conditions can cause pipeline bubbles that reduce utilization:
  - branches;
  - memory system delays;
  - etc.

74 Pipeline structures
- Both ARM and SHARC have 3-stage pipes:
  - fetch instruction from memory;
  - decode opcode and operands;
  - execute.

75 ARM pipeline execution
[Figure: pipeline timing, time running left to right; each instruction starts one cycle after the previous one.]
    add r0,r1,#5    fetch   decode  execute
    sub r2,r3,r6            fetch   decode  execute
    cmp r2,#3                       fetch   decode  execute

76 Performance measures
- Latency: time it takes for an instruction to get through the pipeline.
- Throughput: number of instructions executed per time period.
- Pipelining increases throughput without reducing latency.

77 Pipeline stalls
- If every step cannot be completed in the same amount of time, pipeline stalls.
- Bubbles introduced by stall increase latency, reduce throughput.

78 ARM multi-cycle LDMIA instruction
[Figure: pipeline timing. ldmia r0,{r2,r3}: fetch, decode, ex ld r2, ex ld r3. sub r2,r3,r6: fetch, decode, ex sub. cmp r2,#3: fetch, decode, ex cmp. The two execute cycles of ldmia delay the instructions behind it.]

79 Control stalls
- Branches often introduce stalls (branch penalty).
  - Stall time may depend on whether branch is taken.
- May have to squash instructions that already started executing.
- Don’t know what to fetch until condition is evaluated.

80 ARM pipelined branch
[Figure: pipeline timing. bne foo: fetch, decode, ex bne, ex bne, ex bne. sub r2,r3,r6: fetch, decode only. foo add r0,r1,r2: fetch, decode, ex add.]

81 Delayed branch
- To increase pipeline efficiency, a delayed branch mechanism always executes the n instructions after the branch, whether or not the branch is taken.
- SHARC supports delayed and non-delayed branches.
  - Specified by bit in branch instruction.
  - 2-instruction branch delay slot.

82 Example: SHARC code scheduling
    L1=5;
    DM(I0,M1)=R1;
    L8=8;
    DM(I8,M9)=R2;
- CPU cannot use DAG on cycle just after loading DAG’s register.
  - CPU performs NOP between register assign and DM.

83 Rescheduled SHARC code
    L1=5;
    L8=8;
    DM(I0,M1)=R1;
    DM(I8,M9)=R2;
- Avoids two NOP cycles.

84 Example: ARM execution time
- Determine execution time of FIR filter:
    for (i=0; i<N; i++) f = f + c[i]*x[i];
- Only branch in loop test may take more than one cycle.
  - BLT loop takes 1 cycle best case, 3 worst case.
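
The FIR loop can be wrapped in a function to make it runnable. In this sketch the `int` element type and the function names are assumptions; the two helper functions bound the cycles spent in the loop-closing branch by treating every execution as either the 1-cycle best case or the 3-cycle worst case, which is a bounding assumption rather than a measurement.

```c
#include <stddef.h>

/* Multiply-accumulate loop of the FIR filter over n taps. */
int fir(const int *c, const int *x, size_t n)
{
    int f = 0;
    for (size_t i = 0; i < n; i++)
        f = f + c[i] * x[i];
    return f;
}

/* Bounds on cycles spent in the loop branch if it executes once per
   iteration: 1 cycle each at best, 3 cycles each at worst. */
unsigned blt_cycles_best(unsigned n)  { return 1u * n; }
unsigned blt_cycles_worst(unsigned n) { return 3u * n; }
```

The gap between the two bounds grows linearly with the number of iterations, which is why the branch dominates the uncertainty in the loop's execution time.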

85 Superscalar execution
- Superscalar processor can execute several instructions per cycle.
  - Uses multiple pipelined data paths.
- Programs execute faster, but it is harder to determine how much faster.

86 Data dependencies
- Execution time depends on operands, not just opcode.
- Superscalar CPU checks data dependencies dynamically:
    add r2,r0,r1
    add r3,r2,r5
[Figure: data dependency graph: r0 and r1 feed the first add, producing r2; r2 and r5 feed the second add, producing r3.]

87 Memory system performance
- Caches introduce indeterminacy in execution time.
  - Depends on order of execution.
- Cache miss penalty: added time due to a cache miss.
- Several reasons for a miss: compulsory, conflict, capacity.

88 CPU power consumption
- Most modern CPUs are designed with power consumption in mind to some degree.
- Power vs. energy:
  - heat depends on power consumption;
  - battery life depends on energy consumption.

89 CMOS power consumption
- Voltage drops: power consumption proportional to $V^2$.
- Toggling: more activity means more power.
- Leakage: basic circuit characteristics; can be eliminated by disconnecting power.

90 CPU power-saving strategies
- Reduce power supply voltage.
- Run at lower clock frequency.
- Disable function units with control signals when not in use.
- Disconnect parts from power supply when not in use.

91 Power management styles
- Static power management: does not depend on CPU activity.
  - Example: user-activated power-down mode.
- Dynamic power management: based on CPU activity.
  - Example: disabling function units.

92 Application: PowerPC 603 energy features
- Provides doze, nap, sleep modes.
- Dynamic power management features:
  - Uses static logic.
  - Can shut down unused execution units.
  - Cache organized into subarrays to minimize amount of active circuitry.

93 PowerPC 603 activity
- Percentage of time units are idle for SPEC integer/floating-point:
    unit             Specint92  Specfp92
    D cache          29%        28%
    I cache          29%        17%
    load/store       35%        17%
    fixed-point      38%        76%
    floating-point   99%        30%
    system register  89%        97%

94 Power-down costs
- Going into a power-down mode costs:
  - time;
  - energy.
- Must determine if going into mode is worthwhile.
- Can model CPU power states with power state machine.

95 Application: StrongARM SA-1100 power saving
- Processor takes two supplies:
  - VDD is main 3.3V supply.
  - VDDX is 1.5V.
- Three power modes:
  - Run: normal operation.
  - Idle: stops CPU clock, with logic still powered.
  - Sleep: shuts off most of chip activity; 3 steps, each about 30 µs; wakeup takes > 10 ms.

96 SA-1100 power state machine
[Figure: power state machine with states run ($P_{run}$ = 400 mW), idle ($P_{idle}$ = 50 mW), and sleep ($P_{sleep}$ = 0.16 mW); the transition times shown are 10 µs, 90 µs, 160 ms, and 90 µs.]
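
Whether entering a power-down mode is worthwhile can be estimated from the state powers. The following is a rough sketch, not a model from the deck: it uses the three power figures above, assumes the 160 ms transition is the wakeup from sleep back to run, assumes the chip draws run power during that wakeup, and ignores the much shorter microsecond-scale transitions. All names are hypothetical.

```c
enum power_state { RUN = 0, IDLE = 1, SLEEP = 2 };

/* State power in microwatts: 400 mW, 50 mW, 0.16 mW. */
const unsigned long long state_power_uw[3] = { 400000ull, 50000ull, 160ull };

/* Energy in nanojoules for staying in a state for ms milliseconds
   (microwatts times milliseconds gives nanojoules). */
unsigned long long state_energy_nj(enum power_state s, unsigned ms)
{
    return state_power_uw[s] * ms;
}

/* Rough check: is sleeping cheaper than idling across a gap of gap_ms?
   Assumes the 160 ms wakeup runs at run power and must fit in the gap. */
int sleep_is_worthwhile(unsigned gap_ms)
{
    const unsigned wakeup_ms = 160u;  /* assumed sleep-to-run time */
    if (gap_ms <= wakeup_ms)
        return 0;
    unsigned long long idle_cost  = state_energy_nj(IDLE, gap_ms);
    unsigned long long sleep_cost = state_energy_nj(SLEEP, gap_ms - wakeup_ms)
                                  + state_energy_nj(RUN, wakeup_ms);
    return sleep_cost < idle_cost;
}
```

Under these assumptions the wakeup alone costs 64 mJ, so sleep only pays off for gaps longer than roughly 1.3 seconds; for shorter gaps, idle mode uses less energy.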

