Advanced Computer Architecture CSE 8383
Computer Science and Engineering
Copyright by Hesham El-Rewini
January Session 2
Contents (Memory)
- Memory Hierarchy
- Cache Memory
- Placement Policies: Direct Mapping, Fully Associative, Set Associative
- Replacement Policies: FIFO, Random, Optimal, LRU, MRU
- Cache Write Policies
Memory Hierarchy
CPU Registers -> Cache -> Main Memory -> Secondary Storage
Moving down the hierarchy, latency rises, bandwidth and speed fall, and cost per bit drops.
Sequence of Events
1. The processor makes a request for X.
2. X is sought in the cache.
3. If it is found: a hit (hit ratio h).
4. Otherwise: a miss (miss ratio m = 1 - h).
5. On a miss, X is sought in main memory.
6. The scheme generalizes to more levels.
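The hit/miss behavior above determines the average memory access time. A minimal Python sketch (the function name and the example timing figures are my own, not from the slides):

```python
def effective_access_time(h, t_cache, t_memory):
    """Average access time for a one-level cache:
    hits cost t_cache; misses (ratio m = 1 - h) cost t_memory."""
    return h * t_cache + (1 - h) * t_memory

# Assumed figures: 10 ns cache, 100 ns memory, 95% hit ratio
t_avg = effective_access_time(0.95, 10, 100)   # 9.5 + 5 = 14.5 ns
```

Even a modest miss ratio dominates the average when memory is much slower than the cache.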
Cache Memory
The idea is to keep the information expected to be used most frequently in the cache.
Locality of Reference:
- Temporal Locality
- Spatial Locality
Placement Policies
Replacement Policies
Placement Policies
How to map memory blocks (lines) to cache block frames (line frames):
- Direct Mapping
- Fully Associative
- Set Associative
Direct Mapping
The simplest policy: a memory block is mapped to a fixed cache block frame (a many-to-one mapping).
J = I mod N, where
- J = cache block frame number
- I = memory block number
- N = number of cache block frames
Address Format (Direct Mapping)
Memory: M blocks; block size: B words; cache: N block frames.
Address size = log2(M * B) bits, split as:
| Tag: log2(M/N) bits | Block frame: log2(N) bits | Word: log2(B) bits |
Example
Memory: 4K blocks; block size: 16 words; cache: 128 block frames.
Address size = log2(4K * 16) = 16 bits:
| Tag: 5 bits | Block frame: 7 bits | Word: 4 bits |
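The example's field widths can be checked mechanically. A small Python sketch using the example's parameters (the helper names are my own):

```python
# Direct mapping with the example's parameters: 4K memory blocks,
# 16-word blocks, 128 cache block frames, 16-bit word addresses.
WORD_BITS  = 4   # log2(16)
FRAME_BITS = 7   # log2(128)
TAG_BITS   = 5   # log2(4K / 128)

def split_address(addr):
    """Split a 16-bit address into (tag, block frame, word)."""
    word  = addr & 0xF
    frame = (addr >> WORD_BITS) & 0x7F
    tag   = addr >> (WORD_BITS + FRAME_BITS)
    return tag, frame, word

def frame_of_block(block):
    """J = I mod N: memory block I always lands in frame I mod 128."""
    return block % 128
```

For any address, the frame field extracted by `split_address` agrees with `frame_of_block` applied to the memory block number (the address with the word bits dropped).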
Example (cont.)
[Diagram: memory blocks mapping many-to-one onto the 128 cache block frames; each cache entry stores a 5-bit tag.]
Fully Associative
The most flexible policy: a memory block can be mapped to any available cache block frame (a many-to-many mapping).
Requires an associative search over all tags.
Address Format (Fully Associative)
Memory: M blocks; block size: B words; cache: N block frames.
Address size = log2(M * B) bits, split as:
| Tag: log2(M) bits | Word: log2(B) bits |
Example
Memory: 4K blocks; block size: 16 words; cache: 128 block frames.
Address size = log2(4K * 16) = 16 bits:
| Tag: 12 bits | Word: 4 bits |
Example (cont.)
[Diagram: any memory block may occupy any cache block frame; each cache entry stores a 12-bit tag.]
Set Associative
A compromise between the other two policies. The cache is divided into sets, and each set holds a fixed number of block frames. A memory block is mapped to any available block frame within one specific set.
Requires an associative search only within that set.
Address Format (Set Associative)
Memory: M blocks; block size: B words; cache: N block frames.
Number of sets S = N / (number of blocks per set).
Address size = log2(M * B) bits, split as:
| Tag: log2(M/S) bits | Set: log2(S) bits | Word: log2(B) bits |
Example
Memory: 4K blocks; block size: 16 words; cache: 128 block frames.
Number of blocks per set = 4, so number of sets = 32.
Address size = log2(4K * 16) = 16 bits:
| Tag: 7 bits | Set: 5 bits | Word: 4 bits |
Example (cont.)
[Diagram: memory blocks mapping into set 0 through set 31; each cache entry stores a 7-bit tag.]
Comparison
Criteria for comparing the three policies:
- Simplicity
- Cost of the associative search
- Cache utilization
- Replacement complexity
Group Exercise
The instruction set for your architecture has 40-bit addresses, with each addressable item being a byte. You elect to design a four-way set-associative cache in which each of the four blocks in a set contains 64 bytes. Assume that you have 256 sets in the cache. Show the format of the address.
Group Exercise (cont.)
Address size = 40 bits; block size = 64 bytes; number of blocks per set = 4; number of sets = 256; cache size = 256 * 4 blocks.
| Tag: 26 bits | Set: 8 bits | Word: 6 bits |
Group Exercise (cont.)
Consider the following sequence of addresses (all hex numbers):
0E1B01AA05
0E1B01AA07
0E1B2FE305
0E1B4FFD8F
0E1B01AA0E
In your cache, what will be the tags in the set(s) that contain these references at the end of the sequence? Assume that the cache is initially flushed (empty).
Group Exercise (cont.)
Split each 40-bit address into tag (26 bits), set (8 bits), and word (6 bits):
0E1B01AA05: set A8, tag 386C06 (miss; block loaded)
0E1B01AA07: set A8, tag 386C06 (hit; same block, different word)
0E1B2FE305: set 8C, tag 386CBF (miss; block loaded)
0E1B4FFD8F: set F6, tag 386D3F (miss; block loaded)
0E1B01AA0E: set A8, tag 386C06 (hit)
At the end of the sequence, set A8 holds tag 386C06, set 8C holds tag 386CBF, and set F6 holds tag 386D3F.
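The set and tag fields in this exercise can be computed directly from the 26/8/6-bit split. A Python sketch (the helper name is my own):

```python
WORD_BITS, SET_BITS = 6, 8   # 64-byte blocks, 256 sets

def set_and_tag(addr):
    """Return (set index, tag) for a 40-bit byte address."""
    s   = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return s, tag

refs = [0x0E1B01AA05, 0x0E1B01AA07, 0x0E1B2FE305,
        0x0E1B4FFD8F, 0x0E1B01AA0E]
for a in refs:
    s, tag = set_and_tag(a)
    print(f"addr {a:010X}: set {s:02X}, tag {tag:07X}")
```

Addresses that share both set index and tag (the first, second, and fifth references) fall in the same cached block, so only the first of them misses.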
Replacement Techniques
- FIFO
- LRU
- MRU
- Random
- Optimal
Group Exercise
Suppose that your cache can hold only three blocks and the block requests are as follows:
7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Show the contents of the cache if the replacement policy is (a) LRU, (b) FIFO, (c) Optimal.
Group Exercise (cont.)
[Worked tables tracing the cache contents under FIFO, MRU, OPT, and LRU; under every policy the first three requests load blocks 7, 0, 1.]
For this reference string with three frames, FIFO incurs 15 misses, LRU 12, and the optimal policy 9.
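The replacement traces can be reproduced with a short simulator. A Python sketch covering FIFO, LRU, and Belady's optimal policy (the function name is my own):

```python
def simulate(requests, frames, policy):
    """Count misses for a small fully associative cache.
    policy: 'FIFO', 'LRU', or 'OPT' (Belady's optimal).
    The list order doubles as the FIFO/LRU eviction queue."""
    cache, misses = [], 0
    for i, r in enumerate(requests):
        if r in cache:
            if policy == 'LRU':          # refresh recency on a hit
                cache.remove(r)
                cache.append(r)
            continue
        misses += 1
        if len(cache) == frames:         # need a victim
            if policy == 'OPT':          # evict block used farthest ahead
                future = requests[i + 1:]
                victim = max(cache, key=lambda b:
                             future.index(b) if b in future else len(future))
            else:                        # FIFO and LRU both evict the head
                victim = cache[0]
            cache.remove(victim)
        cache.append(r)
    return misses

reqs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for p in ('FIFO', 'LRU', 'OPT'):
    print(p, simulate(reqs, 3, p))
```

OPT needs the whole future reference string, so it is a benchmark to compare against rather than an implementable policy.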
Cache Write Policies
On a cache hit:
- Write Through
- Write Back
On a cache miss:
- Write-allocate
- Write-no-allocate
Read Policy on a Cache Miss
Two options:
- The missed block is brought into the cache while the required word is forwarded immediately to the CPU.
- The missed block is stored entirely in the cache first, and the required word is then forwarded to the CPU.
Pentium IV Two-Level Cache
Processor -> L1 cache -> L2 cache -> Main Memory
Cache L1
Cache organization: set-associative
Block size: 64 bytes
Cache L1 size: 8 KB
Number of blocks per set: four
CPU addressing: byte addressable
CPU and Memory Interface
The CPU holds the address in MAR (n bits, driving address lines 0 to n-1) and the data in MDR (b bits, driving b data lines), with an R/W control line to main memory.
Pipelining
Contents
- Introduction
- Linear Pipelines
- Nonlinear Pipelines
Basic Idea
Like an assembly line: divide the execution of a task among a number of stages. A task is divided into subtasks that are executed in sequence, yielding a performance improvement over sequential execution once a stream of tasks is flowing.
Pipeline
A task is divided into subtasks 1, 2, ..., n, and a stream of tasks flows through the corresponding pipeline stages 1, 2, ..., n.
5 Tasks on a 4-Stage Pipeline
[Space-time diagram: Tasks 1 through 5 entering the pipeline on successive cycles.]
Speedup
An n-stage pipeline, each stage taking time t, processing a stream of m tasks:
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t
Speedup = T(Seq) / T(Pipe) = (n * m) / (n + m - 1)
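The speedup formula above can be sketched in a few lines of Python (the function name is my own):

```python
def pipeline_speedup(n, m):
    """Speedup of an n-stage pipeline over sequential execution
    for m tasks with equal stage time t:
    T_seq = n*m*t, T_pipe = (n + m - 1)*t; the t cancels."""
    return (n * m) / (n + m - 1)

# 5 tasks on a 4-stage pipeline: 20 / 8 = 2.5
s = pipeline_speedup(4, 5)
```

As m grows the speedup approaches n, the number of stages: the pipeline fill time of n - 1 cycles is amortized over the task stream.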
Linear Pipeline
Processing stages are linearly connected and perform a fixed function.
- Synchronous pipeline: clocked latches between stage i and stage i+1; equal delays in all stages.
- Asynchronous pipeline: handshaking between stages.
Latches
S1 -> L1 -> S2 -> L2 -> S3
The latches are driven by a common clock, so every stage must finish within one clock period; the slowest stage therefore determines the delay.
Reservation Table
[Diagram: one task marks a diagonal of X's, occupying S1, S2, S3, S4 in successive time steps.]
5 Tasks on 4 Stages
[Diagram: five overlapping diagonals of X's across S1 to S4, one per task, each shifted right by one time step.]
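The staircase pattern of the linear-pipeline reservation table can be generated programmatically. A Python sketch (the function name is my own):

```python
def reservation_table(stages, tasks):
    """Space-time diagram for a linear pipeline: task j (0-based)
    occupies stage i during clock cycle i + j."""
    width = stages + tasks - 1          # total cycles needed
    rows = []
    for i in range(stages):
        row = ['.'] * width
        for j in range(tasks):
            row[i + j] = 'X'            # stage i busy in cycle i + j
        rows.append(f"S{i + 1} " + ''.join(row))
    return '\n'.join(rows)

print(reservation_table(4, 5))
```

The table width, stages + tasks - 1 cycles, is exactly the n + m - 1 term in the speedup formula.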
Nonlinear Pipelines
- Variable functions
- Feed-forward connections
- Feedback connections
3 Stages and 2 Functions
[Diagram: stages S1, S2, S3 with feed-forward and feedback connections; the same stages evaluate two different functions, X and Y.]
Reservation Tables for X and Y
[Two tables over stages S1 to S3: function X occupies the stages in one pattern of X marks, function Y in another pattern of Y marks; a stage may be used more than once per task.]
Linear Instruction Pipelines
Assume the following instruction execution phases:
- Fetch (F)
- Decode (D)
- Operand Fetch (O)
- Execute (E)
- Write results (W)
Pipelined Instruction Execution
[Space-time diagram: instructions I1, I2, I3 flowing through F, D, O, E, W on successive cycles.]
Dependencies
- Data dependency (an operand is not ready yet)
- Instruction dependency (branching)
Will these cause a problem?
Data Dependency
I1: Add R1, R2, R3
I2: Sub R4, R1, R5
I2 needs R1 in its operand fetch stage before I1 has written the result back.
[Space-time diagram: I1 and I2 overlapping in F, D, O, E, W.]
Solutions
- Stall
- Forwarding
- Write and read in one cycle
- ...
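The cost of these solutions can be compared with a simplified cycle count. A Python sketch under stated assumptions: one instruction issued per cycle in the five-stage F, D, O, E, W pipeline, a stall inserting bubble cycles before the dependent instruction's operand fetch, and (for the stall figure below) R1 readable only after I1's W stage completes:

```python
def total_cycles(n_instr, stalls):
    """Cycles for n_instr instructions on a 5-stage pipeline:
    5 to fill, one more per extra instruction, plus bubbles."""
    return 5 + (n_instr - 1) + stalls

# I1: Add R1,R2,R3 / I2: Sub R4,R1,R5
no_hazard = total_cycles(2, 0)   # ideal overlap: 6 cycles
stalled   = total_cycles(2, 2)   # O delayed past I1's W: 8 cycles
forwarded = total_cycles(2, 0)   # forwarding removes both bubbles
```

The exact stall count depends on the register-file timing; if a register can be written and read in the same cycle, one bubble suffices instead of two.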
Instruction Dependency
I1: a branch instruction
I2: the next instruction fetched
Until the branch is resolved, the pipeline does not know which instruction should follow I1.
[Space-time diagram: I1 and I2 overlapping in F, D, O, E, W.]
Solutions
- Stall
- Predict branch taken
- Predict branch not taken
- ...
Floating Point Multiplication
Inputs: (Mantissa 1, Exponent 1), (Mantissa 2, Exponent 2)
1. Add the two exponents to get the output exponent.
2. Multiply the two mantissas.
3. Normalize the mantissa and adjust the exponent.
4. Round the product mantissa to a single-length mantissa; the exponent may need adjusting again.
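The steps above can be sketched in Python on a toy normalized format (value = mantissa * 2^exponent with 1.0 <= mantissa < 2.0). This is my own illustration, not IEEE 754: rounding is simplified to truncation to a fixed number of fractional bits.

```python
def fp_multiply(m1, e1, m2, e2, bits=8):
    """Multiply two normalized values following the slide's steps."""
    e = e1 + e2                 # step 1: add the two exponents
    m = m1 * m2                 # step 2: multiply the mantissas
    if m >= 2.0:                # step 3: normalize and adjust exponent
        m /= 2.0
        e += 1
    scale = 1 << bits           # step 4: round (here: truncate) the
    m = int(m * scale) / scale  #         mantissa to single length
    return m, e

# 1.5*2^3 * 1.5*2^2 = 2.25*2^5, normalized to 1.125*2^6
print(fp_multiply(1.5, 3, 1.5, 2))
```

The product of two mantissas in [1, 2) lies in [1, 4), so at most one normalization shift is needed in step 3.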
Linear Pipeline for Floating-Point Multiplication
Coarse stages: Add Exponents -> Multiply Mantissas -> Normalize -> Round.
The multiply stage can itself be decomposed, giving the refined pipeline: Partial Products -> Accumulator -> Add Exponents -> Normalize -> Round -> Renormalize.
Linear Pipeline for Floating-Point Addition
Stages: Subtract Exponents -> Partial Shift (align mantissas) -> Add Mantissas -> Find Leading 1 -> Partial Shift (normalize) -> Round -> Renormalize.
Combined Adder and Multiplier
A multifunction pipeline with stages labeled A through H covering: Exponent Subtract/Add, Partial Shift, Partial Products, Add Mantissa, Find Leading 1, a second Partial Shift, Round, and Renormalize. Addition and multiplication each use their own subset of the stages.