
1 Final Exam Review

2 Exam Format
It will cover material after the mid-term (cache to multiprocessors).
It is similar in style to the mid-term exam.
We will have 6-7 questions in the exam:
– One question: true/false or short questions covering general topics.
– 5-6 other questions requiring calculation.

3 Memory Systems

4 Memory Hierarchy - the Big Picture
Problem: memory is too slow and/or too small.
Solution: a memory hierarchy.
[Figure: the hierarchy runs from processor registers, to L1 on-chip cache, L2 off-chip cache, main memory (DRAM), and secondary storage (disk); moving down the hierarchy, capacity grows larger while speed and cost per byte decrease.]

5 Why Hierarchy Works
The principle of locality:
– Programs access a relatively small portion of the address space at any instant of time.
– Temporal locality: a recently accessed instruction/data item is likely to be used again.
– Spatial locality: instructions/data near a recently accessed instruction/data item are likely to be used soon.
Result: the illusion of large, fast memory.
[Figure: probability of reference plotted over the address space 0 to 2^n - 1.]
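
To make the two kinds of locality concrete, here is a small illustrative Python loop (not from the slides; the array size is arbitrary):

```python
# Illustrative only: an ordinary loop already exhibits both kinds of locality.
data = list(range(1_000_000))   # arbitrary array size

total = 0
for i in range(len(data)):
    total += data[i]   # spatial locality: data[i], data[i+1], ... are touched in order
                       # temporal locality: 'total', 'i', and the loop instructions
                       # themselves are re-referenced on every iteration
print(total)
```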

6 Cache Design & Operation Issues
Q1: Where can a block be placed in the cache? (Block placement strategy & cache organization)
– Fully associative, set associative, direct mapped.
Q2: How is a block found if it is in the cache? (Block identification)
– Tag/Block.
Q3: Which block should be replaced on a miss? (Block replacement)
– Random, LRU.
Q4: What happens on a write? (Cache write policy)
– Write through, write back.

7 Q1: Block Placement
Where can a block be placed in the cache?
– In one predetermined place - direct-mapped
Use a fragment of the address to calculate the block location in the cache.
Compare the cache block's tag to test if the block is present.
– Anywhere in the cache - fully associative
Compare the tag to every block in the cache.
– In a limited set of places - set-associative
Use an address fragment to calculate the set; place in any block in the set.
Compare the tag to every block in the set.
A hybrid of direct mapped and fully associative.
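
As a minimal sketch of how an address selects a cache location, the following Python splits a byte address into tag, set index, and block offset; the block size and set count are assumed example values, not parameters from the course:

```python
# Sketch of address decomposition for a set-associative cache.
# BLOCK_SIZE and NUM_SETS are assumed example values (powers of two).
BLOCK_SIZE = 64     # bytes per block
NUM_SETS = 128      # number of sets (direct-mapped is the 1-way special case)

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(block size)
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(number of sets)

def decompose(addr: int):
    """Return (tag, set_index, block_offset) for a byte address."""
    block_offset = addr & (BLOCK_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, block_offset

print(decompose(0x12345))
# Direct-mapped: the index alone picks the single candidate block.
# Set-associative: the index picks a set; the tag is compared against every block in it.
# Fully associative: no index field; the tag is compared against every block in the cache.
```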

8 Q2: Block Identification
Every cache block has an address tag (and index) that identifies its location in memory.
Hit when the tag and index of the desired word match (comparison done by hardware).
Q: What happens when a cache block is empty?
A: Mark this condition with a valid bit.
[Figure: example cache entry - tag/index 0x00001C0, valid bit 1, data 0xff083c2d.]
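
Continuing the sketch above, a hit requires both a matching tag and a set valid bit; the CacheLine structure and values below are hypothetical:

```python
# Hypothetical hit check for one cache set (structure and values are illustrative).
from dataclasses import dataclass

@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: bytes = b""

def is_hit(cache_set: list[CacheLine], tag: int) -> bool:
    # A hit needs a line whose stored tag matches AND whose valid bit is set,
    # so an empty (invalid) line can never produce a false hit.
    return any(line.valid and line.tag == tag for line in cache_set)

ways = [CacheLine(valid=True, tag=0x00001C0, data=b"\xff\x08\x3c\x2d"), CacheLine()]
print(is_hit(ways, 0x00001C0))   # True  (hit)
print(is_hit(ways, 0x00002A0))   # False (miss)
```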

9 Cache Replacement Policy
Random
– Replace a randomly chosen line.
LRU (Least Recently Used)
– Replace the least recently used line.

10 Write-through Policy
[Figure: on a write, the processor updates the block in the cache and the copy in main memory at the same time, so both always hold the current value.]

11 Write-back Policy
[Figure: on a write, the processor updates only the cached block; main memory is updated later, when the modified (dirty) block is replaced.]

12 Cache Performance: Average Memory Access Time (AMAT), Memory Stall Cycles
The Average Memory Access Time (AMAT): the number of cycles required to complete an average memory access request by the CPU.
Memory stall cycles per memory access: the number of stall cycles added to CPU execution cycles for one memory access.
For an ideal memory: AMAT = 1 cycle, which results in zero memory stall cycles.
Memory stall cycles per average memory access = (AMAT - 1)
Memory stall cycles per average instruction
= Memory stall cycles per average memory access × Number of memory accesses per instruction
= (AMAT - 1) × (1 + fraction of loads/stores), where the 1 accounts for the instruction fetch.
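
A small numeric sketch of these definitions; the hit time, miss rate, miss penalty, and load/store fraction below are assumed values for illustration only:

```python
# Illustrative AMAT / stall-cycle calculation; all parameter values are assumptions.
hit_time = 1                 # cycles
miss_rate = 0.05
miss_penalty = 40            # cycles (M)
loads_stores_per_instr = 0.3 # fraction of loads/stores

amat = hit_time + miss_rate * miss_penalty              # cycles per memory access
stalls_per_access = amat - 1                            # = miss_rate * miss_penalty
stalls_per_instr = stalls_per_access * (1 + loads_stores_per_instr)

print(amat)               # 3.0 cycles
print(stalls_per_access)  # 2.0 stall cycles per access
print(stalls_per_instr)   # 2.6 stall cycles per instruction
```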

13 Cache Performance
Unified cache: for a CPU with a single level (L1) of cache for both instructions and data and no stalls for cache hits:
CPUtime = IC × (CPI execution + Memory stall cycles per instruction) × Clock cycle time
CPUtime = IC × [CPI execution + Memory accesses per instruction × Miss rate × Miss penalty] × Clock cycle time
Split cache: for a CPU with separate (split) level-one (L1) caches for instructions and data and no stalls for cache hits:
CPUtime = IC × (CPI execution + Memory stall cycles per instruction) × Clock cycle time
Memory stall cycles per instruction = Instruction fetch miss rate × Miss penalty + Data memory accesses per instruction × Data miss rate × Miss penalty
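
The sketch below plugs assumed numbers into the two formulations above to compare a unified and a split L1; every parameter value is hypothetical:

```python
# Hypothetical comparison of the unified vs. split L1 formulas; all values assumed.
IC = 1_000_000            # instruction count
CPI_exec = 1.1            # base CPI with no memory stalls
clock = 1e-9              # clock cycle time: 1 ns
mem_acc_per_instr = 1.3   # 1 instruction fetch + 0.3 data accesses
miss_penalty = 50         # cycles

# Unified L1: one miss rate applies to all memory accesses.
unified_miss_rate = 0.04
stall_unified = mem_acc_per_instr * unified_miss_rate * miss_penalty

# Split L1: separate instruction and data miss rates.
instr_miss_rate, data_miss_rate, data_acc_per_instr = 0.02, 0.06, 0.3
stall_split = (1 * instr_miss_rate * miss_penalty
               + data_acc_per_instr * data_miss_rate * miss_penalty)

for name, stall in (("unified", stall_unified), ("split", stall_split)):
    cpu_time = IC * (CPI_exec + stall) * clock
    print(f"{name}: {stall:.2f} stall cycles/instr, CPU time = {cpu_time * 1e3:.2f} ms")
```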

14 Memory Access Tree for Unified Level 1 Cache
A CPU memory access splits into:
– L1 hit: fraction = hit rate = H1, access time = 1, stalls = H1 × 0 = 0 (no stall)
– L1 miss: fraction = (1 - H1), access time = M + 1, stall cycles per access = M × (1 - H1)
AMAT = H1 × 1 + (1 - H1) × (M + 1) = 1 + M × (1 - H1)
Stall cycles per access = AMAT - 1 = M × (1 - H1)
(M = miss penalty, H1 = level 1 hit rate, 1 - H1 = level 1 miss rate)

15 Memory Access Tree for Separate Level 1 Caches
A CPU memory access splits into instruction and data accesses:
– Instruction L1 hit: access time = 1, stalls = 0
– Instruction L1 miss: access time = M + 1, stalls per access = % instructions × (1 - Instruction H1) × M
– Data L1 hit: access time = 1, stalls = 0
– Data L1 miss: access time = M + 1, stalls per access = % data × (1 - Data H1) × M
Stall cycles per access = % instructions × (1 - Instruction H1) × M + % data × (1 - Data H1) × M
AMAT = 1 + stall cycles per access

16 Cache Performance (various factors)
Cache impact on performance:
– With and without a cache.
– Processor clock rate.
Which one performs better, unified or split?
– Assuming the same total size.
What is the effect of cache organization on cache performance (e.g., 1-way vs. 8-way set associative)?
– Trade-offs between hit time and hit rate.

17 Cache Performance (various factors)
What is the effect of write policy on cache performance: write back or write through, write allocate vs. no-write allocate?
– Stall cycles per memory access = % reads × (1 - H1) × M + % writes × M
– Stall cycles per memory access = (1 - H1) × (M × % clean + 2M × % dirty)
What is the effect of cache levels on performance?
– Stall cycles per memory access = (1 - H1) × H2 × T2 + (1 - H1)(1 - H2) × M
– Stall cycles per memory access = (1 - H1) × H2 × T2 + (1 - H1)(1 - H2) × H3 × T3 + (1 - H1)(1 - H2)(1 - H3) × M
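
The formulas on this slide can be evaluated directly; the rates and latencies below are assumptions chosen only to show the mechanics:

```python
# Illustrative evaluation of the stall-cycle formulas above; all values are assumptions.
M = 100                  # main-memory miss penalty, cycles
H1, H2 = 0.95, 0.80      # L1 and L2 hit rates
T2 = 10                  # L2 hit time, cycles
reads, writes = 0.7, 0.3
clean, dirty = 0.6, 0.4

# Write-through: every write goes to memory; reads stall only on misses.
stall_wt = reads * (1 - H1) * M + writes * M
# Write-back: a miss costs M, plus another M if the replaced block is dirty.
stall_wb = (1 - H1) * (M * clean + 2 * M * dirty)
# Two cache levels: L1 misses that hit in L2 cost T2; the rest go to memory.
stall_l2 = (1 - H1) * H2 * T2 + (1 - H1) * (1 - H2) * M

print(round(stall_wt, 2))   # 33.5 stall cycles per access
print(round(stall_wb, 2))   # 7.0
print(round(stall_l2, 2))   # 1.4
```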

18 Performance Equation
To reduce CPUtime, we need to reduce the cache miss rate.

19 Reducing Misses (3 Cs)
Classifying cache misses: the 3 Cs
– Compulsory: misses that occur even in an infinite-size cache.
– Capacity: misses due to the size of the cache.
– Conflict: misses due to the associativity and size of the cache.
How to reduce the 3 Cs (miss rate):
– Increase block size.
– Increase associativity.
– Use a victim cache.
– Use a pseudo-associative cache.
– Use a prefetching technique.

20 Performance Equation
To reduce CPUtime, we need to reduce the cache miss penalty.

21 Memory Interleaving - Reduce Miss Penalty
Default (no interleaving): must finish accessing one word before starting the next access.
– Fetching a 4-word block takes (1 + 25 + 1) × 4 = 108 cycles.
Interleaving: begin accessing one word and, while waiting, start accessing the other three words (pipelining).
– Requires 4 separate memories, each 1/4 the size, with addresses spread out among them.
– Fetching the same 4-word block takes 30 cycles (1 + 25 + 4 × 1).
Interleaving works perfectly with caches.
[Figure: CPU-cache-memory organization with a single memory vs. four interleaved banks (Memory 0-3) on a 4-byte bus, with the corresponding access timing.]
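
The cycle counts on this slide follow from 1 cycle to send the address, 25 cycles of memory access time, and 1 cycle to transfer each word (the timing given on the next slide); a two-line check:

```python
# Re-deriving the 108-cycle vs. 30-cycle block-fill times quoted on this slide.
addr, access, transfer, words = 1, 25, 1, 4

no_interleaving = (addr + access + transfer) * words   # 108 cycles: words fetched one after another
interleaved = addr + access + transfer * words         # 30 cycles: the four banks overlap
                                                       # their 25-cycle access times
print(no_interleaving, interleaved)
```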

22 Memory Interleaving: An Example
Given the following system parameters with a single cache level L1:
– Block size = 1 word
– Memory bus width = 1 word
– Miss rate = 3%
– Miss penalty = 27 cycles (1 cycle to send the address, 25 cycles access time per word, 1 cycle to send a word)
– Memory accesses per instruction = 1.2
– Ideal CPI (ignoring cache misses) = 2
– Miss rate (block size = 2 words) = 2%
– Miss rate (block size = 4 words) = 1%
The CPI of the base machine with 1-word blocks = 2 + (1.2 × 0.03 × 27) = 2.97
Increasing the block size to two words gives the following CPI:
– 32-bit bus and memory, no interleaving = 2 + (1.2 × 0.02 × 2 × 27) = 3.30
– 32-bit bus and memory, interleaved = 2 + (1.2 × 0.02 × 28) = 2.67
Increasing the block size to four words gives the following CPI:
– 32-bit bus and memory, no interleaving = 2 + (1.2 × 0.01 × 4 × 27) = 3.30
– 32-bit bus and memory, interleaved = 2 + (1.2 × 0.01 × 30) = 2.36
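
For checking, the slide's CPI arithmetic reproduced in a few lines of Python:

```python
# Reproducing the CPI calculations from this slide.
cpi_base = 2
acc_per_instr = 1.2
penalty_1w = 27                                           # 1 + 25 + 1 cycles per word

print(cpi_base + acc_per_instr * 0.03 * penalty_1w)       # ≈ 2.97  (1-word blocks)
print(cpi_base + acc_per_instr * 0.02 * 2 * penalty_1w)   # ≈ 3.30  (2-word, no interleaving)
print(cpi_base + acc_per_instr * 0.02 * (1 + 25 + 2))     # ≈ 2.67  (2-word, interleaved: 28 cycles)
print(cpi_base + acc_per_instr * 0.01 * 4 * penalty_1w)   # ≈ 3.30  (4-word, no interleaving)
print(cpi_base + acc_per_instr * 0.01 * (1 + 25 + 4))     # ≈ 2.36  (4-word, interleaved: 30 cycles)
```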

23 Cache vs. Virtual Memory
Motivation for virtual memory (physical memory size, multiprogramming).
The concept behind VM is almost identical to the concept behind a cache, but the terminology differs:
– Cache: block / VM: page
– Cache: cache miss / VM: page fault
Caches are implemented completely in hardware; VM is implemented in software, with hardware support from the CPU.
The cache speeds up main memory access, while main memory speeds up VM access.
Translation Look-aside Buffer (TLB).
How to calculate the size of the page tables for a given memory system.
How to calculate the size of pages given the size of the page table.
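
The slide mentions sizing a page table; here is a hedged sketch of the standard single-level calculation, with the address width, page size, and PTE size all assumed:

```python
# Hypothetical single-level page-table sizing; all parameters are assumptions.
virtual_addr_bits = 32
page_size = 4 * 1024    # 4 KB pages
pte_size = 4            # bytes per page-table entry

num_entries = 2 ** virtual_addr_bits // page_size   # 2^32 / 2^12 = 1,048,576 pages
table_bytes = num_entries * pte_size                # 4 MB of page table per process

print(num_entries, table_bytes // 2 ** 20, "MB")
# The reverse question (page size given a table size) just inverts this arithmetic.
```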

24 Virtual Memory: Definitions
Key idea: simulate a larger physical memory than is actually available.
General approach:
– Break the address space up into pages.
– Each program accesses a working set of pages.
– Store pages in physical memory as space permits, and on disk when no space is left in physical memory.
– Access pages using virtual addresses.
[Figure: virtual memory map - individual pages mapped to physical memory or to disk.]

25 I/O Systems

26 I/O Systems

27 I/O concepts
Disk performance:
– Disk latency = average seek time + average rotational delay + transfer time + controller overhead
Interrupt-driven I/O.
Memory-mapped I/O.
I/O channels:
– DMA (Direct Memory Access)
– I/O communication protocols: daisy chaining, polling
I/O buses:
– Synchronous vs. asynchronous
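
Plugging assumed parameters into the disk-latency formula above (the seek time, RPM, transfer rate, and overhead are all illustrative values):

```python
# Illustrative disk-latency calculation; every parameter below is an assumption.
seek_ms = 5.0                            # average seek time
rpm = 7200
rotational_ms = 0.5 * 60_000 / rpm       # average rotational delay = half a revolution ≈ 4.17 ms
transfer_ms = 4096 / 100e6 * 1e3         # 4 KB transfer at 100 MB/s ≈ 0.04 ms
controller_ms = 0.2                      # controller overhead

latency_ms = seek_ms + rotational_ms + transfer_ms + controller_ms
print(round(latency_ms, 2), "ms")        # ≈ 9.41 ms
```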

28 RAID Systems
Examined various RAID architectures, RAID 0 to RAID 5: cost, performance (bandwidth, I/O request rate).
– RAID 0: no redundancy
– RAID 1: mirroring
– RAID 2: memory-style ECC
– RAID 3: bit-interleaved parity
– RAID 4: block-interleaved parity
– RAID 5: block-interleaved distributed parity

29 Storage Architectures
Examined various storage architectures (pros and cons):
– DAS - Directly-Attached Storage
– NAS - Network-Attached Storage
– SAN - Storage Area Network

30 Multiprocessors

31 Motivation
Application needs.
Amdahl's law:
– T(n) = 1 / (s + p/n)
– As n → ∞, T(n) → 1/s
Gustafson's law:
– T'(n) = s + n × p; T'(∞) → ∞ !
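
The two laws can be compared numerically; the serial fraction s = 0.05 below is an assumed example (the slide writes the expressions as T(n) and T'(n); here they are treated as speedups):

```python
# Comparing Amdahl's and Gustafson's laws for an assumed serial fraction s = 0.05.
s, p = 0.05, 0.95

def amdahl(n):        # fixed problem size
    return 1 / (s + p / n)

def gustafson(n):     # problem size scales with the number of processors
    return s + n * p

for n in (4, 16, 256, 10_000):
    print(n, round(amdahl(n), 2), round(gustafson(n), 2))
# Amdahl's speedup saturates at 1/s = 20; Gustafson's scaled speedup keeps growing with n.
```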

32 Flynn's Taxonomy of Computing
SISD (Single Instruction, Single Data):
– Typical uniprocessor systems, which we have studied throughout this course.
SIMD (Single Instruction, Multiple Data):
– Multiple processors simultaneously executing the same instruction on different data.
– Specialized applications (e.g., image processing).
MIMD (Multiple Instruction, Multiple Data):
– Multiple processors autonomously executing different instructions on different data.

33 Shared Memory Multiprocessors
[Figure: processor/cache (P/C) nodes, each with a cache, memory bus (MB), and network interface circuitry (NIC), connected through a bus or custom-designed network to a shared memory.]

34 MPP (Massively Parallel Processing): Distributed Memory Multiprocessors
[Figure: nodes, each with a processor/cache (P/C), local memory (LM), memory bus (MB), and network interface circuitry (NIC), connected by a custom-designed network.]
MB: Memory Bus; NIC: Network Interface Circuitry

35 Cluster
[Figure: nodes, each with a processor/cache (P/C), memory (M), memory bus (MB), bridge, local disk (LD), I/O bus (IOB), and NIC, connected by a commodity network (Ethernet, ATM, Myrinet).]
LD: Local Disk; IOB: I/O Bus

36 Grid
[Figure: sites, each with a processor/cache (P/C), shared memory (SM), NIC, local disk (LD), and IOC, connected through hubs/LANs over the Internet.]

37 Multiprocessor concepts
SIMD
– Applications (image processing)
MIMD
– Shared memory: cache coherence problems, bus scalability problems
– Distributed memory: interconnection networks, clusters of workstations

38 Preparation Strategy
Read this review to focus your preparation:
– 1 general question
– 5-6 other questions
– Around 50% for memory systems
– Around 50% for I/O and multiprocessors
Go through the lecture notes.
Go through the "training problems".
We will have more office hours for help.
Good luck!

