A Performance Comparison of DRAM Memory System Optimizations for SMT Processors
Zhichun Zhu (Univ. Illinois at Chicago) and Zhao Zhang (Iowa State Univ.), ECE Department
Feb. 15, 2005, HPCA-11

DRAM Memory Optimizations
- Optimizations at the DRAM side can make a big difference on single-threaded processors
  - Enhancement of chip interface/interconnect
  - Access scheduling [Hong et al. HPCA’99, Mathew et al. HPCA’00, Rixner et al. ISCA’00]
  - DRAM-side locality [Cuppu et al. ISCA’99, ISCA’01, Zhang et al. MICRO’00, Lin et al. HPCA’01]

How Does SMT Impact the Memory Hierarchy?
- Less performance loss per cache miss to DRAM memories
  - Lower benefit from DRAM-side optimizations?
- But more cache misses due to cache contention
  - Much more pressure on main memory
- Is DRAM memory design more important or not?

Memory Optimization Techniques
- Page modes
  - Open page: good for programs with good locality
  - Close page: good for programs with poor locality
- Mapping schemes
  - Exploitation of concurrency (multiple channels, chips, banks)
  - Row buffer conflicts
- Memory access scheduling
  - Reordering of concurrent accesses
  - Reducing average latency and improving bandwidth utilization
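The page-mode trade-off above can be sketched with a toy single-bank model. The timing constants and the function name `access_latencies` are hypothetical and not from the presentation. The model also assumes that the precharge in close-page mode happens off the critical path.

```python
# Hypothetical timing parameters in memory-bus cycles (illustrative only).
T_PRECHARGE = 3  # close the currently open row
T_ACTIVATE = 3   # open (activate) a row into the row buffer
T_COLUMN = 3     # column access from the row buffer


def access_latencies(rows, open_page):
    """Return the latency of each access to one bank, given each access's row."""
    open_row = None  # no row is open initially
    latencies = []
    for row in rows:
        if open_page:
            if open_row == row:
                latency = T_COLUMN  # row buffer hit
            elif open_row is None:
                latency = T_ACTIVATE + T_COLUMN  # idle bank, no row to close
            else:
                # Row buffer conflict: close the old row, then open the new one.
                latency = T_PRECHARGE + T_ACTIVATE + T_COLUMN
            open_row = row  # the row stays open after the access
        else:
            # Close page: the row is precharged right after every access
            # (assumed hidden), so each access pays activate + column.
            latency = T_ACTIVATE + T_COLUMN
        latencies.append(latency)
    return latencies


good_locality = [5, 5, 5, 5]  # four accesses to the same row
poor_locality = [1, 2, 3, 4]  # four accesses to four different rows
print(sum(access_latencies(good_locality, open_page=True)))
print(sum(access_latencies(good_locality, open_page=False)))
print(sum(access_latencies(poor_locality, open_page=True)))
print(sum(access_latencies(poor_locality, open_page=False)))
```

With these assumed timings, four accesses to the same row cost 15 cycles under open page versus 24 under close page, while four accesses to four different rows cost 33 versus 24.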

Memory Access Scheduling for Single-Threaded Systems
- Hit-first: a row buffer hit has a higher priority than a row buffer miss
- Read-first: a read has a higher priority than a write
- Age-based: an older request has a higher priority than a new one
- Criticality-based: a critical request has a higher priority than a non-critical one
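A minimal sketch of how these four policies could be expressed as priority keys, assuming they are combined lexicographically (the slide lists them separately and does not say how they are combined). The `Request` fields and the `pick_next` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: int          # arrival time; smaller means older
    row: int              # DRAM row targeted by the request
    is_read: bool
    critical: bool = False


# Each policy maps a request to a sort key; a smaller key means higher priority.
def hit_first(req, open_row):
    return 0 if req.row == open_row else 1


def read_first(req, open_row):
    return 0 if req.is_read else 1


def age_based(req, open_row):
    return req.arrival


def criticality_based(req, open_row):
    return 0 if req.critical else 1


def pick_next(queue, open_row, policies):
    """Pick the request with the smallest key, comparing policies in order."""
    return min(queue, key=lambda r: tuple(p(r, open_row) for p in policies))


queue = [
    Request("w_old", arrival=0, row=7, is_read=False),
    Request("r_miss", arrival=1, row=3, is_read=True),
    Request("r_hit", arrival=2, row=7, is_read=True),
]
print(pick_next(queue, 7, [hit_first, read_first, age_based]).name)
print(pick_next(queue, 7, [age_based]).name)
```

With row 7 open, the combination hit-first, read-first, age-based picks the read hit `r_hit`, while age-based alone picks the oldest request `w_old`.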

Thread-Aware Memory Scheduling
- New dimension in memory scheduling for SMT systems: considering the current state of each thread
- States related to memory accesses
  - Number of outstanding requests
  - Number of processor resources occupied

Outstanding Request-Based Scheme
- Request-based: a request generated by a thread with fewer pending requests has a higher priority
[Figure: request timelines. Sequence 1: H A1, H A2, H B1, H A3, H A4, H B2. Sequence 2: H A1, H A2, H A3, H A4, H B1, H B2.]

Outstanding Request-Based Scheme
- Request-based: hit-first and read-first are applied on top
- For SMT processors, sustained memory bandwidth is more important than the latency of an individual access
[Figure: request timelines. Sequence 1: H A1, H A2, M B1, H A3, H A4, M B2. Sequence 2: H A1, H A2, H A3, H A4, M B1, M B2.]
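One possible reading of the request-based scheme is sketched below. It assumes that "applied on top" means hit-first and read-first take precedence, with the per-thread pending count breaking ties, followed by age. That ordering, the `Req` fields, and the `schedule` helper are assumptions for illustration. Hit or miss status is also fixed per request here, whereas in a real controller it depends on which row is open at issue time.

```python
from collections import Counter
from typing import NamedTuple


class Req(NamedTuple):
    name: str
    thread: str
    arrival: int   # position in arrival order; smaller means older
    is_hit: bool   # row buffer hit (fixed here as a simplification)
    is_read: bool


def schedule(queue):
    """Return the issue order under hit-first, read-first, request-based, then age."""
    pending = list(queue)
    order = []
    while pending:
        # Recount outstanding requests per thread before every pick.
        counts = Counter(r.thread for r in pending)
        nxt = min(
            pending,
            key=lambda r: (not r.is_hit, not r.is_read, counts[r.thread], r.arrival),
        )
        pending.remove(nxt)
        order.append(nxt)
    return order


# Arrival order A1 A2 B1 A3 A4 B2; thread A's requests hit, thread B's miss.
arrivals = [
    Req("A1", "A", 0, True, True),
    Req("A2", "A", 1, True, True),
    Req("B1", "B", 2, False, True),
    Req("A3", "A", 3, True, True),
    Req("A4", "A", 4, True, True),
    Req("B2", "B", 5, False, True),
]
print([r.name for r in schedule(arrivals)])
```

Under these assumptions the hits from thread A issue before the misses from thread B, giving the order A1, A2, A3, A4, B1, B2.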

Resource Occupancy-Based Scheme
- ROB-based: higher priority to requests from threads holding more ROB entries
- IQ-based: higher priority to requests from threads holding more IQ entries
- Hit-first and read-first are applied on top
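The occupancy-based variants can be sketched the same way, with a per-thread occupancy table standing in for the ROB or IQ entry counts. As before, treating hit-first and read-first as the leading keys is an assumption, and `Req`, `pick_next`, and the occupancy numbers are hypothetical.

```python
from typing import NamedTuple


class Req(NamedTuple):
    name: str
    thread: int
    arrival: int   # smaller means older
    is_hit: bool
    is_read: bool


def pick_next(queue, occupancy):
    """Pick a request: hit-first, read-first, then the thread holding more entries.

    `occupancy` maps thread id to the number of ROB (or IQ) entries it holds.
    """
    return min(
        queue,
        key=lambda r: (not r.is_hit, not r.is_read, -occupancy[r.thread], r.arrival),
    )


queue = [
    Req("t0_read", 0, 0, True, True),
    Req("t1_read", 1, 1, True, True),
    Req("t1_write", 1, 2, True, False),
    Req("t0_miss", 0, 3, False, True),
]
rob_entries = {0: 12, 1: 40}  # made-up ROB occupancy per thread
iq_entries = {0: 20, 1: 5}    # made-up IQ occupancy per thread
print(pick_next(queue, rob_entries).name)
print(pick_next(queue, iq_entries).name)
```

With these made-up occupancies, the ROB-based pick is `t1_read` and the IQ-based pick is `t0_read`, since each scheme favors the thread holding more of its resource among the read hits.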

Methodology
- Simulator
  - SMT extension of sim-Alpha
  - Event-driven memory simulator (DDR SDRAM and Direct Rambus DRAM)
- Workload
  - Mixture of SPEC 2000 applications
  - 2-, 4-, and 8-thread workloads
  - “ILP”, “MIX”, and “MEM” workload mixes

Conclusion
- DRAM optimizations have a significant impact on the performance of SMT (and likely CMP) processors
  - Most effective when a workload mix includes some memory-intensive programs
  - Performance is sensitive to memory channel organization
  - DRAM-side locality is harder to exploit due to contention
  - Thread-aware access scheduling schemes do bring good performance