1
CS/ECE 3330 Computer Architecture Chapter 1 Performance / Power
2
CS/ECE 3330 – Fall 2009
Last Time
Overview of computer architecture
  – Low-level software
  – High-level hardware
  – The interface between the two (ISA)
What’s new and cool
  – Multicore and heterogeneity
  – Power, temperature, reliability
Logistics
  – Ben Kreuter, brk7bx@virginia.edu, Wed 7PM
  – Michelle McDaniel, mm4ab@virginia.edu, Thu 3PM
3
Class Survey
What does it mean to have better performance?
4
Defining Performance
Which airplane has the best performance?
5
Response Time and Throughput
Response time (execution time)
  – How long it takes to do a task
Throughput (bandwidth)
  – Total work done per unit time
  – e.g., tasks/transactions/… per hour
How are response time and throughput affected
  – By replacing the processor with a faster version?
  – By adding more processors?
We’ll focus on response time for now…
6
Relative Performance
Define Performance = 1 / Execution Time
“X is n times faster than Y”
  Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
Example: time taken to run a program
  – 20s on A, 30s on B
  – Execution Time_B / Execution Time_A = 30s / 20s = 1.5
  – So A is 1.5 times faster than B
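The ratio on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `times_faster` is chosen here for illustration and does not come from the slides.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so the performance
    ratio X/Y equals the execution-time ratio Y/X.
    """
    return time_y / time_x


# Slide example: the program takes 20s on A and 30s on B.
n = times_faster(time_x=20.0, time_y=30.0)
print(f"A is {n} times faster than B")
```

Both times must be in the same unit; only their ratio matters.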
7
Measuring Execution Time
Elapsed time
  – Total response time, including all aspects: processing, I/O, OS overhead, idle time
  – Determines system performance
CPU time
  – Time spent processing a given job
  – Discounts I/O time, other jobs’ shares
  – Comprises user CPU time and system CPU time
  – Different programs are affected differently by CPU and system performance
8
CPU Clocking
Operation of digital hardware governed by a constant-rate clock
[Figure: timing diagram of clock cycles, showing data transfer and computation, state update, and the clock period]
Clock period: duration of a clock cycle
  – e.g., 250ps = 0.25ns = 250×10^-12 s
Clock frequency (rate): cycles per second
  – e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
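Clock period and clock frequency are reciprocals, so the two examples on this slide describe the same clock. A small sketch of the conversion, with helper names that are assumptions made for illustration:

```python
import math


def period_to_frequency(period_s: float) -> float:
    """Clock frequency in Hz for a clock period given in seconds."""
    return 1.0 / period_s


def frequency_to_period(frequency_hz: float) -> float:
    """Clock period in seconds for a clock frequency given in Hz."""
    return 1.0 / frequency_hz


period = 250e-12                      # 250 ps, from the slide
frequency = period_to_frequency(period)
print(f"{period * 1e12:.0f} ps -> {frequency / 1e9:.1f} GHz")

# Floating-point results are compared with a tolerance, not with ==.
print(math.isclose(frequency_to_period(4.0e9), period))
```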
9
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate
Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate
  – Hardware designer must often trade off clock rate against cycle count
10
CPU Time Example
Computer A: 2GHz clock, 20s CPU time
Designing Computer B
  – Aim for 12s CPU time
  – Can use a faster clock, but doing so requires 1.2× as many clock cycles
How fast must Computer B’s clock be?
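One way to work this example is to recover A's cycle count from CPU Time = Clock Cycles / Clock Rate, scale it by 1.2 for B, and divide by B's target time. The sketch below does that; the variable names are illustrative and not taken from the slides.

```python
import math

# Given for Computer A
clock_rate_a = 2.0e9      # 2 GHz
cpu_time_a = 20.0         # seconds

# Design targets for Computer B
cpu_time_b = 12.0         # seconds
cycle_factor_b = 1.2      # B needs 1.2x as many clock cycles as A

# CPU Time = Clock Cycles / Clock Rate  =>  Clock Cycles = CPU Time * Clock Rate
cycles_a = cpu_time_a * clock_rate_a          # 40 x 10^9 cycles
cycles_b = cycle_factor_b * cycles_a          # 48 x 10^9 cycles

# Clock Rate = Clock Cycles / CPU Time
clock_rate_b = cycles_b / cpu_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
print(math.isclose(clock_rate_b, 4.0e9))
```

Under these numbers B's clock must run twice as fast as A's to finish in 12s, because it also executes 20% more cycles.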
11
Instruction Count and CPI
Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time = Instruction Count × CPI / Clock Rate
Instruction Count for a program
  – Determined by program, ISA and compiler
Average cycles per instruction
  – Determined by CPU hardware
  – If different instructions have different CPI, average CPI is affected by instruction mix
12
CPI Example
Computer A: Cycle Time = 300ps, CPI = 2.0
Computer B: Cycle Time = 600ps, CPI = 1.4
Same ISA
Which is faster, and by how much?
  CPU Time_A = IC × 2.0 × 300ps = IC × 600ps
  CPU Time_B = IC × 1.4 × 600ps = IC × 840ps
  A is faster, by CPU Time_B / CPU Time_A = 840ps / 600ps = 1.4
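Because both machines implement the same ISA, they execute the same instruction count, so only the time per instruction (CPI × cycle time) needs comparing. A minimal sketch, with a helper name assumed for illustration:

```python
import math


def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Values from the slide
tpi_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=300.0)   # 600 ps
tpi_b = time_per_instruction_ps(cpi=1.4, cycle_time_ps=600.0)   # 840 ps

# Same ISA => same instruction count, so IC cancels in the ratio.
speedup_a_over_b = tpi_b / tpi_a

faster = "A" if tpi_a < tpi_b else "B"
print(f"{faster} is faster by a factor of {speedup_a_over_b:.2f}")
```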
13
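CPI in More Detail
If different instruction classes take different numbers of cycles
  Clock Cycles = Σ_i (CPI_i × Instruction Count_i)
Weighted average CPI
  CPI = Clock Cycles / Instruction Count = Σ_i (CPI_i × Instruction Count_i / Instruction Count)
where Instruction Count_i / Instruction Count is the relative frequency of class i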
14
CPI Example
Alternative compiled code sequences using instructions in classes A, B, C

Class              A  B  C
CPI for class      1  2  3
IC in sequence 1   4  2  4
IC in sequence 2   8  2  2

Sequence 1: IC = 10
  – Clock Cycles = 4×1 + 2×2 + 4×3 = 20
  – Avg CPI = 20/10 = 2.0
Sequence 2: IC = 12
  – Clock Cycles = 8×1 + 2×2 + 2×3 = 18
  – Avg CPI = 18/12 = 1.5
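The per-class arithmetic on this slide can be reproduced with a short script. This is a sketch under the slide's numbers; the dictionary layout and the `summarize` helper are assumptions made for illustration.

```python
# Cycles per instruction for each class, from the slide
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two compiled sequences
SEQUENCE_1 = {"A": 4, "B": 2, "C": 4}
SEQUENCE_2 = {"A": 8, "B": 2, "C": 2}


def summarize(counts: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI) for a sequence."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count


for name, seq in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
    ic, cycles, avg_cpi = summarize(seq)
    print(f"{name}: IC = {ic}, cycles = {cycles}, avg CPI = {avg_cpi}")
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why instruction count alone is a poor predictor of run time.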
15
The Big Picture
CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)
Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc
16
Pitfall: Amdahl’s Law
Improving an aspect of a computer and expecting a proportional improvement in overall performance
  T_improved = T_affected / improvement factor + T_unaffected
Example: multiply accounts for 80s/100s
  – How much improvement in multiply performance to get 5× overall?
  – 20s = 80s / n + 20s has no solution for n. Can’t be done!
Corollary: make the common case fast
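The multiply example can be explored numerically. The sketch below assumes the usual form of Amdahl's Law (improved time = affected time / factor + unaffected time); the function names are illustrative, and the 4× target is an extra data point added here for contrast, not a case from the slides.

```python
from typing import Optional


def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Execution time after speeding up only the affected part by `factor`."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float,
                    target_time: float) -> Optional[float]:
    """Speedup needed on the affected part to reach target_time.

    Returns None when the target is unreachable, i.e. when the unaffected
    part alone already takes at least target_time.
    """
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None
    return t_affected / remaining


total, multiply = 100.0, 80.0
other = total - multiply                      # 20s not spent in multiply

# 5x overall means a 20s target, but 20s is already spent outside multiply.
print(required_factor(multiply, other, total / 5))   # None: can't be done

# Added for contrast: 4x overall (25s) is reachable with a 16x faster multiply.
print(required_factor(multiply, other, total / 4))
```

Even an infinitely fast multiplier leaves the 20s of other work, so the overall speedup is bounded below 5×.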
17
Benchmarking
Ranked from most accurate (+) to least accurate (-):
1. Real applications
2. Modified applications
3. Kernels (small, critical parts of real applications)
4. Toy benchmarks
5. Synthetic benchmarks
18
SPEC CPU Benchmarks
Programs used to measure performance
  – Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
SPEC CPU2006
  – Elapsed time to execute a selection of programs
  – Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
  – CINT2006 (integer) and CFP2006 (floating-point)
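The geometric mean of n ratios is the n-th root of their product. A small sketch follows, using the twelve CINT2006 SPECratios that this deck lists for the Opteron X4 2356; the function name is an assumption, and the mean is computed through logarithms to avoid multiplying many values together.

```python
import math


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n positive ratios, computed via logs."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# SPECratios for the twelve CINT2006 programs (Opteron X4 2356)
spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5,
               14.5, 19.8, 22.3, 9.1, 9.1, 6.0]

summary = geometric_mean(spec_ratios)
print(f"Geometric mean of SPECratios: {summary:.1f}")
```

A useful property of the geometric mean is that the ratio of two machines' means equals the mean of their per-program ratios, so the comparison does not depend on which reference machine is used.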
19
CINT2006 for Opteron X4 2356

Name        Description                    IC×10^9  CPI    Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing  2,118     0.75  0.40       637           9,777        15.3
bzip2       Block-sorting compression      2,389     0.85  0.40       817           9,650        11.8
gcc         GNU C Compiler                 1,050     1.72  0.40       724           8,050        11.1
mcf         Combinatorial optimization       336    10.00  0.40     1,345           9,120         6.8
go          Go game (AI)                   1,658     1.09  0.40       721          10,490        14.6
hmmer       Search gene sequence           2,783     0.80  0.40       890           9,330        10.5
sjeng       Chess game (AI)                2,176     0.96  0.40       837          12,100        14.5
libquantum  Quantum computer simulation    1,623     1.61  0.40     1,047          20,720        19.8
h264avc     Video compression              3,102     0.80  0.40       993          22,130        22.3
omnetpp     Discrete event simulation        587     2.94  0.40       690           6,250         9.1
astar       Games/path finding             1,082     1.79  0.40       773           7,020         9.1
xalancbmk   XML parsing                    1,058     2.70  0.40     1,143           6,900         6.0
Geometric mean                                                                                   11.7

Slide annotation: High cache miss rates
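The columns of this table are tied together by two relations: execution time ≈ IC × CPI × Tc, and SPECratio = reference time / execution time. The sketch below checks both for three rows; the tuple layout and names are assumptions made for illustration, and small mismatches against the listed execution times are expected because the table's inputs are rounded.

```python
# (name, IC in 10^9 instructions, CPI, Tc in ns, listed exec time s, ref time s)
ROWS = [
    ("perl",      2118,  0.75, 0.40,  637, 9777),
    ("mcf",        336, 10.00, 0.40, 1345, 9120),
    ("xalancbmk", 1058,  2.70, 0.40, 1143, 6900),
]


def exec_time_s(ic_billions: float, cpi: float, tc_ns: float) -> float:
    """IC x CPI x Tc; the 10^9 instructions and 10^-9 s/ns factors cancel."""
    return ic_billions * cpi * tc_ns


def spec_ratio(ref_time_s: float, exec_time: float) -> float:
    """Reference time divided by measured execution time."""
    return ref_time_s / exec_time


for name, ic, cpi, tc, listed, ref in ROWS:
    computed = exec_time_s(ic, cpi, tc)
    ratio = spec_ratio(ref, listed)
    print(f"{name}: computed {computed:.0f}s, listed {listed}s, "
          f"SPECratio {ratio:.1f}")
```

The mcf row shows how a CPI of 10.00 can dominate: it has the smallest instruction count of the three rows but the longest execution time.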
20
Summary
Performance can be measured a number of ways
  – Know the ways, and the potential misconceptions
  – Mostly boils down to the “standard performance equation”
Next Time…
  – Power has become a limiting factor
  – First problem set is due (2PM)