EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.

1 EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction

2 EEL5708/Bölöni Lec 1.2 Acknowledgements All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright 1998-2002, University of California Berkeley

3 EEL5708/Bölöni Lec 1.3 Case 1: VIA KT266 chipset for the Athlon processors

4 EEL5708/Bölöni Lec 1.4 Take 1: April 4, 2001
Tom’s Hardware (www.tomshardware.com), a web site for hardware enthusiasts, reviews the VIA Apollo KT266 chipset.
http://www17.tomshardware.com/mainboard/01q2/010409/kt266-10.html
The website’s conclusion: “KT266 is still way too slow to challenge or even replace AMD's 760 chipset. As a conclusion, I could maybe say the typical words always used in early reviews "let's hope VIA will finally improve KT266". However, I have my doubts if this will happen any time soon. My advice to you is to either forget about DDR altogether for the time being, or to go for Athlon plus AMD760 and NOTHING ELSE.”

5 EEL5708/Bölöni Lec 1.5 Take 2: One week later…
Article title: “VIA Apollo KT266 revisited: Much Ado About Nothing” (http://www17.tomshardware.com/mainboard/01q2/010416/index.html)
Another website (www.anandtech.com) obtains different results.
An additional resistor (!) mounted on the motherboard and a different BIOS.
Tom’s Hardware concludes that there are indeed improvements, but they are not significant enough to change the conclusion.

6 EEL5708/Bölöni Lec 1.6 Take 3: Five months later (September 2001)
VIA KT266A is launched.
Tom’s Hardware: “‘A’ stands for vastly improved performance” (http://www17.tomshardware.com/mainboard/01q3/010902/index.html)
Changes: “improvements” to the memory controller. Processor frequency, bus frequency, etc. stay the same.
Pin-for-pin compatible with its predecessors!
Conclusion: “The performance of Apollo KT266A is nothing short of impressive.”

7 EEL5708/Bölöni Lec 1.7 Synthetic benchmarks:

8 EEL5708/Bölöni Lec 1.8 Real world benchmarks

9 EEL5708/Bölöni Lec 1.9 Some conclusions
“Architecture” matters.
Real-world benchmarks show less improvement than synthetic ones: Amdahl’s Law.
Which benchmark do I care about? (This time, at least, they were consistent…)
…
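A minimal sketch of why Amdahl's Law predicts a smaller gain on real-world benchmarks than on synthetic ones. The function name `amdahl_speedup`, the 1.5x memory improvement, and the 90% / 30% fractions are illustrative assumptions, not measurements from the KT266 reviews.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement_speedup: float) -> float:
    """Overall speedup when only `enhanced_fraction` of the original run time
    is accelerated by a factor of `enhancement_speedup` (Amdahl's Law)."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be between 0 and 1")
    if enhancement_speedup <= 0.0:
        raise ValueError("enhancement_speedup must be positive")
    unaffected = 1.0 - enhanced_fraction
    return 1.0 / (unaffected + enhanced_fraction / enhancement_speedup)


# Hypothetical numbers: the memory subsystem becomes 1.5x faster.
# A synthetic memory test is assumed to spend 90% of its time on memory,
# a real application only 30%.
synthetic = amdahl_speedup(0.9, 1.5)
real_world = amdahl_speedup(0.3, 1.5)

print(f"synthetic benchmark speedup:  {synthetic:.2f}x")
print(f"real-world benchmark speedup: {real_world:.2f}x")
```

The smaller the fraction of time a program spends in the improved component, the closer its overall speedup stays to 1, whatever the size of the local improvement.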

10 EEL5708/Bölöni Lec 1.10 Case 2: Video compression performance in Intel Pentium 4 vs. AMD Athlon

11 EEL5708/Bölöni Lec 1.11 Take 1 (11/20/00): First impressions
Intel Pentium 4 is launched.
The initial measurements show that it greatly outperforms the AMD Athlon for MPEG-4 video compression.
http://www6.tomshardware.com/cpu/00q4/001120/index.html

12 EEL5708/Bölöni Lec 1.12 Take 1 (11/20/00): First impressions (cont’d)

13 EEL5708/Bölöni Lec 1.13 Take 2: New results force new conclusions
Concerns are raised about the fact that the measurement was done with a low quality setting (MMX arithmetic).
When the measurements were repeated with floating-point arithmetic, the relative performance was reversed.
http://www6.tomshardware.com/cpu/00q4/001122/index.html

14 EEL5708/Bölöni Lec 1.14 Take 2 : New results force new conclusions (cont’d)

15 EEL5708/Bölöni Lec 1.15 Take 3: Intel engineers create an optimized version of the software
As a response, Intel engineers created a modified version of the software:
-recompiled it with higher optimization levels
-rewrote parts of the code to use the new instruction set extensions (SSE2)
The higher optimizations benefited both Intel and AMD processors (but Intel more).
The SSE2 options reversed the performance ranking again.
Note: AMD engineers created an AMD-optimized version, too, with significant improvements, but this did not change the rankings.

16 EEL5708/Bölöni Lec 1.16 Take 3: Intel engineers create an optimized version of the software

17 EEL5708/Bölöni Lec 1.17 Take 3 (cont’d)

18 EEL5708/Bölöni Lec 1.18 Case 2: Conclusions
Real world benchmark, huge differences
–Why?
Software solution to a hardware problem?
–Optimizing for the architecture
–So, what if it is not open source?
–Software development cycles…
Picking the right architecture + understanding the architecture we have

19 EEL5708/Bölöni Lec 1.19 Review: Measuring performance

20 EEL5708/Bölöni Lec 1.20 Performance measures
Time to execute a given program
Number of programs which can be run in parallel
Responsiveness (user interfaces)
Predictable execution time (for real time systems)
Energy consumption (mostly for portables, but check the new Google and Microsoft data centers…)
And so on…

21 EEL5708/Bölöni Lec 1.21 Which is faster? (Latency vs throughput)
Time to run the task (ExTime)
–Execution time, response time, latency
Tasks per day, hour, week, sec, ns … (Performance)
–Throughput, bandwidth
Plane               Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747          610 mph    6.5 hours     470          286,700
BAC/Sud Concorde    1350 mph   3 hours       132          178,200
(pmph = passenger-miles per hour, i.e. speed x passengers)
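The plane comparison can be checked with a few lines of Python. This is a small sketch using only the figures on the slide; the `Plane` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    name: str
    speed_mph: int
    trip_hours: float  # DC to Paris
    passengers: int

    @property
    def throughput_pmph(self) -> int:
        """Passenger-miles per hour: speed times passengers carried."""
        return self.speed_mph * self.passengers


planes = [
    Plane("Boeing 747", 610, 6.5, 470),
    Plane("Concorde", 1350, 3.0, 132),
]

# Latency: who finishes one trip first? Throughput: who moves more people per hour?
lowest_latency = min(planes, key=lambda p: p.trip_hours)
highest_throughput = max(planes, key=lambda p: p.throughput_pmph)

for p in planes:
    print(f"{p.name}: {p.trip_hours} h per trip, {p.throughput_pmph:,} pmph")
print("lowest latency:", lowest_latency.name)
print("highest throughput:", highest_throughput.name)
```

The two metrics pick different winners, which is the point of the slide: "faster" has no meaning until the metric is fixed.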

22 EEL5708/Bölöni Lec 1.22 Definitions
Performance is in units of things per sec
–bigger is better
If we are primarily concerned with response time
–performance(X) = 1 / execution_time(X)
"X is n times faster than Y" means
n = Performance(X) / Performance(Y) = Execution_time(Y) / Execution_time(X)
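As a quick illustration of the definition, the sketch below computes n from two execution times. The function names are hypothetical, and the sample times are the DC to Paris flight times from slide 21.

```python
import math


def performance(execution_time: float) -> float:
    """Performance defined as the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that 'X is n times faster than Y'."""
    return exec_time_y / exec_time_x


concorde_hours = 3.0
boeing_hours = 6.5

n = times_faster(concorde_hours, boeing_hours)
print(f"Concorde is {n:.2f} times faster than the 747 (by response time)")

# Both forms of the definition give the same ratio.
assert math.isclose(n, performance(concorde_hours) / performance(boeing_hours))
```

Note that the execution times swap places relative to the performance ratio: X's time goes in the denominator.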

23 EEL5708/Bölöni Lec 1.23 Computer Performance
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
The three factors are the instruction count, the CPI, and the cycle time.
                Inst Count   CPI   Clock
Program         X
Compiler        X            (X)
Inst. Set       X            X
Organization                 X     X
Technology                         X
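The CPU time equation translates directly into code. The sketch below is illustrative only: the function name and the sample values (2 billion instructions, CPI 1.5, a 1 GHz clock) are assumptions, not figures from the lecture.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * cycle_time


# Hypothetical baseline machine.
baseline = cpu_time_seconds(2e9, 1.5, 1e9)

# Hypothetical compiler change: 10% fewer instructions, CPI unchanged.
better_compiler = cpu_time_seconds(1.8e9, 1.5, 1e9)

# Hypothetical technology change: same program, clock raised to 1.5 GHz.
faster_clock = cpu_time_seconds(2e9, 1.5, 1.5e9)

print(f"baseline:        {baseline:.2f} s")
print(f"better compiler: {better_compiler:.2f} s")
print(f"faster clock:    {faster_clock:.2f} s")
```

Because the three factors multiply, a change at any level of the table (program, compiler, instruction set, organization, technology) only helps if it does not make another factor worse by a larger amount.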

24 EEL5708/Bölöni Lec 1.24 Cycles Per Instruction (Throughput)
“Average Cycles per Instruction”
CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count
“Instruction Frequency”

25 EEL5708/Bölöni Lec 1.25 Example: Calculating CPI bottom up
Typical mix of instruction types in a program, base machine (Reg / Reg)
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total                    1.5
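The bottom-up calculation can be reproduced as a weighted sum: each instruction class contributes its frequency times its cycle count, and the share of time is that contribution divided by the total CPI. The sketch below uses the instruction mix from the slide; the variable names are illustrative.

```python
# Instruction mix from the slide: op -> (frequency, cycles per instruction)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# Per-class contribution to the average CPI: frequency x cycles.
cpi_contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}

# Average CPI is the sum of the contributions.
cpi = sum(cpi_contribution.values())

# Fraction of execution time spent in each instruction class.
time_share = {op: contrib / cpi for op, contrib in cpi_contribution.items()}

for op in mix:
    print(f"{op:<7} CPI(i) = {cpi_contribution[op]:.1f}   time = {time_share[op]:.0%}")
print(f"Average CPI = {cpi:.1f}")
```

ALU instructions are half of the mix but account for only a third of the time, while the two-cycle loads, stores, and branches together take the other two thirds.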

