Recap Measuring and reporting performance Quantitative principles Performance vs Cost/Performance
Fallacies and Pitfalls Fallacy: Peak performance tracks observed performance –Gap is often huge –E.g. Hitachi supercomputer 2 times faster than Cray (peak), but Cray is 2 times faster (real life!) –DEC Alpha: reported peak performance (assuming perfect pipeline and superscalar execution!) –Often used in supercomputer industry –Still a bad idea!
Fallacies and Pitfalls Fallacy: Best design optimises the primary objective without considering implementation –Complex designs impact time to market, affecting competitiveness –E.g. Intel Itanium — two year delay!
Fallacies and Pitfalls Pitfall: Ignoring software costs –Hardware costs used to dominate, but software is now a significant cost factor (e.g. 50% for a midrange server) –Impacts on cost-performance
Fallacies and Pitfalls Pitfall: Falling prey to Amdahl’s Law –Easy to get side-tracked into optimising some area that will have little overall impact
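A minimal sketch of the arithmetic behind this pitfall, using the standard form of Amdahl's Law. The function name and the example fractions are illustrative assumptions, not figures from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is made `speedup_enhanced` times faster (Amdahl's Law)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical numbers: a dramatic 10x gain on code that takes 10% of run time...
small_area = amdahl_speedup(0.10, 10.0)   # about 1.10x overall
# ...versus a modest 2x gain on code that takes 80% of run time.
large_area = amdahl_speedup(0.80, 2.0)    # about 1.67x overall

print(f"10x on 10% of time: {small_area:.2f}x overall")
print(f"2x on 80% of time:  {large_area:.2f}x overall")
```

Even an unbounded improvement to the 10% region cannot push the overall speedup past 1/0.9, which is why the fraction of time affected matters more than the size of the local gain.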
Fallacies and Pitfalls Fallacy: Synthetic benchmarks predict real performance (since 1st Edition!) –Benchmarks are very susceptible to compiler and hardware optimisations –Examples: Compilers can discard 25% of Dhrystone! Whetstone doesn’t allow for some common, real optimisations! Compilers do benchmark-specific optimisations!
Fallacies and Pitfalls Fallacy: MIPS is useful for performance comparison (also 1st Edition!) –Still popular (embedded processors) –MIPS depends on instruction set (useless for comparing different architectures) –MIPS varies between programs –MIPS can vary inversely to performance! E.g. FP in hardware/software
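The last point can be made concrete with a small calculation. The sketch below uses the usual definition, MIPS = instruction count / (execution time × 10^6). The instruction counts and times are made-up numbers chosen only to illustrate the effect, not measurements.

```python
def mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (exec_time_s * 1e6)


# Hypothetical program run two ways on the same machine.
# Software FP: each FP operation expands into many simple integer instructions.
sw_instructions, sw_time_s = 100e6, 2.0
# Hardware FP: far fewer (but slower) instructions, and the program finishes sooner.
hw_instructions, hw_time_s = 20e6, 1.0

mips_sw = mips(sw_instructions, sw_time_s)
mips_hw = mips(hw_instructions, hw_time_s)

print(f"software FP: {mips_sw:.0f} MIPS, {sw_time_s:.1f} s")
print(f"hardware FP: {mips_hw:.0f} MIPS, {hw_time_s:.1f} s")
```

With these assumed numbers the hardware FP version runs twice as fast yet reports the lower MIPS rating, so ranking by MIPS would pick the slower configuration.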
Chapter 1: Concluding Comments Several concepts that will be explored in more detail Chapter 2: Instruction set architecture Chapters 3 & 4: Pipelining –Appendix A: Basics –Chapter 3: Hardware techniques –Chapter 4: Compiler techniques Chapter 5: Memory hierarchies
Historical Perspectives Early computer history History of performance measurement –Details of Whetstone, MIPS, SPEC, etc.
Chapter Two Instruction Set Principles and Examples EDSAC Instruction Set
Contents Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors Impact of compilers
Introduction Use real programs for measurement –Results depend on programs and compilers used, but should be representative –Designers would consider much larger sets of programs Measurements are usually dynamic
2.2. Classification of Instruction Set Architectures Major criterion: CPU operand storage Four main styles of architecture: –Stack (operands are implicit) –Accumulator (accumulator implicit, other operand explicit) –General-purpose register machines (operands explicit): Register-Memory and Register-Register (or load-store)
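To make the four styles concrete, here is a rough Python sketch that evaluates C = A + B the way each class of machine would. The instruction mnemonics in the comments follow the conventional textbook illustration; the function names, register names and memory values are assumptions made for this example.

```python
def run_stack(mem):
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add    (both operands implicit: top of stack)
    mem["C"] = stack.pop()                    # Pop C


def run_accumulator(mem):
    acc = mem["A"]                            # Load A
    acc = acc + mem["B"]                      # Add B  (accumulator implicit, B explicit)
    mem["C"] = acc                            # Store C


def run_register_memory(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load  R1, A
    regs["R3"] = regs["R1"] + mem["B"]        # Add   R3, R1, B  (one operand in memory)
    mem["C"] = regs["R3"]                     # Store R3, C


def run_load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load  R1, A
    regs["R2"] = mem["B"]                     # Load  R2, B
    regs["R3"] = regs["R1"] + regs["R2"]      # Add   R3, R1, R2 (registers only)
    mem["C"] = regs["R3"]                     # Store R3, C


results = {}
for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
    mem = {"A": 2, "B": 3, "C": 0}            # hypothetical memory contents
    run(mem)
    results[run.__name__] = mem["C"]

print(results)
```

All four produce the same result; what differs is how many instructions are needed and how many operands each instruction has to name explicitly.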
Popularity Early machines: –Stack and accumulator Since the 1980s: –General register, load-store machines
Advantages of Registers Fast! Easier for compilers to use and optimise Can hold variables –Reduces memory traffic –Increases performance –Decreases program size –Dedicated registers frustrate these goals
Classifying GPR Machines Number of operands –Two or three Number of operands that may be in memory –0…3
2.3. Memory Addressing How is data accessed? How is a memory address interpreted? –Big-/Little-Endian ordering Generally unnoticed, except when exchanging data Alignment –Some machines insist on alignment (e.g. SPARC) –Other machines require multiple memory accesses for unaligned data
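A short sketch of both issues, using Python's standard `struct` module. The 32-bit value and the helper `is_aligned` are assumptions for illustration; the alignment test uses the common definition that an access of `size` bytes at address `address` is aligned when `address mod size = 0`.

```python
import struct

value = 0x12345678

# The same 32-bit integer laid out in memory under each byte ordering.
big = struct.pack(">I", value)      # bytes 12 34 56 78 (most significant byte first)
little = struct.pack("<I", value)   # bytes 78 56 34 12 (least significant byte first)

# Exchanging data without agreeing on the ordering silently corrupts it:
# big-endian bytes read back as little-endian give a different number.
misread = struct.unpack("<I", big)[0]

print(big.hex(), little.hex(), hex(misread))


def is_aligned(address: int, size: int) -> bool:
    """True if an access of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


print(is_aligned(8, 4))   # a 4-byte word at address 8
print(is_aligned(6, 4))   # a 4-byte word at address 6 straddles two aligned words
```

On a machine that insists on alignment the second access would fault; on a more permissive machine it would typically cost an extra memory access.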
Data Addressing Modes Ten common addressing modes –Register –Immediate –Displacement –Register indirect –Indexed –Direct (or absolute) –Memory indirect –Autoincrement/Autodecrement –Scaled SPARC
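As a rough illustration of what each mode computes, the sketch below models registers and memory as Python dictionaries. The function `fetch_operand`, its parameter names, and the register and memory contents are all assumptions made for this example; `size` stands for the operand size in bytes used by the autoincrement, autodecrement and scaled modes.

```python
def fetch_operand(mode, regs, mem, reg=None, index=None, const=0, size=4):
    """Return the operand selected by `mode`; the auto modes also update `regs`."""
    if mode == "register":
        return regs[reg]                               # value is in the register
    if mode == "immediate":
        return const                                   # value is in the instruction
    if mode == "displacement":
        return mem[const + regs[reg]]                  # Mem[const + R]
    if mode == "register_indirect":
        return mem[regs[reg]]                          # Mem[R]
    if mode == "indexed":
        return mem[regs[reg] + regs[index]]            # Mem[R + Rindex]
    if mode == "direct":
        return mem[const]                              # Mem[const]
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]                     # Mem[Mem[R]]
    if mode == "autoincrement":
        value = mem[regs[reg]]                         # use the address, then step forward
        regs[reg] += size
        return value
    if mode == "autodecrement":
        regs[reg] -= size                              # step back, then use the address
        return mem[regs[reg]]
    if mode == "scaled":
        return mem[const + regs[reg] + regs[index] * size]  # Mem[const + R + Rindex*size]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 100, "R2": 8}
mem = {11: 999, 100: 11, 104: 22, 108: 33, 136: 55}

print(fetch_operand("displacement", regs, mem, reg="R1", const=4))
print(fetch_operand("indexed", regs, mem, reg="R1", index="R2"))
print(fetch_operand("memory_indirect", regs, mem, reg="R1"))
print(fetch_operand("scaled", regs, mem, reg="R1", index="R2", const=4))
```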
Usage of Addressing Modes Measurements based on VAX –TeX, Spice, gcc Immediate and Displacement addressing dominate
Displacement Addressing Mode Wide variation in displacement (offset) values (Alpha, using SPEC2000)
Immediate Addressing Mode Mainly used for comparisons and ALU ops Overall (Alpha): –21% of instructions (integer) –16% of instructions (fp) Range of values (Alpha): –Mainly small ( 12 bits)
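Displacement and immediate fields both occupy a fixed number of bits in the instruction encoding, so measurements like these come down to asking how many bits each observed value needs. The sketch below assumes a signed two's-complement field; the helper names and the sample values are hypothetical.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable in a `bits`-wide two's-complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement field width that can hold `value`."""
    bits = 1
    while not fits_signed(value, bits):
        bits += 1
    return bits


# Hypothetical immediates and displacements seen in a program.
samples = [0, 1, -1, 127, 128, -128, 2047, 2048]
for value in samples:
    print(value, signed_bits_needed(value))
```

Tallying `signed_bits_needed` over every immediate or displacement in a trace gives the kind of distribution a designer would use to choose a field width.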