Structure of Computer Systems, Course 2: Computer performance and optimality

2 Performance requirements
- small execution time
- short reaction time to external events
- high memory capacity and speed
- many input/output facilities (interfaces)
- rich development facilities
- small dimensions and specific shapes
- predictability, safety and fault tolerance
- small costs: absolute and relative

3 Optimal computer architecture
- A compromise between performance parameters
- Depends on the purpose and type of the computer
- Computer types (based on purpose):
  - General purpose computers
    - high performance computers (HPC)
    - personal computers
    - mobile computers
  - Computers for dedicated purposes
    - scientific computing
    - military computers (safety critical and highly reliable)
    - industrial control and automation (embedded systems)
    - measurement and analysis (e.g. medical devices, intelligent sensors)
- Classification based on performance:
  - small, embedded systems: control systems, smart sensors
  - personal computers: desktop, laptop, tablet-PC
  - high performance computers: parallel, GRID, cloud
- Old classification:
  - mainframes, e.g. IBM 360/370, Felix 256
  - minicomputers: PDP11, SUN station, Independent, Coral
  - microcomputers: microprocessor-based computers (e.g. PC, home computers)

4 Optimal computer architecture
- Classification based on architecture:
  - single processor computers
  - multiprocessor computers:
    - parallel systems
      - multi-core processors
      - symmetric and asymmetric parallel systems
    - distributed systems
      - personal computers and network communication for a specific (common) purpose
      - GRIDs
      - Clouds:
        - computer as a service
        - storage as a service
        - platform as a service
        - software as a service

5 Optimal computer architecture
- Optimal performance parameters for different types of computers:
  - HPC (high performance computers):
    - highly parallel computers with … cores or processors
    - usage: scientific computing (physics, astronomy, bioinformatics, chemistry), simulation (fluid flow, weather), cryptography
    - speed: Tflops
    - memory capacity: TBytes
    - communication: InfiniBand (2-300 Gbit/s), Cray Gemini
    - power consumption: 10 kW to 10 MW (for comparison, the Mariselu power station delivers ~200 MW)
    - price: hard to tell
  - see the top 500 supercomputers list (…):
    - no. 1: Titan (USA), … cores
    - no. 2: Sequoia (USA), … cores
    - no. 3: K computer (Japan), … cores

6 HPC – high performance computers
- HPC at CERN
  - architecture: GRID
  - organization: 3 tiers
  - at least … processors in 32 countries
  - serves 5000 scientists
  - in UTCN: 128 quad-core processors, 512 cores
- Blue Gene (IBM)
  - architecture: parallel
  - 65,536 dual-core processors
  - 360 teraflops peak speed
- Where is that bit? 1+1=3?

7 HPC – high performance computers
- CG-UTCN: Centrul GRID al UTCN (the UTCN GRID Center)
  - 64 processor boards
  - 128 quad-core processors
  - 512 cores
  - 1024 virtual processors (hyper-threading)
  - storage: 12 TBytes
  - price: … RON

8 Optimal computer architecture
- Optimal performance parameters for different types of computers:
  - PC (personal computers):
    - single or multi-core systems: 1-8 cores (1-2 processors)
    - usage: engineering, accounting, administration, entertainment, document processing, communication
    - speed: Gflops
    - memory capacity: 1-16 GBytes (internal), 0.5-1 TBytes (external)
    - communication: Ethernet (0.1-1 Gbit/s)
    - power consumption: … W
    - price: … USD
    - dimensional types: desktop, laptop, tablet, hand-held

9 Optimal computer architecture
- Optimal performance parameters for different types of computers:
  - Mobile devices:
    - single or multi-core systems: 1-4 cores (1 processor)
    - usage: communication, entertainment, replacement for a PC
    - speed: Mflops
    - memory capacity: … GBytes (internal)
    - communication: WiFi, Bluetooth (… Mbit/s)
    - power consumption: limited by the battery's capacity
    - price: … USD
    - dimensional limitations

10 Optimal computer architecture
- Optimal performance parameters for different types of computers:
  - Dedicated and embedded systems:
    - single processor systems: microcontroller, DSP (digital signal processor), MSP (mixed signal processor)
    - usage: automation, measurement, sensors, medical devices
    - speed: 1-20 MIPS
    - memory capacity: … bytes (data), 0-32 KBytes (program), 1-2 KBytes EEPROM
    - communication: serial RS232, CAN, I2C (… bits/s)
    - power consumption: very low (battery powered), with low-power modes (1 μA-10 mA)
    - price: … USD
    - dimensions: very small packages (8, 16, 28, 40 pins)

11 Measuring the performance of a computer – benchmark programs
- Definition 1 (Wikipedia): a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.
- Definition 2: a method of comparing the performance of various computer systems.
- Measuring and assessing the performance of a system is not a trivial task:
  - some computers/CPUs perform better on some tests and worse on others (e.g. good results for image processing but weaker results for database applications)
  - performance should therefore be a weighted average of a number of specific tests
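Definition 1 speaks of running "a number of standard tests and trials". A minimal Python sketch of the trial part: the same workload is timed several times with `time.perf_counter`, and the spread of the results is reported. The workload `sum_of_squares` and the choice to report min/median/max are illustrative assumptions, not part of any standard benchmark.

```python
import statistics
import time
from typing import Callable


def run_trials(workload: Callable[[], object], trials: int = 5) -> dict:
    """Run `workload` several times and summarize the wall-clock times (s)."""
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
        "trials": trials,
    }


def sum_of_squares() -> int:
    # Hypothetical test workload; any program under test could go here.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    summary = run_trials(sum_of_squares, trials=5)
    print(f"min {summary['min']:.4f} s, median {summary['median']:.4f} s")
```

Repeating the run matters because a single measurement also includes noise from the operating system, caches and other processes; the minimum or the median of several trials is usually more stable than one reading.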

12 Benchmark programs
- Real programs
  - word processing software
  - the user's application software
- Micro-benchmarks
  - designed to measure the performance of a very small and specific piece of code
- Kernels
  - contain code that performs a specific basic operation
  - normally abstracted from an actual program
  - popular kernels: Livermore loops (each loop is a mathematical operation); Linpack benchmark (contains basic linear algebra subroutines)
  - results are expressed in MFLOPS
- Component benchmarks / micro-benchmarks
  - programs designed to measure the performance of a computer's basic components
  - automatic detection of the computer's hardware parameters, such as the number of registers, cache size and memory latency
- Synthetic benchmarks
  - procedure for writing a synthetic benchmark:
    - collect statistics on all types of operations from many application programs
    - compute the proportion of each operation
    - write a program based on these proportions
  - types of synthetic benchmarks:
    - Dhrystone: integer arithmetic
    - Whetstone: integer and floating-point arithmetic
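The three-step procedure for a synthetic benchmark can be sketched in Python. The operation names and counts below are hypothetical profiling data, and step 3 is reduced to planning how many operations of each kind one loop iteration should contain; a real synthetic benchmark would then be written (in C, Fortran, etc.) to execute that mix.

```python
from collections import Counter

# Hypothetical operation counts, as if gathered by profiling three application programs.
PROFILES = [
    Counter({"int_add": 500, "fp_mul": 100, "load": 300, "branch": 100}),
    Counter({"int_add": 300, "fp_mul": 300, "load": 300, "branch": 100}),
    Counter({"int_add": 400, "fp_mul": 200, "load": 300, "branch": 100}),
]


def operation_mix(profiles):
    """Steps 1-2: pool the statistics and return the proportion of each operation."""
    total = Counter()
    for profile in profiles:
        total.update(profile)
    count = sum(total.values())
    return {op: n / count for op, n in total.items()}


def plan_synthetic_program(mix, ops_per_iteration=100):
    """Step 3 (planning only): how many operations of each kind one iteration
    of the synthetic loop should contain to reproduce the measured mix."""
    return {op: round(share * ops_per_iteration) for op, share in mix.items()}


if __name__ == "__main__":
    mix = operation_mix(PROFILES)
    print(mix)
    print(plan_synthetic_program(mix))
```

With these made-up counts the pooled mix is 40% integer additions, 20% floating-point multiplications, 30% loads and 10% branches, so a 100-operation loop body would contain 40, 20, 30 and 10 of them.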

13 Benchmark programs
- Other benchmarks
  - I/O benchmarks
  - database benchmarks: measure the throughput and response times of database management systems (DBMSs)
  - parallel benchmarks: used on machines with multiple cores or processors, or on systems consisting of multiple machines
- Issues regarding good benchmarking:
  - some processor architectures were designed for the best benchmark results, but with lower overall performance
  - many benchmarks concentrate on computation and less on other aspects, such as memory access time and input/output delays
  - benchmarks are not relevant for widely distributed systems
  - there is no unique measure of “performance” in computing

14 Computing the benchmark results
- Arithmetic mean:
  $$T_{AM} = \frac{1}{n}\sum_{i=1}^{n} t_i$$
  where t_i is the execution time of program i from the set of n test programs
- Weighted arithmetic mean:
  $$T_{WAM} = \sum_{i=1}^{n} w_i\, t_i$$
  where w_i is the weight of program i, indicating its frequency of execution
  - the w_i are chosen so that on a reference computer the weighted execution time w_i t_i of each benchmark (program) is equal => NORMALIZATION
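A small Python sketch of both means. To obtain the normalization described above, this sketch assumes weights proportional to 1/t_i on the reference computer, scaled so that they sum to 1; then w_i t_i is the same for every program on that computer. The times are hypothetical.

```python
def arithmetic_mean(times):
    """(1/n) * sum of t_i."""
    return sum(times) / len(times)


def normalizing_weights(reference_times):
    """Weights w_i, summing to 1, such that w_i * t_i,ref is equal for every program."""
    inverse = [1.0 / t for t in reference_times]
    total = sum(inverse)
    return [x / total for x in inverse]


def weighted_arithmetic_mean(times, weights):
    """Sum of w_i * t_i."""
    return sum(w * t for w, t in zip(weights, times))


if __name__ == "__main__":
    reference = [2.0, 8.0]   # hypothetical times (s) on the reference computer
    machine_x = [1.0, 10.0]  # hypothetical times (s) on the computer under test
    w = normalizing_weights(reference)
    print("weights:", w)
    print("reference WAM:", weighted_arithmetic_mean(reference, w))
    print("machine X WAM:", weighted_arithmetic_mean(machine_x, w))
    print("machine X AM: ", arithmetic_mean(machine_x))
```

Here the weights come out as 0.8 and 0.2, so each program contributes 1.6 s on the reference computer; the short program is weighted up so that the long one does not dominate the mean.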

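15 Computing the benchmark results
- Geometric mean:
  $$T_{GM} = \left(\prod_{i=1}^{n} t_i\right)^{1/n}$$
- Normalized geometric mean:
  $$T_{NGM} = \left(\prod_{i=1}^{n} \frac{t_i}{t_{i,\mathrm{ref}}}\right)^{1/n}$$
  where t_{i,ref} is the execution time of program i on the reference computer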
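Both geometric means translate directly into Python. This sketch computes the n-th root of the product through logarithms, which avoids overflow when many long times are multiplied; the sample times are hypothetical. It also illustrates a useful identity: the normalized geometric mean equals GM(t) / GM(t_ref), so the reference computer only contributes a constant factor.

```python
import math


def geometric_mean(values):
    """n-th root of the product of the values, computed through logarithms
    so that long products of large times do not overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_geometric_mean(times, reference_times):
    """Geometric mean of the ratios t_i / t_i,ref."""
    return geometric_mean([t / r for t, r in zip(times, reference_times)])


if __name__ == "__main__":
    reference = [4.0, 50.0, 10.0]  # hypothetical times (s) on the reference computer
    measured = [2.0, 25.0, 20.0]   # hypothetical times (s) on the computer under test
    print(geometric_mean(measured))
    print(normalized_geometric_mean(measured, reference))
    print(geometric_mean(measured) / geometric_mean(reference))  # same value
```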

16 Computing the benchmark results
- Effects of normalization: the result depends on the machine used as the reference (A, B or C).
- Table: execution times (s) of the test programs on computers A, B and C, with the arithmetic and geometric means of the times normalized to A, to B and to C (values given for A, B and C in each case). Means normalized to C: arithmetic mean 0.055 (A), 0.055 (B), 1 (C); geometric mean 0.031 (A), 0.031 (B), 1 (C).

17 Conclusions of the previous table:
- for the arithmetic mean:
  - if the reference is computer A:
    - A is as fast as A
    - B is ~5 times slower than A
    - C is 55 times slower than A
  - if the reference is computer B:
    - A is ~5 times slower than B
    - B is as fast as B
    - C is 55 times slower than B
  - if the reference is computer C:
    - A is 18 times faster than C
    - B is 18 times faster than C
    - C is as fast as C
- for the geometric mean:
  - if the reference is computer A:
    - A is as fast as A
    - B is as fast as A
    - C is ~32 times slower than A
  - if the reference is computer B:
    - A is as fast as B
    - B is as fast as B
    - C is ~32 times slower than B
  - if the reference is computer C:
    - A is ~32 times faster than C
    - B is ~32 times faster than C
    - C is as fast as C

18 Computing the benchmark results
- Advantages of the geometric mean:
  - it is independent of the running times of the individual programs
  - it does not matter which machine is used for normalization
- Disadvantage of the geometric mean:
  - it does not predict execution time
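The effect of the reference machine can be reproduced with a few lines of Python. The execution times below are hypothetical (two programs on machines A, B and C), not the values from the table above; the code compares A and B using the arithmetic and the geometric mean of the normalized times, once for each choice of reference.

```python
import math

# Hypothetical execution times (s) of two programs on machines A, B and C.
TIMES = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [20.0, 20.0],
}


def normalized_mean(machine, reference, geometric):
    """Mean of t_i / t_i,ref for `machine`, arithmetic or geometric."""
    ratios = [t / r for t, r in zip(TIMES[machine], TIMES[reference])]
    if geometric:
        return math.exp(sum(math.log(x) for x in ratios) / len(ratios))
    return sum(ratios) / len(ratios)


def speedup(fast, slow, reference, geometric):
    """How many times faster `fast` is than `slow` according to the chosen mean."""
    return normalized_mean(slow, reference, geometric) / normalized_mean(fast, reference, geometric)


if __name__ == "__main__":
    for ref in "ABC":
        am = speedup("A", "B", ref, geometric=False)
        gm = speedup("A", "B", ref, geometric=True)
        print(f"reference {ref}: A vs B  arithmetic {am:.2f}x  geometric {gm:.2f}x")
```

With these numbers the arithmetic mean says A is about 5 times faster than B when normalized to A, while normalized to B it says B is about 5 times faster than A; the geometric mean gives a ratio of 1 for every reference, which is the property listed above.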

19 Benchmark programs
- Goal: to write a package of programs that best measures the performance of a computer system
- Solutions:
  - real programs that solve different classical problems
  - synthetic programs: they produce no practical result, but preserve the frequency of instructions measured in real cases

20 Examples of benchmark programs
- Whetstone synthetic program
  - published in 1976 by the National Physical Laboratory (NPL), Great Britain
  - preserves the frequency of instructions in scientific and engineering applications written in Algol, and later in Fortran and Pascal
  - floating-point instructions play an important role
- Dhrystone synthetic program
  - published in 1984
  - preserves the frequency of instructions in system programming (e.g. operating system components), using the Ada and C programming languages
  - the frequency measurements are published
  - no emphasis on FP operations
- Issues with synthetic benchmarks:
  - they do not reflect well the needs of a real application
  - some computer architectures were optimized for the best performance on synthetic benchmarks, but perform worse on real applications

21 Examples of benchmark programs
- Kernel benchmark programs
  - based on time-critical components of real applications
  - focused on measuring the performance of supercomputers running scientific applications
  - examples:
    - Livermore Loops:
      - benchmark for parallel computers
      - 24 “do” loops carrying out different mathematical operations (e.g. solving linear systems, hydrodynamics, matrix operations, etc.)
    - Linpack:
      - performs numerical linear algebra
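The idea behind a Linpack-style kernel can be sketched in pure Python: solve a random dense system A x = b by Gaussian elimination with partial pivoting, time it, and convert the conventional operation count 2/3·n³ + 2·n² into MFLOPS. This is only an illustration of the principle, not the official Linpack or HPL code; pure Python is far slower than the tuned Fortran/C libraries used for real results, and the matrix size and random seed are arbitrary choices.

```python
import random
import time


def solve_gauss(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |a[i][k]| to position k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_mflops(n=100, seed=1):
    """Time one solve of a random n x n system; return (MFLOPS, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_gauss(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)
    flops = 2 * n**3 / 3 + 2 * n**2  # operation count conventionally used by Linpack
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed / 1e6, residual


if __name__ == "__main__":
    mflops, residual = linpack_style_mflops(n=100)
    print(f"{mflops:.1f} MFLOPS, max residual {residual:.2e}")
```

Checking the residual is part of the method: a fast result is only accepted if the computed solution actually satisfies the system.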

22 Examples of benchmark programs
- SPEC: Standard Performance Evaluation Corporation
  - a non-profit international organization focused on developing standard tools for measuring the performance of computer systems
  - develops standard sets of benchmarks based on real applications
  - the benchmark sets contain source code
  - there are also tools for generating performance reports

23 Examples of benchmark programs
- Evolution of the SPEC benchmark standards:
  - SPEC89
    - the first benchmark set, released in 1989
    - benchmark value: geometric mean of the execution times normalized to the VAX-11/780 computer
  - SPEC92
    - contains different benchmarks for integer (SPECINT) and floating-point instructions (SPECFP)
  - CPU95, CPU2000
  - current version (at the time of this course): CPU2006
  - next version: CPUv6 (released in 2017 as SPEC CPU2017)
- SPEC consists of three interest groups:
  - Open Systems Group (OSG): component and system level benchmarks
  - High Performance Group (HPG): benchmarks for high-performance computing
  - Graphics Performance Characterization Group (GPCG): benchmarks for graphics subsystems

24 Examples of benchmark programs
- Details for CPU2006:
  - contains two collections:
    - CINT2006: integer computations
    - CFP2006: floating-point computations
  - it can measure:
    - speed: SPEC ratio, based on the time to execute one copy of each benchmark (reference time / measured time)
    - rate: SPEC rate, the number of jobs that can be executed in a given time (e.g. 24h)
  - results are combined with the geometric mean
  - normalization is made on a Sun Microsystems Ultra Enterprise 2 workstation with a 296 MHz UltraSPARC II processor; for this system the result of the measurement is 1
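A simplified Python sketch of how a speed result is assembled: each benchmark's ratio is the reference time divided by the measured time (so the reference machine scores 1), and the ratios are combined with the geometric mean. The benchmark names and times are hypothetical, and SPEC's run rules (repeated runs, selection of the reported run, reporting requirements) are deliberately left out.

```python
import math

# Hypothetical reference-machine and measured times (s); these are not real SPEC data.
REFERENCE = {"bench_a": 9000.0, "bench_b": 8000.0, "bench_c": 10000.0}
MEASURED = {"bench_a": 900.0, "bench_b": 400.0, "bench_c": 2000.0}


def spec_ratios(reference, measured):
    """Per-benchmark ratio: reference time / measured time (higher is faster)."""
    return {name: reference[name] / measured[name] for name in reference}


def overall_score(ratios):
    """Combine the per-benchmark ratios with the geometric mean."""
    values = list(ratios.values())
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    ratios = spec_ratios(REFERENCE, MEASURED)
    print(ratios)
    print(f"overall: {overall_score(ratios):.2f}")
```

With these hypothetical times the ratios are 10, 20 and 5, and the overall score is their geometric mean, 10; running the reference times through the same code gives exactly the score of 1 mentioned above.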

25 Details for CPU2006
- Examples of integer benchmarks:
  - 401.bzip2: compression program based on bzip2
  - 403.gcc: C compiler based on gcc
  - 445.gobmk: plays the game of Go
  - 458.sjeng: chess program
  - 462.libquantum: library for the simulation of a quantum computer
  - 473.astar: path-finding library for 2D maps (A* algorithm)

26 Details for CPU2006
- Examples of floating-point benchmarks:
  - 435.gromacs: simulates the Newtonian equations of motion for particles
  - 444.namd: simulates bio-molecular systems
  - 459.GemsFDTD: solves the Maxwell equations in 3D in the time domain
  - 465.tonto: quantum chemistry package
  - 481.wrf: weather forecasting
  - 482.sphinx3: speech recognition
- Look up the published results for your processor on the Internet.