Presentation is loading. Please wait.

Presentation is loading. Please wait.

The University of Adelaide, School of Computer Science

Similar presentations


Presentation on theme: "The University of Adelaide, School of Computer Science"— Presentation transcript:

1 The University of Adelaide, School of Computer Science
Computer Architecture A Quantitative Approach, Fifth Edition The University of Adelaide, School of Computer Science 11 January 2019 Chapter 1 Fundamentals of Quantitative Design and Analysis Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

2 The University of Adelaide, School of Computer Science
11 January 2019 Computer Technology Introduction Performance improvements: Improvements in semiconductor technology Feature size, clock speed Improvements in computer architectures Enabled by HLL compilers, UNIX Lead to RISC architectures Together have enabled: Lightweight computers Productivity-based managed/interpreted programming languages Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

3 Single Processor Performance
Introduction Move to multi-processor RISC Copyright © 2012, Elsevier Inc. All rights reserved.

4 Current Trends in Architecture
The University of Adelaide, School of Computer Science 11 January 2019 Current Trends in Architecture Introduction Cannot continue to leverage Instruction-Level parallelism (ILP) Single processor performance improvement ended in 2003 New models for performance: Data-level parallelism (DLP) Thread-level parallelism (TLP) Request-level parallelism (RLP) These require explicit restructuring of the application Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

5 The University of Adelaide, School of Computer Science
11 January 2019 Classes of Computers Classes of Computers Personal Mobile Device (PMD) e.g. start phones, tablet computers Emphasis on energy efficiency and real-time Desktop Computing Emphasis on price-performance Servers Emphasis on availability, scalability, throughput Clusters / Warehouse Scale Computers Used for “Software as a Service (SaaS)” Emphasis on availability and price-performance Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks Embedded Computers Emphasis: price Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

6 The University of Adelaide, School of Computer Science
11 January 2019 Parallelism Classes of Computers Classes of parallelism in applications: Data-Level Parallelism (DLP) Task-Level Parallelism (TLP) Classes of architectural parallelism: Instruction-Level Parallelism (ILP) Vector architectures/Graphic Processor Units (GPUs) Thread-Level Parallelism Request-Level Parallelism Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

7 The University of Adelaide, School of Computer Science
11 January 2019 Flynn’s Taxonomy Classes of Computers Single instruction stream, single data stream (SISD) Single instruction stream, multiple data streams (SIMD) Vector architectures Multimedia extensions Graphics processor units Multiple instruction streams, single data stream (MISD) No commercial implementation Multiple instruction streams, multiple data streams (MIMD) Tightly-coupled MIMD Loosely-coupled MIMD Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

8 Defining Computer Architecture
The University of Adelaide, School of Computer Science 11 January 2019 Defining Computer Architecture “Old” view of computer architecture: Instruction Set Architecture (ISA) design i.e. decisions regarding: registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding “Real” computer architecture: Specific requirements of the target machine Design to maximize performance within constraints: cost, power, and availability Includes ISA, microarchitecture, hardware Defining Computer Architecture Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

9 The University of Adelaide, School of Computer Science
11 January 2019 Trends in Technology Trends in Technology Integrated circuit technology Transistor density: 35%/year Die size: %/year Integration overall: %/year DRAM capacity: %/year (slowing) Flash capacity: %/year 15-20X cheaper/bit than DRAM Magnetic disk technology: 40%/year 15-25X cheaper/bit then Flash X cheaper/bit than DRAM Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

10 The University of Adelaide, School of Computer Science
11 January 2019 Bandwidth and Latency Trends in Technology Bandwidth or throughput Total work done in a given time 10,000-25,000X improvement for processors X improvement for memory and disks Latency or response time Time between start and completion of an event 30-80X improvement for processors 6-8X improvement for memory and disks Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

11 Copyright © 2012, Elsevier Inc. All rights reserved.
Bandwidth and Latency Trends in Technology Log-log plot of bandwidth and latency milestones Copyright © 2012, Elsevier Inc. All rights reserved.

12 The University of Adelaide, School of Computer Science
11 January 2019 Transistors and Wires Trends in Technology Feature size Minimum size of transistor or wire in x or y dimension 10 microns in 1971 to .032 microns in 2011 Transistor performance scales linearly Wire delay does not improve with feature size! Integration density scales quadratically Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

13 The University of Adelaide, School of Computer Science
11 January 2019 Power and Energy Problem: Get power in, get power out Thermal Design Power (TDP) Characterizes sustained power consumption Used as target for power supply and cooling system Lower than peak power, higher than average power consumption Clock rate can be reduced dynamically to limit power consumption Energy per task is often a better measurement Trends in Power and Energy Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

14 Dynamic Energy and Power
The University of Adelaide, School of Computer Science 11 January 2019 Dynamic Energy and Power Dynamic energy Transistor switch from 0 -> 1 or 1 -> 0 ½ x Capacitive load x Voltage2 Dynamic power ½ x Capacitive load x Voltage2 x Frequency switched Reducing clock rate reduces power, not energy Trends in Power and Energy Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

15 The University of Adelaide, School of Computer Science
11 January 2019 Power Intel consumed ~ 2 W 3.3 GHz Intel Core i7 consumes 130 W Heat must be dissipated from 1.5 x 1.5 cm chip This is the limit of what can be cooled by air Trends in Power and Energy Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

16 The University of Adelaide, School of Computer Science
11 January 2019 Reducing Power Techniques for reducing power: Do nothing well Dynamic Voltage-Frequency Scaling Low power state for DRAM, disks Overclocking, turning off cores Trends in Power and Energy Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

17 The University of Adelaide, School of Computer Science
11 January 2019 Static Power Static power consumption Currentstatic x Voltage Scales with number of transistors To reduce: power gating Trends in Power and Energy Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

18 The University of Adelaide, School of Computer Science
11 January 2019 Trends in Cost Trends in Cost Cost driven down by learning curve Yield DRAM: price closely tracks cost Microprocessors: price depends on volume 10% less for each doubling of volume Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

19 Copyright © 2012, Elsevier Inc. All rights reserved.
Manufacturing ICs Yield: proportion of working dies per wafer Copyright © 2012, Elsevier Inc. All rights reserved.

20 Integrated Circuit Cost
The University of Adelaide, School of Computer Science 11 January 2019 Integrated Circuit Cost Trends in Cost Integrated circuit Bose-Einstein formula: Defects per unit area = defects per square cm (2010) N = process-complexity factor = (40 nm, 2010) Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

21 Copyright © 2012, Elsevier Inc. All rights reserved.
Intel Core i7 Wafer 300mm wafer, 280 chips, 32nm technology Each chip is 20.7 x 10.5 mm Copyright © 2012, Elsevier Inc. All rights reserved.

22 The University of Adelaide, School of Computer Science
11 January 2019 Dependability Dependability Module reliability Mean time to failure (MTTF) Mean time to repair (MTTR) Mean time between failures (MTBF) = MTTF + MTTR Availability = MTTF / MTBF Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

23 Copyright © 2012, Elsevier Inc. All rights reserved.
Defining Performance Copyright © 2012, Elsevier Inc. All rights reserved.

24 Understanding Performance
Algorithm Determines number of operations executed Programming language, compiler, architecture Determine number of machine instructions executed per operation Processor and memory system Determine how fast instructions are executed I/O system (including OS) Determines how fast I/O operations are executed Copyright © 2012, Elsevier Inc. All rights reserved.

25 Measuring Performance
The University of Adelaide, School of Computer Science 11 January 2019 Measuring Performance Typical performance metrics: Response time Throughput Speedup of X relative to Y Execution timeY / Execution timeX Execution time Wall clock time: includes all system overheads CPU time: only computation time Benchmarks Kernels (e.g. matrix multiply) Toy programs (e.g. sorting) Synthetic benchmarks (e.g. Dhrystone) Benchmark suites (e.g. SPEC06fp, TPC-C) Measuring Performance Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

26 Response Time and Throughput
How long it takes to do a task Throughput Total work done per unit time e.g., tasks/transactions/… per hour How are response time and throughput affected by Replacing the processor with a faster version? Adding more processors? We’ll focus on response time for now… Copyright © 2012, Elsevier Inc. All rights reserved.

27 Copyright © 2012, Elsevier Inc. All rights reserved.
Relative Performance Define Performance = 1/Execution Time “X is n time faster than Y” Example: time taken to run a program 10s on A, 15s on B Execution TimeB / Execution TimeA = 15s / 10s = 1.5 So A is 1.5 times faster than B Copyright © 2012, Elsevier Inc. All rights reserved.

28 Measuring Execution Time
Elapsed time Total response time, including all aspects Processing, I/O, OS overhead, idle time Determines system performance CPU time Time spent processing a given job Discounts I/O time, other jobs’ shares Comprises user CPU time and system CPU time Different programs are affected differently by CPU and system performance Copyright © 2012, Elsevier Inc. All rights reserved.

29 Copyright © 2012, Elsevier Inc. All rights reserved.
CPU Time Performance improved by Reducing number of clock cycles Increasing clock rate Hardware designer must often trade off clock rate against cycle count Copyright © 2012, Elsevier Inc. All rights reserved.

30 Principles of Computer Design
The University of Adelaide, School of Computer Science 11 January 2019 Principles of Computer Design Principles The Processor Performance Equation Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

31 Copyright © 2012, Elsevier Inc. All rights reserved.
CPU Time Example Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 × clock cycles How fast must Computer B clock be? Copyright © 2012, Elsevier Inc. All rights reserved.

32 Principles of Computer Design
The University of Adelaide, School of Computer Science 11 January 2019 Principles of Computer Design Principles Take Advantage of Parallelism e.g. multiple processors, disks, memory banks, pipelining, multiple functional units Principle of Locality Reuse of data and instructions Focus on the Common Case Amdahl’s Law Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

33 Instruction Count and CPI
Instruction Count for a program Determined by program, ISA and compiler Average cycles per instruction Determined by CPU hardware If different instructions have different CPI Average CPI affected by instruction mix Copyright © 2012, Elsevier Inc. All rights reserved.

34 Principles of Computer Design
The University of Adelaide, School of Computer Science 11 January 2019 Principles of Computer Design Principles Different instruction types having different CPIs Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

35 Copyright © 2012, Elsevier Inc. All rights reserved.
CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? A is faster… …by this much Copyright © 2012, Elsevier Inc. All rights reserved.

36 Copyright © 2012, Elsevier Inc. All rights reserved.
CPI in More Detail If different instruction classes take different numbers of cycles Weighted average CPI Relative frequency Copyright © 2012, Elsevier Inc. All rights reserved.

37 Copyright © 2012, Elsevier Inc. All rights reserved.
CPI Example Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 IC in sequence 2 4 Sequence 1: IC = 5 Clock Cycles = 2×1 + 1×2 + 2×3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4×1 + 1×2 + 1×3 = 9 Avg. CPI = 9/6 = 1.5 Copyright © 2012, Elsevier Inc. All rights reserved.

38 Copyright © 2012, Elsevier Inc. All rights reserved.
Performance Summary Performance depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, Tc Copyright © 2012, Elsevier Inc. All rights reserved.


Download ppt "The University of Adelaide, School of Computer Science"

Similar presentations


Ads by Google