Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 1 Part VII Advanced Architectures.

Slides:

Advertisements

Similar presentations

Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,

Advertisements

PIPELINE AND VECTOR PROCESSING

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.

Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.

Lecture 6: Multicore Systems

The University of Adelaide, School of Computer Science

Parallel computer architecture classification

Fall EE 333 Lillevik 333f06-l20 University of Portland School of Engineering Computer Organization Lecture 20 Pipelining: “bucket brigade” MIPS.

An Analysis of SIMD Instructions in the Pentium III Microprocessor By Alexander J. Aved 05 DEC 2000 CS689 Ball State University Muncie, Indiana.

CSCI 8150 Advanced Computer Architecture Hwang, Chapter 1 Parallel Computer Models 1.2 Multiprocessors and Multicomputers.

1 Microprocessor-based Systems Course 4 - Microprocessors.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Dec 5, 2005 Topic: Intro to Multiprocessors and Thread-Level Parallelism.

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 19 - Pipelined.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 1 Part VII Advanced Architectures.

Chapter 12 Three System Examples The Architecture of Computer Hardware and Systems Software: An Information Technology Approach 3rd Edition, Irv Englander.

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 3.

CS 300 – Lecture 23 Intro to Computer Architecture / Assembly Language Virtual Memory Pipelining.

July 2004Computer Architecture, Advanced ArchitecturesSlide 1 Part VII Advanced Architectures.

RISC. Rational Behind RISC Few of the complex instructions were used –data movement – 45% –ALU ops – 25% –branching – 30% Cheaper memory VLSI technology.

PSU CS 106 Computing Fundamentals II Introduction HM 1/3/2009.

CPE 731 Advanced Computer Architecture Multiprocessor Introduction

Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.

7-Aug-15 (1) CSC Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32.

CMSC 611: Advanced Computer Architecture Parallel Computation Most slides adapted from David Patterson. Some from Mohomed Younis.

Advanced Computer Architectures

Computer performance.

Semiconductor Memory 1970 Fairchild Size of a single core –i.e. 1 bit of magnetic core storage Holds 256 bits Non-destructive read Much faster than core.

Parallelism Processing more than one instruction at a time. Pipelining

Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.

Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.

1 Chapter 04 Authors: John Hennessy & David Patterson.

Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.

Multi-core architectures. Single-core computer Single-core CPU chip.

Computers organization & Assembly Language Chapter 0 INTRODUCTION TO COMPUTING Basic Concepts.

Computer Architecture, Advanced Architectures Part VII Advanced Architectures.

Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.

History of Microprocessor MPIntroductionData BusAddress Bus

Hyper Threading (HT) and  OPs (Micro-Operations) Department of Computer Science Southern Illinois University Edwardsville Summer, 2015 Dr. Hiroshi Fujinoki.

Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.

Spring 2003CSE P5481 Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing.

Classic Model of Parallel Processing

1 Processor Architecture Jurij Silc, Borut Robic, Theo Ungerer.

Computing Environment The computing environment rapidly evolving ‑ you need to know not only the methods, but also How and when to apply them, Which computers.

Introduction to MMX, XMM, SSE and SSE2 Technology

Pipelining and Parallelism Mark Staveley

Computer performance issues* Pipelines, Parallelism. Process and Threads.

EKT303/4 Superscalar vs Super-pipelined.

3/12/2013Computer Engg, IIT(BHU)1 CONCEPTS-1. Pipelining Pipelining is used to increase the speed of processing It uses temporal parallelism In pipelining,

Computer Architecture Lecture 24 Parallel Processing Ralph Grishman November 2015 NYU.

CPS 258 Announcements –Lecture calendar with slides –Pointers to related material.

Lecture 1: Introduction CprE 585 Advanced Computer Architecture, Fall 2004 Zhao Zhang.

CS203 – Advanced Computer Architecture Performance Evaluation.

Topics to be covered Instruction Execution Characteristics

Parallel Processing - introduction

Modern Processor Design: Superscalar and Superpipelining

Morgan Kaufmann Publishers

Part V Memory System Design

CS775: Computer Architecture

Special Instructions for Graphics and Multi-Media

STUDY AND IMPLEMENTATION

Coe818 Advanced Computer Architecture

Alex Saify Chad Reynolds James Aldorisio Brian Bischoff

Computer Evolution and Performance

COMPUTER ARCHITECTURES FOR PARALLEL ROCESSING

Performance of computer systems

CS 286 Computer Organization and Architecture

COMPUTER ORGANIZATION AND ARCHITECTURE

Presentation transcript:

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 1 Part VII Advanced Architectures

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 2 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami EditionReleasedRevised First July 2003July 2004July 2005Mar. 2007Feb. 2011* * Minimal update, due to this part not being used for lectures in ECE 154 at UCSB

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 3 VII Advanced Architectures Topics in This Part Chapter 25 Road to Higher Performance Performance enhancement beyond what we have seen: What else can we do at the instruction execution level? Data parallelism: vector and array processing Control parallelism: parallel and distributed processing

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 4 25 Road to Higher Performance Review past, current, and future architectural trends: General-purpose and special-purpose acceleration Introduction to data and control parallelism Topics in This Chapter 25.1 Past and Current Performance Trends 25.2 Performance-Driven ISA Extensions 25.3 Instruction-Level Parallelism 25.4 Speculation and Value Prediction 25.5 Special-Purpose Hardware Accelerators 25.6 Vector, Array, and Parallel Processing

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Past and Current Performance Trends 0.06 MIPS (4-bit processor) Intel 4004: The first  p (1971) Intel Pentium 4, circa ,000 MIPS (32-bit processor) bit bit Pentium, MMX Pentium Pro, II 32-bit Pentium III, M Celeron

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 6 Architectural Innovations for Improved Performance Architectural method Improvement factor 1. Pipelining (and superpipelining)3-8 √ 2. Cache memory, 2-3 levels2-5 √ 3. RISC and related ideas 2-3 √ 4. Multiple instruction issue (superscalar)2-3 √ 5. ISA extensions (e.g., for multimedia) 1-3 √ 6. Multithreading (super-, hyper-) 2-5 ? 7. Speculation and value prediction 2-3 ? 8. Hardware acceleration2-10 ? 9. Vector and array processing2-10 ? 10. Parallel/distributed computing s ? Established methods Newer methods Previously discussed Covered in Part VII Available computing power ca. 2000: GFLOPS on desktop TFLOPS in supercomputer center PFLOPS on drawing board Computer performance grew by a factor of about between 1980 and due to faster technology 100 due to better architecture

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 7 Peak Performance of Supercomputers PFLOPS TFLOPS GFLOPS Earth Simulator ASCI White Pacific ASCI Red Cray T3D TMC CM-5 TMC CM-2 Cray X-MP Cray 2  10 / 5 years Dongarra, J., “Trends in High Performance Computing,” Computer J., Vol. 47, No. 4, pp , [Dong04]

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Performance-Driven ISA Extensions Adding instructions that do more work per cycle Shift-add: replace two instructions with one (e.g., multiply by 5) Multiply-add: replace two instructions with one (x := c + a  b) Multiply-accumulate: reduce round-off error (s := s + a  b) Conditional copy: to avoid some branches (e.g., in if-then-else) Subword parallelism (for multimedia applications) Intel MMX: multimedia extension 64-bit registers can hold multiple integer operands Intel SSE: Streaming SIMD extension 128-bit registers can hold several floating-point operands

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Instruction-Level Parallelism Figure 25.4 Available instruction-level parallelism and the speedup due to multiple instruction issue in superscalar processors [John91].

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 10 Instruction-Level Parallelism Figure 25.5 A computation with inherent instruction-level parallelism.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 11 VLIW and EPIC Architectures Figure 25.6 Hardware organization for IA-64. General and floating- point registers are 64-bit wide. Predicates are single-bit registers. VLIWVery long instruction word architecture EPICExplicitly parallel instruction computing

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Speculation and Value Prediction Figure 25.7 Examples of software speculation in IA-64.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 13 Value Prediction Figure 25.8 Value prediction for multiplication or division via a memo table.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Special-Purpose Hardware Accelerators Figure 25.9 General structure of a processor with configurable hardware accelerators.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 15 Graphic Processors, Network Processors, etc. Figure Simplified block diagram of Toaster2, Cisco Systems’ network processor. PE 5

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Vector, Array, and Parallel Processing Figure The Flynn-Johnson classification of computer systems.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 17 SIMD Architectures Data parallelism: executing one operation on multiple data streams Concurrency in time – vector processing Concurrency in space – array processing Example to provide context Multiplying a coefficient vector by a data vector (e.g., in filtering) y[i] := c[i]  x[i], 0  i < n Sources of performance improvement in vector processing One instruction is fetched and decoded for the entire operation The multiplications are known to be independent (no checking) Pipelining/concurrency in memory access as well as in arithmetic Array processing is similar

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Vector Processor Implementation Figure 26.1 Simplified generic structure of a vector processor.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 19 Example Array Processor Figure 26.7 Array processor with 2D torus interprocessor communication network.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 20 Parallel Processing as a Topic of Study Graduate course ECE 254B: Adv. Computer Architecture – Parallel Processing An important area of study that allows us to overcome fundamental speed limits Our treatment of the topic is quite brief (Chapters 26-27)

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 21 MIMD Architectures Control parallelism: executing several instruction streams in parallel GMSV: Shared global memory – symmetric multiprocessors DMSV: Shared distributed memory – asymmetric multiprocessors DMMP: Message passing – multicomputers Figure 27.1 Centralized shared memory.Figure 28.1 Distributed memory....

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Implementing Symmetric Multiprocessors Figure 27.6 Structure of a generic bus-based symmetric multiprocessor.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Implementing Asymmetric Multiprocessors Figure Structure of a ring-based distributed-memory multiprocessor.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide Communication by Message Passing Figure 28.1 Structure of a distributed multicomputer.

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 25 Computing in the Cloud Image from Wikipedia Computational resources, both hardware and software, are provided by, and managed within, the cloud Users pay a fee for access Managing / upgrading is much more efficient in large, centralized facilities (warehouse-sized data centers or server farms) This is a natural continuation of the outsourcing trend for special services, so that companies can focus their energies on their main business

Feb. 2011Computer Architecture, Advanced ArchitecturesSlide 26 Warehouse-Sized Data Centers Image from IEEE Spectrum, June 2009