Introduction to Data Parallel Architectures. Sima, Fountain and Kacsuk, Chapter 10. CSE462.

Presentation transcript:

Introduction to Data Parallel Architectures (Sima, Fountain and Kacsuk, Chapter 10, CSE462). © David Abramson, 2004. Material from Sima, Fountain and Kacsuk, Addison Wesley.

Basic Concept of Data Parallelism. [Figure: a conventional 8-bit datapath showing memory, Register 1, Register 2 and an 8-bit ALU.]

Basic Concept of Data Parallelism. [Figure: 8-bit wide memory cells, 8-bit wide registers, and an 8-bit ALU built from eight single-bit ALUs operating in parallel.]
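A minimal sketch (assumed, not from the slides) of the idea in the figure: an 8-bit bitwise ALU built from eight identical single-bit ALUs, all driven by the same control signal but each operating on a different pair of bits. A bitwise operation is chosen so the eight bit-slices are fully independent.

    def one_bit_alu(a, b, op):
        # Single-bit ALU: a and b are 0 or 1; op selects the function.
        if op == "AND":
            return a & b
        if op == "OR":
            return a | b
        if op == "XOR":
            return a ^ b
        raise ValueError(op)

    def eight_bit_alu(a, b, op):
        # Eight single-bit ALUs, one per bit position, all performing the same op.
        bits_a = [(a >> i) & 1 for i in range(8)]
        bits_b = [(b >> i) & 1 for i in range(8)]
        out_bits = [one_bit_alu(x, y, op) for x, y in zip(bits_a, bits_b)]
        return sum(bit << i for i, bit in enumerate(out_bits))

    assert eight_bit_alu(0b10101010, 0b11001100, "XOR") == 0b10101010 ^ 0b11001100

Data-parallel machines generalise this picture: instead of eight bit-ALUs working on one word, many ALUs apply one instruction to many data items at once.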

Why is this useful? Many workloads apply the same process to:
- Every cell of a matrix
- Every pixel of an image
- Every record of a database
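A minimal sketch of that pattern, assuming NumPy is available: one logical operation is applied to every pixel of a hypothetical image at once, instead of looping over pixels one at a time.

    import numpy as np

    # Hypothetical 480x640 greyscale image; any 2-D array of cells works the same way.
    image = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)

    # The same operation is applied to every element in a single step.
    binary = image > 128          # threshold every pixel
    scaled = image / 255.0        # rescale every cell of the matrix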

Thinking Machines
- Connection Machine
- Up to 65,536 processors in the CM-2

Connectivity
- Want to support the basic computations required at the cell level
  - E.g. A[i,j] = (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1]) / 4
- To achieve this, cells can be connected in a variety of ways:
  - Near neighbours
  - Tree
  - Graph
  - Pyramid
  - Hypercube
  - Multistage
  - Reconfigurable
  - Crossbar
  - Bus
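A minimal NumPy sketch (assumed, not from the slides) of the four-neighbour average above; on a SIMD array each cell's PE would fetch these four values from its north, south, east and west neighbours in lockstep.

    import numpy as np

    def four_neighbour_average(a):
        # A[i,j] = (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1]) / 4 for interior cells.
        result = a.copy()
        result[1:-1, 1:-1] = (a[:-2, 1:-1] + a[2:, 1:-1] +
                              a[1:-1, :-2] + a[1:-1, 2:]) / 4.0
        return result

    grid = np.arange(25, dtype=np.float64).reshape(5, 5)   # hypothetical data
    smoothed = four_neighbour_average(grid)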

Nearest Neighbours
- Maps spatially coherent data onto SIMD systems
  - Spatially correlated data such as images
- Common to connect to the N, S, E and W neighbours, though diagonal links have also been implemented
  - Applied to massively parallel systems
  - Scalable
  - Simple to implement

Trees and Graphs
- Problems expressed as graphs
  - E.g. database searching, model matching, expert systems, etc.
- No mathematically regular structure
- Reconfigurability required
- Binary and quad trees are common
- Data bottlenecks when traffic must pass through the roots of subtrees
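A minimal sketch (assumed, not from the slides) of tree-structured communication: leaf PEs hold the data and partial results are combined pairwise up through the roots of the subtrees, taking log2(n) steps. It also shows why the root is a bottleneck: every result must funnel through it.

    def tree_reduce(values, combine):
        # Assumes the number of leaves is a power of two.
        level = list(values)                     # leaf PEs
        while len(level) > 1:
            # Each subtree root combines the results of its two children.
            level = [combine(level[i], level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))   # 36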

The Pyramid
- Combination of mesh and tree
  - Supports nearest-neighbour plus (quad) tree communication
  - Local communication of the mesh
  - Global communication of the tree
  - Consider the example of moving data from one corner to another
- Useful for data stored at multiple resolutions
  - Such as images

Hypercubes
- 2^N processors, each of which has N links
- Fault tolerant
- Shorter pathways than a mesh
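A minimal sketch (assumed, not from the slides) of hypercube addressing: each node's neighbours differ from it in exactly one address bit, and the shortest path between two nodes is the Hamming distance between their addresses.

    def neighbours(node, n):
        # The N links of a node in an N-dimensional hypercube: flip one address bit.
        return [node ^ (1 << bit) for bit in range(n)]

    def hop_distance(a, b):
        # Shortest path length = Hamming distance between the node addresses.
        return bin(a ^ b).count("1")

    N = 4                                  # 2^4 = 16 processors, 4 links each
    print(neighbours(0b0000, N))           # [1, 2, 4, 8]
    print(hop_distance(0b0000, 0b1111))    # 4 hops, versus 6 corner-to-corner on a 4x4 mesh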

Different Data Parallel Architectures
- SIMD: [Figure: a single program/control unit driving a PE array.]

Different Data Parallel Architectures
- Systolic or Pipelined: [Figure: data streams through a chain of ALUs (ALU 1, ALU 2, ALU 3, ALU 4) acting as the PE array.]
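A minimal sketch (assumed, not from the slides) of the systolic/pipelined idea: data items march through a fixed chain of ALU stages, and on every beat each stage works on a different item, so all stages are busy at once.

    def pipeline(data, stages):
        # Push items through the stage chain one beat at a time.
        in_flight = [None] * len(stages)                  # one slot per ALU stage
        results = []
        for item in list(data) + [None] * len(stages):    # trailing Nones flush the pipe
            finished = in_flight[-1]                      # the last stage retires its item
            if finished is not None:
                results.append(finished)
            in_flight = [item] + in_flight[:-1]           # everything shifts one stage forward
            # Every occupied stage applies its own fixed operation this beat.
            in_flight = [stages[i](x) if x is not None else None
                         for i, x in enumerate(in_flight)]
        return results

    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
    print(pipeline([1, 2, 3, 4], stages))                 # [1, 9, 25, 49]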

Different Data Parallel Architectures
- Vectorizing: [Figure: a vector of vectors fed to a vector ALU, which sequences through the elements automatically.]

Different Data Parallel Architectures
- Associative and Neural: [Figure: an object is presented to a comparator, which matches it against the database in parallel and produces a category.]
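A minimal sketch (assumed, not from the slides) of associative, content-addressable lookup, emulated with NumPy: the query object is compared against every record of a hypothetical database simultaneously, rather than record by record.

    import numpy as np

    # Hypothetical database: one row per record, one column per field.
    database = np.array([[3, 7, 1],
                         [3, 2, 9],
                         [5, 7, 1],
                         [3, 7, 4]])

    query = np.array([3, 7, 1])
    mask = np.array([True, True, False])     # compare only the selected fields

    # One comparison "instruction" is applied to every record in parallel.
    matches = (database[:, mask] == query[mask]).all(axis=1)
    print(np.flatnonzero(matches))           # records 0 and 3 match on the masked fields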

Principal Characteristics of Data-Parallel Systems

Property         SIMD   Systolic   Pipeline   Vectorizing   Neural   Associative
Programmability  Good   Fixed      Fixed      Good          Poor     Good
Availability     Good   Poor       Poor       Good          Poor     Poor
Scalability      Good   Fixed      Fixed      Good          Good     Good
Applicability    Wide   Narrow     Narrow     Wide          Narrow   Wide