Systolic Computing Fundamentals

What are Systolic Arrays? Systolic computing is a form of pipelining, sometimes in more than one dimension. Machines have been constructed on this principle, notably the iWarp, fabricated by Intel.

‘Laying out algorithms in VLSI’  The idea is to exploit VLSI efficiently by laying out algorithms (and hence architectures) in 2-D (not all systolic machines are 2-D, but probably most are)  Simple cells: each cell performs one operation (usually)  Efficient use of hardware  Not general purpose  Not suitable for large I/O-bound applications  Control and data flow must be regular

What is Systolic Computing? Definition 1.  sys·to·le (sĭs′tə-lē) noun  The rhythmic contraction of the heart, especially of the ventricles, by which blood is driven through the aorta and pulmonary artery after each dilation or diastole.  [Greek sustolē, contraction, from sustellein, to contract. See systaltic.]  — sys·tol′ic (sĭ-stŏl′ĭk) adjective  American Heritage Dictionary. Definition 2. “Data flows from memory in a rhythmic fashion, passing through many processing elements before it returns to memory.” (H. T. Kung) The term “systolic” was first used in this context by H. T. Kung, then at CMU; it refers to the “pumping” action of a heart.

What is Systolic Computing? Definition 3. A set of simple processing elements with regular and local connections, which takes external inputs and processes them in a predetermined manner in a pipelined fashion.

‘Systolic’ is normally used to describe the regular pumping action of the heart. By analogy, systolic computers pump data through the array in a regular way. The architectures thus produced are not general-purpose but tied to specific algorithms.

Systolic computing is good for computation-intensive tasks, e.g. signal processing, but not for I/O-intensive tasks. Most designs are simple and regular in order to keep VLSI implementation costs low; programs with simple data and control flow are best. Systolic computers combine pipelining and parallel computation.

What is the difference between a SIMD array and a systolic array? [Figure: mesh-type SIMD array: processing units on a local interconnection network, driven by a single control unit over data and control buses.] A SIMD array is a synchronous array of PEs under the supervision of one control unit: all PEs receive the same instruction broadcast from the control unit but operate on different data sets from distinct data streams. A SIMD array usually loads data into its local memories before starting the computation.

Systolic Array. Whereas a SIMD array usually loads data into its local memories before starting the computation, systolic arrays usually pipe data in from an outside host and pipe the results back to the host. [Figure: systolic array: processing units on a local interconnection network, each with its own control unit.]

Host Station in Systolic Architecture. As a result of the local-communication scheme, a systolic network is easily extended without adding any burden to the I/O.

What are the Structures for Systolic Computing?

The Basic Principle of a Systolic System. Systolic systems increase computing power for some problems; systolic computers can be treated as a generalization of the pipelined array architecture. With a memory of 100 ns cycle time feeding a single PE, throughput is limited to 5 MOPS; feeding the same memory into a chain of PEs (PE-----PE) lets each word fetched be used many times and raises throughput to 30 MOPS. NOTE: MOPS = Millions of Operations Per Second.
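The slide's figures can be checked with a little arithmetic. A minimal sketch, assuming (as the numbers suggest) that one operation costs two memory accesses and that six PEs are chained; the slide itself gives only the 100 ns cycle and the 5 vs. 30 MOPS figures:

```python
# Kung's memory-bandwidth argument, using the figures from the slide.
MEM_CYCLE_NS = 100                        # one memory access every 100 ns
ACCESSES_PER_SEC = 1e9 / MEM_CYCLE_NS     # 10 million accesses per second

# Assumption: each operation needs one read and one write, so a single
# PE sitting between memory and itself peaks at 5 MOPS.
single_pe_mops = ACCESSES_PER_SEC / 2 / 1e6

def chained_mops(k):
    """Throughput when k PEs are chained: every word fetched from memory
    is used by all k PEs, so throughput scales by k at the same memory
    bandwidth."""
    return k * single_pe_mops

print(single_pe_mops)   # 5.0
print(chained_mops(6))  # 30.0  (the slide's second figure)
```

The point is that the memory, not the PE, is the bottleneck: the systolic chain multiplies work done per memory access rather than making any PE faster.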

What are the functions of a cell in a Systolic System? Systolic systems consist of an array of PEs (processing elements): –the processors are called cells, –each cell is connected to a small number of nearest neighbours in a mesh-like topology. Each cell performs an operation, or a small number of operations, on a data item that flows through it and then passes the item to its neighbour; generally the operations are the same in each cell. Systolic arrays compute in “lock-step”, with each cell (processor) undertaking alternate compute/communicate phases.
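The lock-step compute/communicate behaviour is easy to simulate in software. A minimal sketch of one classic linear-array design for an FIR filter (weights stationary; inputs move right at half speed, partial sums move right at full speed; `systolic_fir` is a hypothetical name, not from the slides):

```python
def systolic_fir(w, x):
    """Lock-step simulation of a linear systolic array computing the FIR
    filter y[i] = sum_j w[j] * x[i-j], with x[k] = 0 for k < 0.

    Design sketch: weights stay in the cells; x values move right at
    half speed (two registers per cell) while partial sums move right
    at full speed, so y[i] meets exactly x[i-j] in cell j."""
    K = len(w)
    S = [0] * (2 * K)   # x pipeline: two delay registers per cell
    Y = [0] * K         # one partial-sum register per cell
    out = []
    for t in range(len(x) + K):
        xin = x[t] if t < len(x) else 0      # pad so the pipeline drains
        # compute phase: every cell does one multiply-accumulate at once
        Y = [(Y[j - 1] if j > 0 else 0) + w[j] * S[2 * j] for j in range(K)]
        out.append(Y[-1])                    # a finished y leaves the last cell
        # communicate phase: shift the x stream one register to the right
        S = [xin] + S[:-1]
    return out[K:]                           # drop the pipeline fill-up ticks

# Agrees with a direct evaluation of the filter:
print(systolic_fir([1, 2], [1, 2, 3]))   # [1, 4, 7]
```

Each loop iteration is one global clock tick: all cells multiply-accumulate simultaneously, then all registers shift at once, which is exactly the "lock-step" discipline described above.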

What are the variations of systolic arrays? Systolic arrays can be built with variations in: –1. Connection topology: 2-D meshes, hypercubes. –2. Processor capability, ranging through: trivial (just an ALU); ALU with several registers; simple CPU (registers, runs its own program); powerful CPU (local memory as well).

What are the variations of systolic arrays? –3. Re-configurable: field-programmable gate arrays (FPGAs) offer the possibility that re-programmable, re-configurable arrays can be constructed to compute certain problems efficiently. In general, FPGA technology is excellent for building small systolic-array-style processors: special-purpose ALUs can be constructed and linked in a topology to which the target application maps well.

Regular Interconnect: why good?

What are typical structures of a Systolic Architecture? Example of systolic architecture: linear network. Note the signals going in both directions!

What are typical structures of a Systolic Architecture? Early systolic arrays are linear arrays with one-dimensional (1-D) or two-dimensional (2-D) I/O. More recently, systolic arrays have been implemented as planar arrays with perimeter I/O to feed data through the boundary. Linear array with 1-D I/O: this configuration is suitable for a single I/O stream. Linear array with 2-D I/O: it allows more control over the linear array; some authors call it a 1.5-dimensional architecture.

What are typical structures of a Systolic Architecture? Example of systolic network: bi-directional two-dimensional network.

Planar array with perimeter I/O: this configuration allows I/O only through its boundary cells. Focal-plane array with 3-D I/O: this configuration allows I/O to each systolic cell.
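A planar mesh with perimeter I/O can be simulated in the same lock-step style. A sketch of an output-stationary matrix-multiply array, where operands enter only through the top and left boundary cells, skewed one tick per row/column (`systolic_matmul` is an illustrative name, and output-stationary is one of several possible designs):

```python
def systolic_matmul(A, B):
    """Lock-step simulation of an output-stationary systolic mesh with
    perimeter I/O: rows of A enter from the left edge, columns of B
    from the top edge, each skewed by one tick per row/column, and
    cell (i, j) accumulates C[i][j] as matching operands meet in it."""
    n, m, p = len(A), len(B), len(B[0])   # A is n x m, B is m x p
    a_reg = [[0] * p for _ in range(n)]   # A values moving right
    b_reg = [[0] * p for _ in range(n)]   # B values moving down
    C = [[0] * p for _ in range(n)]
    for t in range(n + m + p):            # enough ticks to flush the wavefront
        new_a = [[0] * p for _ in range(n)]
        new_b = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                k = t - i - j             # which A/B pair reaches this cell now
                # boundary cells read from outside; inner cells from neighbours
                a = a_reg[i][j - 1] if j > 0 else (A[i][k] if 0 <= k < m else 0)
                b = b_reg[i - 1][j] if i > 0 else (B[k][j] if 0 <= k < m else 0)
                C[i][j] += a * b
                new_a[i][j] = a           # latched for the right neighbour
                new_b[i][j] = b           # latched for the neighbour below
        a_reg, b_reg = new_a, new_b
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The skewing guarantees that A[i][k] and B[k][j] arrive at cell (i, j) on the same tick (tick i + j + k), so each cell needs only its two neighbour links and a local accumulator, matching the perimeter-I/O picture on the slide.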

What are typical structures of a Systolic Architecture? Example of systolic network: hexagonal network.

What are typical structures of a Systolic Architecture? Example of systolic network: hypercubes. My experience story: hypercubes at Intel.  Each node has one neighbour per dimension: two nodes are adjacent when their labels differ in exactly one variable. In the example above there are four variables (dimensions).
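The "one neighbour per variable" rule can be made concrete. A small sketch, assuming the usual binary-label convention for hypercube nodes (`hypercube_neighbors` is a hypothetical helper name):

```python
# In a d-dimensional hypercube, nodes are labelled with d-bit numbers and
# two nodes are connected exactly when their labels differ in one bit,
# i.e. one link per dimension ("variable").
def hypercube_neighbors(node, d):
    """Return the d neighbours of `node` in a d-cube by flipping each bit."""
    return [node ^ (1 << bit) for bit in range(d)]

# The slide's example has four variables, i.e. a 4-cube with 16 nodes,
# each of degree 4:
print(hypercube_neighbors(0b0000, 4))   # [1, 2, 4, 8]
```

Flipping bit `b` corresponds to moving along dimension `b`, so every node's degree equals the number of variables, and the whole network has 2**d nodes.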

What are typical structures of a Systolic Architecture? Example of systolic network: trees. This architecture can be used for maximum independent set and maximum clique problems in graph theory.

What are typical structures of a Systolic Architecture? Regular interconnect in 3-D: 3-D array, 4-D array (mapped to 3-D), 3-D hex array, 3-D trees and lattices. This is a research area of our group; see Perkowski's and Anas Al Rabadi's papers.

What are the Applications of Systolic Arrays? Matrix inversion and decomposition. Polynomial evaluation. Convolution. Systolic arrays for matrix multiplication. Image processing. Systolic lattice filters used for speech and seismic signal processing. Artificial neural networks. Robotics (PSU). Equation solving (PSU). Combinatorial problems (PSU). Good topics for a Master's thesis. Discuss the “General and Soldiers” and symmetric-function evaluation problems.

Characteristics of Systolic Architectures

What are features of Systolic Arrays? A systolic array is a computing network possessing the following features: –Synchrony, –Modularity, –Regularity, –Spatial locality, –Temporal locality, –Pipelinability, –Parallel computing. Synchrony means that the data is rhythmically computed (timed by a global clock) and passed through the network. Modularity means that the array (finite or infinite) consists of modular processing units. Regularity means that the modular processing units are interconnected homogeneously.

What are features of Systolic Arrays? Spatial locality means that the cells have a local communication interconnect. Temporal locality means that signals are transmitted from one cell to another with at least one unit of time delay. Pipelinability means that the array can achieve a high speed.

What are the advantages of Systolic Architectures? They suit special-purpose processing architectures because of: –1. Simple and regular design. Systolic arrays have a regular and simple design, i.e. they are: –cost-effective, –modular, i.e. adjustable to various performance goals, –able to put a large number of processors to work together, –fast, because local communication in a systolic array keeps wires short. –2. Concurrency and communication. –3. Balancing computation with I/O.

How are Systolic Processors attached to general architectures? A systolic array is used as an attached array processor: –it receives data and outputs the results through an attached host computer, –therefore the performance goal of the array-processor system is a computation rate that balances the I/O bandwidth with the host. Given the relatively low bandwidth of current I/O devices, achieving a faster computation rate requires performing multiple computations per I/O access; systolic arrays do this efficiently.

What are the advantages of Systolic Architectures? Effectively utilize VLSI. Reduce the “von Neumann bottleneck”. Target compute-intensive applications. Reduce design cost:  simple,  regular. Exploit concurrency.

Advantages: Using VLSI Effectively. Replicate simple cells. Local communication ==>  short wires: small delay, low clock skew;  small drivers: less area;  scalable. Small number of I/Os. Routing costs dominate: power, area, and time!

Eliminating the von Neumann Bottleneck. Process each input multiple times. Keep partial results in the PEs. Does this still present a win today?  Large cost,  many registers.

Balancing I/O and Computation. You can't go faster than the data arrives, so reduce bandwidth requirements. Choose applications well! Choose algorithms correctly!

Exploiting Concurrency. Large number of simple PEs, managed without an instruction store. Methods:  pipelining,  SIMD/MIMD,  vector. This limits the application space; how severely?

Sources
1. Seth Copen Goldstein, CMU.
2. David E. Culler, UC Berkeley.
3. Syeda Mohsina Afroze and other students of Advanced Logic Synthesis, ECE 572, 1999.