1 Systolic Arrays & Their Applications By: Jonathan Break
2 Overview: What it is. N-body problem. Matrix multiplication. Cannon’s method. Other applications. Conclusion.
3 What Is a Systolic Array? A systolic array is an arrangement of processors in an array where data flows synchronously across the array between neighbors, usually with different data flowing in different directions. Each processor at each step takes in data from one or more neighbors (e.g. North and West), processes it and, in the next step, outputs results in the opposite direction (South and East). H. T. Kung and Charles Leiserson were the first to publish a paper on systolic arrays, in 1978, and coined the name.
4 What Is a Systolic Array? A specialized form of parallel computing. Multiple processors are connected by short wires, unlike many forms of parallelism, which lose speed through their interconnections. Cells (processors) compute data and store it independently of each other.
5 Systolic Unit (Cell) Each unit is an independent processor. Every processor has some registers and an ALU. The cells share information with their neighbors after performing the needed operations on the data.
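A single cell as described above can be sketched in a few lines of Python. This is only an illustration: the class name `Cell`, its register names, and the choice of a multiply-accumulate operation are assumptions made here, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One systolic cell: a few registers plus a multiply-accumulate 'ALU'.

    The register names and the multiply-accumulate operation are assumptions
    chosen for illustration.
    """
    acc: float = 0.0    # local result register
    a_reg: float = 0.0  # value latched from the west neighbor
    b_reg: float = 0.0  # value latched from the north neighbor

    def step(self, west_in: float, north_in: float) -> tuple[float, float]:
        """Latch one value from each input side, update the local result,
        and return (east_out, south_out) for the neighbors' next step."""
        self.a_reg, self.b_reg = west_in, north_in
        self.acc += self.a_reg * self.b_reg
        return self.a_reg, self.b_reg  # inputs are passed on unchanged


cell = Cell()
east, south = cell.step(west_in=2.0, north_in=3.0)
east, south = cell.step(west_in=4.0, north_in=5.0)
print(cell.acc)  # 2*3 + 4*5 = 26.0
```

Each call to `step` stands for one clock tick: the cell only ever talks to its immediate neighbors, which is what keeps the wires short.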
7 N-body Problem Conventional method: N² steps. Systolic method: N steps. We will need one processing element for each body; let's call the bodies planets.
8 [Figure: six bodies, B1 to B6, in orbit] Six planets in orbit with different masses and different forces acting on each other, so our systolic array will look like this.
9 Each processor will hold the Newtonian force formula: $F_{ij} = k\, m_i m_j / d_{ij}^2$, k being the gravitational constant. Then load the array with values. [Figure: processors B1 B2 B3 B4 B5 B6] Now that the array has the mass and the coordinates of each body, we can begin our computation.
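One way to picture the computation is a ring of processing elements, one per body, with a second copy of every body travelling around the ring. The sketch below assumes that ring arrangement, 2-D coordinates, and SI units; the function name `systolic_forces` and the data layout are illustrative choices, and the slides' own step-by-step figures may differ.

```python
import math

K = 6.674e-11  # gravitational constant k in the force formula (SI units)


def systolic_forces(bodies):
    """bodies: list of (mass, x, y). Returns the net (Fx, Fy) on each body.

    Assumed arrangement: PE i keeps body i resident while a copy of every
    body travels around a ring, one hop per step. After N - 1 steps each
    PE has met every other body exactly once.
    """
    n = len(bodies)
    visiting = list(bodies)                 # travelling copies, one per PE
    net = [[0.0, 0.0] for _ in range(n)]
    for _ in range(n - 1):
        visiting = visiting[-1:] + visiting[:-1]  # shift the ring by one hop
        for i in range(n):                  # all PEs act in the same step
            mi, xi, yi = bodies[i]
            mj, xj, yj = visiting[i]
            dx, dy = xj - xi, yj - yi
            d = math.hypot(dx, dy)
            f = K * mi * mj / d**2          # F_ij = k m_i m_j / d_ij^2
            net[i][0] += f * dx / d         # resolve along the joining line
            net[i][1] += f * dy / d
    return [tuple(v) for v in net]


planets = [(5.0e10, 0.0, 0.0), (3.0e10, 1.0e3, 0.0), (2.0e10, 0.0, 2.0e3)]
for body, force in zip(planets, systolic_forces(planets)):
    print(body, force)
```

The inner loop over `i` is what the hardware would do in parallel, so the number of sequential steps grows with N rather than N².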
17 Matrix Multiplication
a11 a12 a13     b11 b12 b13     c11 c12 c13
a21 a22 a23  *  b21 b22 b23  =  c21 c22 c23
a31 a32 a33     b31 b32 b33     c31 c32 c33
Conventional method: N³ steps.
For I = 1 to N
  For J = 1 to N
    For K = 1 to N
      C[I,J] = C[I,J] + A[I,K] * B[K,J];
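For reference, the conventional triple loop can be written as runnable Python. This is a minimal sketch using plain lists of lists and 0-based indices; the function name `matmul_conventional` is an illustrative choice. Note that the A operand is indexed by row `i` and inner index `k`.

```python
def matmul_conventional(A, B):
    """Triple-loop product of two N x N matrices given as lists of lists."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]  # n^3 multiply-adds in total
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_conventional(A, B))  # [[19, 22], [43, 50]]
```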
18 Systolic Method This will run in O(N) time! To run in N time we need N x N processing units; in this case we need 9.
P1 P2 P3
P4 P5 P6
P7 P8 P9
19 We need to modify the input data, like so:
a13 a12 a11
a23 a22 a21
a33 a32 a31
Flip columns 1 & 3.
b31 b32 b33
b21 b22 b23
b11 b12 b13
Flip rows 1 & 3, and finally stagger the data sets for input.
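The staggering step can be expressed as an input schedule. The sketch below assumes the usual skew in which row i of A and column i of B are each delayed by i ticks; the function name `staggered_streams` and the use of `None` for padding are illustrative. The flips on the slide correspond to feeding a11 and b11 first.

```python
def staggered_streams(A, B):
    """Return (west, north) input schedules for an N x N array.

    west[i][t]  is the value entering row i from the west at tick t.
    north[j][t] is the value entering column j from the north at tick t.
    None marks a tick with no input (the stagger padding).
    Assumption: row/column i is delayed by i ticks.
    """
    n = len(A)
    ticks = 2 * n - 1
    west = [[None] * ticks for _ in range(n)]
    north = [[None] * ticks for _ in range(n)]
    for i in range(n):
        for k in range(n):
            west[i][i + k] = A[i][k]    # row i of A, delayed by i ticks
            north[i][i + k] = B[k][i]   # column i of B, delayed by i ticks
    return west, north


west, north = staggered_streams([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(west)   # [[1, 2, None], [None, 3, 4]]
print(north)  # [[5, 7, None], [None, 6, 8]]
```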
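20 [Figure: the staggered values of B (b11 to b33) enter the array from above; the rows of A enter from the left]
a13 a12 a11 -> P1 P2 P3
a23 a22 a21 -> P4 P5 P6
a33 a32 a31 -> P7 P8 P9
At every tick of the global system clock, data is passed to each processor from two different directions; the two values are multiplied, and the product is added to the result saved in a register.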
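The tick-by-tick behavior can be simulated in software. The sketch below is one possible model, not a reconstruction of the slide's figure: it assumes A enters from the west and B from the north, that row i and column j are each delayed by i and j ticks, and that every processor keeps its own running sum. The name `systolic_matmul` is illustrative.

```python
def systolic_matmul(A, B):
    """Simulate an N x N systolic array computing C = A * B.

    Assumptions: A flows west -> east, B flows north -> south, inputs are
    staggered by row/column index, and each cell accumulates its own c_ij.
    Returns (C, number_of_ticks).
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]        # result register per cell
    a_reg = [[None] * n for _ in range(n)]   # values moving west -> east
    b_reg = [[None] * n for _ in range(n)]   # values moving north -> south
    ticks = 3 * n - 2                        # last operands reach the far corner
    for t in range(ticks):
        new_a = [[None] * n for _ in range(n)]
        new_b = [[None] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:                   # west edge: staggered input
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < n else None
                else:                        # otherwise take from west neighbor
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:                   # north edge: staggered input
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < n else None
                else:                        # otherwise take from north neighbor
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, ticks


C, ticks = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, ticks)  # [[19, 22], [43, 50]] 4
```

Under these assumptions the array needs 3N - 2 ticks, which is 7 for a 3 x 3 problem, and many cells sit idle while the staggered data fills and drains.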
21 Let's try this using a systolic array. [Figure: two example 3 x 3 matrices to be multiplied, and the processor grid P1 to P9]
29 [Figure: the result matrix held in processors P1 to P9]
23 36 28
25 39 34
28 32 37
Same answer, in 2n + 1 time! Can we do better? Yes, there is an optimization.
30 Cannon’s Technique No processor is idle. Instead of feeding the data in, it is cycled. Data is staggered, but slightly differently than before. Not including the loading, which can be done in parallel in a time of 1, this technique effectively takes N steps.
31 Other Applications: Signal processing. Image processing. Solutions of differential equations. Graph algorithms. Biological sequence comparison. Other computationally intensive tasks.
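A software sketch of Cannon's technique, assuming the standard formulation: row i of A is pre-rotated left by i positions and column j of B is pre-rotated up by j positions, after which A cycles left and B cycles up by one position per step. The function name `cannon_matmul` is illustrative, and the slides' exact data layout may differ.

```python
def cannon_matmul(A, B):
    """Multiply two N x N matrices with Cannon's cyclic-shift scheme."""
    n = len(A)
    # Initial alignment (the "stagger"): row i of A rotated left by i,
    # column j of B rotated up by j.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):                       # exactly N compute steps
        for i in range(n):
            for j in range(n):               # every cell is busy every step
                c[i][j] += a[i][j] * b[i][j]
        # Cycle the data: A one position left, B one position up (wrapping).
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because the operands wrap around instead of draining off the edge, no fill or drain ticks are spent and every cell performs a useful multiply-add in each of the N steps.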
32 Samba: Systolic Accelerator for Molecular Biological Applications This systolic array contains 128 processors distributed across 32 full-custom VLSI chips. One chip houses 4 processors, and one processor computes 10 million matrix cells per second.
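Taking the figures above at face value, 128 processors at 10 million cells per second each gives an upper bound of 1.28 billion matrix cells per second when every processor is busy. To illustrate what a "matrix cell" might be, the sketch below assumes the matrix is a dynamic-programming comparison matrix and uses plain edit distance as a stand-in; it does not describe Samba's actual algorithm or cell design, and the name `edit_distance_wavefront` is hypothetical. The point is that all cells on one anti-diagonal are independent, so a linear array could compute them in the same step.

```python
def edit_distance_wavefront(s, t):
    """Edit distance computed anti-diagonal by anti-diagonal.

    Returns (distance, wavefront_steps). Cells on the same anti-diagonal
    depend only on earlier diagonals, so they could run in parallel,
    one per processor. Illustrative stand-in, not Samba's algorithm.
    """
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                          # delete all of s[:i]
    for j in range(n + 1):
        D[0][j] = j                          # insert all of t[:j]
    steps = 0
    for d in range(2, m + n + 1):            # anti-diagonal i + j = d
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            cost = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,          # deletion
                          D[i][j - 1] + 1,          # insertion
                          D[i - 1][j - 1] + cost)   # match or substitution
        steps += 1
    return D[m][n], steps


print(edit_distance_wavefront("kitten", "sitting"))  # (3, 12)
```

The sequential version fills m x n cells one at a time, while the wavefront needs only m + n - 1 steps if enough processors are available.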
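33 Why Systolic? Extremely fast. Easily scalable architecture. Can reach speeds on many tasks that single-processor machines cannot attain. Turns some polynomial-time problems into linear-time ones, e.g. matrix multiplication from N³ steps to N steps, at the cost of N x N processors.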
34 Why Not Systolic? Expensive. Not needed for most applications; they are a highly specialized processor type. Difficult to implement and build.
35 Summary What a systolic array is. Step-through of matrix multiplication. Cannon’s optimization. Other applications, including Samba. Positives and negatives of systolic arrays.
36 References
“Systolic Algorithms & Architectures” by Patrice Quinton and Yves Robert, 1991, Prentice Hall International.
“The New Turing Omnibus” by A.K. Dewdney, New York.