Trident Processor: A Scalable Architecture for Scalar, Vector, and Matrix Operations


1 Trident Processor: A Scalable Architecture for Scalar, Vector, and Matrix Operations
Eng. M. Soliman, Prof. S. Sedukhin

2 Contents
- The factors that impact processor architecture
- The idea of our proposed Trident processor
- The Trident parallelism
- The architecture of the Trident processor
- The features of the Trident processor
- Conclusion and future work

3 The Important Factors Impacting the Processor Architecture
[Diagram: Technology and Application Characteristics both shape the Processor Architecture]

4 Fast-improving Technology
Moore's law: the number of transistors per integrated circuit doubles every 18 months.

5 Application Characteristics
In response to the increasing importance of multimedia applications, major processor vendors have announced extensions to their general-purpose processors to improve multimedia performance.

Processor | Multimedia extension
Intel Pentium II, III, and 4 | MMX, SSE, and SSE2
Motorola PowerPC | AltiVec
Silicon Graphics MIPS | MDMX
Sun SPARC | VIS
Hewlett-Packard PA-RISC | MAX

6 The Idea of the Trident Processor
- The huge transistor budget: within a few years it will be possible to integrate a billion transistors on a single chip.
- The requirements of future applications: scientific, engineering, multimedia, and other applications are based on vector and matrix operations.

7 We Propose the Trident Processor
Trident: a general-purpose processor with three instruction sets (IS): scalar, vector, and matrix.
- Scalar IS: 1 operation per instruction
- Vector IS: n operations per instruction
- Matrix IS: n^2 or n^3 operations per instruction

8 The Trident Instruction Sets

Ins. set | Example | Scalar code | Scalar ops
Scalar | Addition | z=x+y; | 1
Vector | Addition | for(i=0;i<n;i++) z[i]=x[i]+y[i]; | O(n)
Vector | Dot product | s=0; for(i=0;i<n;i++) s+=x[i]*y[i]; | O(n)
Matrix | Addition | for(i=0;i<n;i++) for(j=0;j<n;j++) z[i][j]=x[i][j]+y[i][j]; | O(n^2)
Matrix | Matrix-vector multiplication | for(i=0;i<n;i++){ s=0; for(j=0;j<n;j++) s+=x[i][j]*y[j]; z[i]=s; } | O(n^2)
Matrix | Matrix-matrix multiplication | for(i=0;i<n;i++) for(j=0;j<n;j++){ s=0; for(k=0;k<n;k++) s+=x[i][k]*y[k][j]; z[i][j]=s; } | O(n^3)

9 The Trident Parallelism
The Trident processor exploits a significant amount (up to three levels) of data parallelism.
The advantages of using data parallelism:
- Compact: a single short instruction can describe an array of scalar operations.
- Expressive: a single instruction can pass valuable information about an array of scalar operations to the hardware.
- Scalable: adding more hardware can increase performance by processing longer arrays.

10 The Trident Architecture

11 Vector Processing
- A vector pipeline can perform the fundamental vector operations, such as addition, subtraction, multiplication, and division.
- Vector data are stored in ring vector registers.
- Multiple vector instructions can execute concurrently on the parallel vector pipelines.

12 Example: vector addition
VR2 ← VR0 + VR1

Step | Input | Output
0 | a0, b0 | (none)
1 | a1, b1 | a0 + b0
2 | a2, b2 | a1 + b1
3 | a3, b3 | a2 + b2
4 | a0, b0 | a3 + b3
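The step table above can be mimicked in software. The following is a minimal sketch, not the Trident hardware design: the `ring_vreg` type, the helper names, and the single-stage pipeline latch are all assumptions made for illustration. It models a ring register as an array with a rotating head index and delays each sum by one step, so the result of the pair read in step t is written in step t+1.

```c
#include <stddef.h>

#define VLEN 4  /* elements per vector register (4 in the slide's example) */

/* Hypothetical model of a ring vector register: a fixed array plus a head
   index that advances one element per access and wraps around. */
typedef struct {
    double elem[VLEN];
    size_t head;
} ring_vreg;

/* Read the element at the head of the ring, then rotate by one position. */
double ring_read(ring_vreg *r) {
    double v = r->elem[r->head];
    r->head = (r->head + 1) % VLEN;
    return v;
}

/* Write the element at the head of the ring, then rotate by one position. */
void ring_write(ring_vreg *r, double v) {
    r->elem[r->head] = v;
    r->head = (r->head + 1) % VLEN;
}

/* VR2 <- VR0 + VR1 through an assumed one-stage pipeline.
   Returns the number of steps taken (VLEN + 1). */
int vadd(ring_vreg *vr2, ring_vreg *vr0, ring_vreg *vr1) {
    double latch = 0.0;  /* sum computed in the previous step */
    int step;
    for (step = 0; step <= VLEN; step++) {
        if (step > 0) {
            ring_write(vr2, latch);                  /* output of step-1 */
        }
        if (step < VLEN) {
            latch = ring_read(vr0) + ring_read(vr1); /* input of this step */
        }
    }
    return step;
}
```

One difference from the slide: this model stops reading after the last element, whereas the slide shows the ring presenting a0, b0 again at step 4. Either way, every ring ends with its head back at element 0, so the registers are ready for the next instruction.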

13 Matrix Processing
Using parallel vector pipelines and a ring matrix register file, the processor can perform the fundamental matrix operations, such as addition, subtraction, multiplication, and inversion.

14 Example: Matrix addition
MR2 ← MR0 + MR1 on four parallel pipelines (P0 to P3)

Step | Input (one pair per pipeline) | Output (one sum per pipeline)
0 | a00, b00; a10, b10; a20, b20; a30, b30 | (none)
1 | a01, b01; a11, b11; a21, b21; a31, b31 | a00 + b00; a10 + b10; a20 + b20; a30 + b30
2 | a02, b02; a12, b12; a22, b22; a32, b32 | a01 + b01; a11 + b11; a21 + b21; a31 + b31
3 | a03, b03; a13, b13; a23, b23; a33, b33 | a02 + b02; a12 + b12; a22 + b22; a32 + b32
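A rough software analogue of this schedule is sketched here. It assumes that each pipeline handles one matrix row and that one column enters the pipelines per step; the transcript does not say which row goes to which pipeline, so the `lane` to row mapping and the function name `madd` are illustrative only.

```c
#define N 4      /* matrix order (4 in the slide's example) */
#define LANES 4  /* parallel pipelines P0..P3; this sketch assumes LANES == N */

/* MR2 <- MR0 + MR1. Each outer iteration models one step in which a whole
   column enters the pipelines. The inner loop stands in for the lanes,
   which would operate simultaneously in hardware.
   Returns the number of column steps issued. */
int madd(double mr2[N][N], double mr0[N][N], double mr1[N][N]) {
    int steps = 0;
    for (int col = 0; col < N; col++) {
        for (int lane = 0; lane < LANES; lane++) {
            /* assumed mapping: lane index == row index */
            mr2[lane][col] = mr0[lane][col] + mr1[lane][col];
        }
        steps++;
    }
    return steps;
}
```

The point of the sketch is the step count: n column steps for n^2 additions, compared with n^2 steps for a single scalar adder. Pipeline latency, which delays each output by one step in the slide's table, is left out.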

15 Matrix-matrix Multiplication
The basic matrix operation is matrix-matrix multiplication.

16 Chaining
- Matrix-matrix multiplication
- Matrix-vector multiplication
- Dot product
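The slide's diagram did not survive in the transcript, so the exact chaining scheme is not recoverable. One plausible reading is that the multiplier's output feeds the accumulator directly, which gives a dot product, and that the two larger operations are built by repeating it. The sketch below illustrates that layering in plain C; the function names and the stride parameters are assumptions, not Trident instructions.

```c
#define N 4  /* matrix order, chosen for illustration */

/* Dot product: each product goes straight into the running sum, the
   software analogue of chaining a multiplier into an adder.
   `ystride` lets y be a column of a row-major matrix. */
double dot(const double *x, const double *y, int ystride) {
    double s = 0.0;
    for (int k = 0; k < N; k++) {
        s += x[k] * y[k * ystride];
    }
    return s;
}

/* Matrix-vector multiplication: one dot product per row of x. */
void matvec(double x[N][N], const double *y, int ystride,
            double *z, int zstride) {
    for (int i = 0; i < N; i++) {
        z[i * zstride] = dot(x[i], y, ystride);
    }
}

/* Matrix-matrix multiplication: one matrix-vector product per column of y. */
void matmul(double x[N][N], double y[N][N], double z[N][N]) {
    for (int j = 0; j < N; j++) {
        matvec(x, &y[0][j], N, &z[0][j], N);
    }
}
```

Counting calls gives the same hierarchy as the instruction-set table: `matmul` issues n `matvec` calls, each of which issues n `dot` calls of n multiply-accumulate steps, for n^3 in total.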

17 Matrix-matrix Multiplication Complexity

 | Scalar IS | Vector IS | Matrix IS
Instructions | O(n^3) | O(n^2) | O(1)
Load | O(n^3) | O(n^3) | O(n^2)
Store | O(n^2) | O(n^2) | O(n^2)
Mul-acc. | O(n^3) | O(n^3) | O(n^2)
Branch | O(n^3) | O(n^2) | 0
Address comp. | O(n^3) | O(n^2) | O(1)
Add/sub. | O(n^3) | O(n^2) | 0
Reg. initialization | O(n^2) | O(n) | 0
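Some of the scalar-code counts can be checked by instrumenting the scalar matrix-matrix loop nest from the instruction-set slide. The sketch below does only that. The `op_counts` struct and the function name are hypothetical, and branch, address-computation, and add/subtract counts are omitted because they depend on how a compiler translates the loops.

```c
/* Hypothetical tally of the operation kinds that can be read directly
   off the scalar loop nest. */
typedef struct {
    long loads;     /* element loads of x[i][k] and y[k][j] */
    long stores;    /* element stores to z[i][j] */
    long mul_acc;   /* multiply-accumulate steps */
    long reg_init;  /* initializations of the accumulator s */
} op_counts;

/* Walk the scalar n x n matrix-matrix multiplication loop nest and count
   operations instead of performing them. */
op_counts count_scalar_matmul(int n) {
    op_counts c = {0, 0, 0, 0};
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            c.reg_init++;          /* s = 0 */
            for (int k = 0; k < n; k++) {
                c.loads += 2;      /* x[i][k], y[k][j] */
                c.mul_acc++;       /* s += x[i][k] * y[k][j] */
            }
            c.stores++;            /* z[i][j] = s */
        }
    }
    return c;
}
```

For the 8 × 8 case this gives 1024 loads, 512 multiply-accumulate steps, 64 stores, and 64 accumulator initializations, that is 2n^3, n^3, n^2, and n^2, in line with the scalar orders of growth.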

18 8 × 8 Matrix-matrix Multiplication: Number of instructions
[Chart comparing the scalar, vector, and matrix instruction sets; values not recoverable from the transcript]

19 Continued
[Chart comparing the scalar, vector, and matrix instruction sets across seven categories: (1) load, (2) store, (3) multiply-accumulate steps, (4) branch, (5) address computations, (6) addition/subtraction, and (7) register initializations]

20 What does this mean?
- Fewer instruction cache misses
- Fewer instruction fetches and decodes
- Fewer branches and fewer mispredicted branches
- More predictable memory accesses
- Fewer hazards
In short, Trident code is compact, with powerful instructions for high performance.

21 The Trident Processor Features
- The Trident processor consists mainly of datapath circuitry and register files.
- Advances in VLSI fabrication technology can be applied directly to support more parallelism.
- The control unit is simple.
- Many applications can benefit from executing on the Trident processor, including scientific, engineering, and multimedia applications.

22 Future Work
- Simulating the Trident processor
- Evaluating the performance of the Trident processor on some multimedia and numerical applications
- Comparing the performance of the Trident processor with superscalar processors

23 Thank you

