A study on SIMD architecture


1 A study on SIMD architecture
CDA Group Project Presentation - Spring 2013
Mohammad Ahmadian, Gurkan Solmaz, Rouhollah Rahmatizadeh

2 Outline
- Introduction to SIMD & brief history
- SIMD architecture
- Matrix multiplication
- AES
- Implementation & results
- Conclusion

3 Introduction to SIMD
Single instruction, multiple data (SIMD): one instruction operates on several data elements at the same time.

4 Introduction to SIMD
SIMD processors:
- A single control unit reads and decodes instructions and sends control signals to the processing elements (PEs).
- Data are supplied to the PEs by a memory.
- The number of data paths equals the number of PEs.
- An interconnection network provides flexibility in moving data to and from the PEs.
- The I/O system converts the data format.
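The organization above can be sketched as a small scalar model in plain C. This is only an illustration, not code from the project: the names `NUM_PES`, `opcode_t`, `pe_execute` and `control_unit_issue` are hypothetical, and the loop stands in for PEs that real hardware would run in parallel.

```c
#include <stddef.h>

#define NUM_PES 4  /* hypothetical number of processing elements */

typedef enum { OP_ADD, OP_MUL } opcode_t;  /* hypothetical instruction set */

/* One processing element: applies the broadcast opcode to its own data. */
static float pe_execute(opcode_t op, float a, float b) {
    switch (op) {
    case OP_ADD: return a + b;
    case OP_MUL: return a * b;
    }
    return 0.0f;
}

/* Control unit: decodes one instruction and sends it to every PE.
   Each PE has its own data path (a[i], b[i] -> out[i]). */
void control_unit_issue(opcode_t op, const float a[NUM_PES],
                        const float b[NUM_PES], float out[NUM_PES]) {
    for (size_t i = 0; i < NUM_PES; i++) {
        out[i] = pe_execute(op, a[i], b[i]);
    }
}
```

A single call such as `control_unit_issue(OP_ADD, a, b, out)` corresponds to one instruction applied to `NUM_PES` data items.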

5 Introduction to SIMD

6 Brief history
- First used in vector supercomputers in the early 1970s (e.g. CDC STAR-100, TI ASC).
- Vector processing was popularized by Cray (1970s, 1980s).
- The first modern SIMD machines were massively parallel processing-style supercomputers, such as the Thinking Machines CM-1 and CM-2.
- The current era of SIMD is in desktop computers rather than supercomputers.
- Desktop processors are powerful enough to support real-time gaming and video processing.

7 Brief history
- Sun Microsystems introduced SIMD integer instructions in the VIS (Visual Instruction Set) extensions of the UltraSPARC I microprocessor (1995).
- MIPS introduced MDMX (MIPS Digital Media eXtension).
- Intel introduced the MMX extensions to the x86 architecture (1996).
- AltiVec appeared in Motorola's PowerPC and IBM's POWER systems.
- Intel's SSE (Streaming SIMD Extensions).
- Sony's PlayStation 2 and Motorola's MPC7400.

8 SIMD architecture
- Exploits a property of the data stream called "data parallelism".
- SIMD computing is also known as vector processing.
- Programs are written for SISD machines and include SIMD instructions.
- Vector length: the number of elements of a given data type that fit in one vector (e.g. a 128-bit vector holds four single-precision floating-point values).
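The vector-length arithmetic can be written out as a small C helper. This is a minimal sketch; `VECTOR_BITS` and `lanes_for` are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define VECTOR_BITS 128  /* register width used in the slide's example */

/* Number of elements (lanes) of a given size that fit in one vector. */
size_t lanes_for(size_t element_bytes) {
    return (VECTOR_BITS / 8) / element_bytes;
}
```

For a 4-byte `float`, `lanes_for(sizeof(float))` gives the four-way single-precision case mentioned above; an 8-byte `double` gives two lanes, and a 1-byte integer gives sixteen.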

9 SIMD architecture
- One obvious type is intra-element operations, applied to corresponding elements of the source vectors: arithmetic (e.g. addition) and non-arithmetic (e.g. AND, XOR).
- The other type is inter-element operations, applied between the elements of a single vector (e.g. vector permutes, logical shifts).
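The two operation types can be illustrated with SSE intrinsics. This is a sketch that assumes an x86 target with SSE; the function names `add4` and `reverse4` are hypothetical, and a lane reversal is used here as one simple example of a permute.

```c
#include <xmmintrin.h>  /* SSE */

/* Intra-element: out[i] = a[i] + b[i], lane by lane. */
void add4(const float *a, const float *b, float *out) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Inter-element: reverse the four lanes of a single vector. */
void reverse4(const float *a, float *out) {
    __m128 v = _mm_loadu_ps(a);
    /* _MM_SHUFFLE(0, 1, 2, 3) picks lanes 3, 2, 1, 0 for result lanes 0..3. */
    __m128 r = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(out, r);
}
```

In `add4` each result lane depends only on the same lane of the inputs, while in `reverse4` data moves across lanes within one register.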

10 SIMD architecture - AltiVec
An AltiVec instruction can name up to 4 distinct vector registers as operands:
- 2 sources to hold the operands: VA, VB
- 1 filter/modifier: VC
- 1 destination to hold the result: VT
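To show how a filter/modifier operand can work, here is a scalar C model of a permute in the style of AltiVec's vperm, where each byte of VC selects one byte from the concatenation of VA and VB. It is a plain-C illustration, not AltiVec code, and `vperm_model` is a hypothetical name.

```c
#include <stdint.h>

/* Scalar model of a vperm-style operation: each byte of the control
   vector vc selects one byte from the 32-byte concatenation va || vb. */
void vperm_model(const uint8_t va[16], const uint8_t vb[16],
                 const uint8_t vc[16], uint8_t vt[16]) {
    for (int i = 0; i < 16; i++) {
        uint8_t sel = vc[i] & 0x1F;  /* low 5 bits: index 0..31 */
        vt[i] = (sel < 16) ? va[sel] : vb[sel - 16];
    }
}
```

With a control vector of 0, 16, 1, 17, ... the result interleaves the low bytes of VA and VB, which is how a single permute instruction can implement many different data rearrangements.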

11 SIMD architecture - Intel MMX/SSE
- Intel added eight new 128-bit registers for SSE.
- The Pentium III can dispatch a 64-bit add and a 64-bit multiply at the same time.

12 Main idea
- Use the SIMD instruction set to improve program performance.
- Problem: there were no SIMD-implemented benchmark applications available for assessing performance.
- Solution: create our own tools first.

13 Benchmarks
- Matrix multiplication: the best candidate for a SIMD implementation, because it operates on 2-D arrays of data.
- AES encryption algorithm: another candidate for a SIMD implementation, with applications ranging from mobile devices to distributed data centers. Improving AES also has an effect on several fields.

14 SIMD operations

15 Matrix Multiplication
Matrix multiplication is one of the most common numerical operations, especially in dense linear algebra, where it forms the core of many important algorithms, including solvers for linear systems of equations, least-squares problems, and singular value and eigenvalue computations.

16 Matrix Multiplication (cont.)

17 Matrix Multiplication (cont.)
With SIMD instructions, we can do it with 3 vector instructions.

18 Matrix Multiplication (cont.)
We just apply 4 MULPS instructions to obtain the result.
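The figures for these slides are not part of the transcript, so the following is only a sketch of one common way to multiply 4x4 single-precision matrices with SSE, not necessarily the formulation the authors used. It assumes row-major storage, and `matmul4x4_sse` is a hypothetical name. Each output row is built from four `_mm_mul_ps` (MULPS) operations, combined with three `_mm_add_ps` operations.

```c
#include <xmmintrin.h>  /* SSE */

/* C = A * B for 4x4 row-major float matrices. */
void matmul4x4_sse(const float *A, const float *B, float *C) {
    /* Load the four rows of B once. */
    __m128 b0 = _mm_loadu_ps(B + 0);
    __m128 b1 = _mm_loadu_ps(B + 4);
    __m128 b2 = _mm_loadu_ps(B + 8);
    __m128 b3 = _mm_loadu_ps(B + 12);

    for (int i = 0; i < 4; i++) {
        /* Broadcast each element of row i of A across a vector and
           scale the matching row of B: four MULPS per output row. */
        __m128 r0 = _mm_mul_ps(_mm_set1_ps(A[4 * i + 0]), b0);
        __m128 r1 = _mm_mul_ps(_mm_set1_ps(A[4 * i + 1]), b1);
        __m128 r2 = _mm_mul_ps(_mm_set1_ps(A[4 * i + 2]), b2);
        __m128 r3 = _mm_mul_ps(_mm_set1_ps(A[4 * i + 3]), b3);
        /* Three ADDPS combine the partial products into row i of C. */
        __m128 row = _mm_add_ps(_mm_add_ps(r0, r1), _mm_add_ps(r2, r3));
        _mm_storeu_ps(C + 4 * i, row);
    }
}
```

The scalar triple loop needs 64 multiplications for the same product; here the same work is done by 16 vector multiplies, each handling four elements.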

19 AES Encryption
- The Advanced Encryption Standard (AES) is a specification for the encryption of electronic data, established by the U.S. National Institute of Standards and Technology (NIST) in 2001.
- A fast encryption algorithm is desirable in several fields, such as mobile devices, e-commerce, cloud networks, ….
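The slides do not say which parts of AES were vectorized. As an illustration of why AES fits SIMD, the sketch below shows one step that maps directly onto a 128-bit register: AddRoundKey, which XORs the 16-byte state with a 16-byte round key. It assumes an x86 target with SSE2, and `add_round_key_sse2` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* AddRoundKey: XOR the 16-byte state with a 16-byte round key, in place. */
void add_round_key_sse2(uint8_t state[16], const uint8_t round_key[16]) {
    __m128i s = _mm_loadu_si128((const __m128i *)state);
    __m128i k = _mm_loadu_si128((const __m128i *)round_key);
    _mm_storeu_si128((__m128i *)state, _mm_xor_si128(s, k));
}
```

A byte-wise version needs sixteen XOR operations; here one `_mm_xor_si128` covers the whole AES block.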

20 AES (Advanced Encryption Standard)

21 SIMD instructions
Memory and initialization:
    Load:          __m128 _mm_loadu_ps(float *p);
    Set:           __m128 _mm_set_ps(float z, float y, float x, float w);
    Store:         void _mm_store_ps(float *p, __m128 a);
Integer/floating-point intrinsics:
    Arithmetic:    __m128 _mm_add_ps(__m128 a, __m128 b);
    Logical:       __m128 _mm_and_ps(__m128 a, __m128 b);
    Shift:         __m128i _mm_slli_si128(__m128i a, int imm);
    Conversion:    int _mm_cvtsi128_si32(__m128i a);
    Comparison:    __m128i _mm_cmpeq_epi8(__m128i a, __m128i b);
    Miscellaneous: int _mm_extract_epi16(__m128i a, int imm);
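A short sketch of how a few of these intrinsics behave, assuming an x86 target with SSE2. The function names `set_and_store` and `first_two_bytes_equal` are hypothetical. The example uses `_mm_storeu_ps` rather than `_mm_store_ps`, because the latter requires the destination to be 16-byte aligned.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_storeu_ps */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cmpeq_epi8, _mm_extract_epi16 */

/* _mm_set_ps takes its arguments from the highest lane to the lowest,
   so the last argument ends up first in memory. */
void set_and_store(float out[4]) {
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* lanes 0..3: 1, 2, 3, 4 */
    _mm_storeu_ps(out, v);  /* unaligned store */
}

/* Returns 1 if the first two bytes of a and b are equal, else 0. */
int first_two_bytes_equal(const unsigned char a[16], const unsigned char b[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);        /* 0xFF where bytes match */
    return _mm_extract_epi16(eq, 0) == 0xFFFF;  /* bytes 0 and 1 both match */
}
```

The comparison intrinsic produces a mask rather than a boolean, which is why a second step such as `_mm_extract_epi16` is needed to turn it into a scalar result.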

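22 Array addition implementation
Original code:
    for (size_t i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
    }
SIMD optimized (assumes N is a multiple of 4):
    for (size_t i = 0; i < N; i += 4) {
        __m128 a = _mm_loadu_ps(A + i);  /* load 4 floats from A */
        __m128 b = _mm_loadu_ps(B + i);  /* load 4 floats from B */
        __m128 c = _mm_add_ps(a, b);     /* 4 additions in one instruction */
        _mm_storeu_ps(C + i, c);         /* store 4 results to C */
    }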
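The SIMD loop on this slide reads and writes four floats per iteration, so it is only safe when N is a multiple of 4. A minimal sketch of a complete function that also handles the leftover elements follows; the name `add_arrays_sse` and the scalar tail loop are additions for illustration, not part of the original slide.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE */

/* C[i] = A[i] + B[i] for any n, not only multiples of 4. */
void add_arrays_sse(const float *A, const float *B, float *C, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {  /* vector body: 4 elements per step */
        __m128 a = _mm_loadu_ps(A + i);
        __m128 b = _mm_loadu_ps(B + i);
        _mm_storeu_ps(C + i, _mm_add_ps(a, b));
    }
    for (; i < n; i++) {  /* scalar tail: the last n % 4 elements */
        C[i] = A[i] + B[i];
    }
}
```

With n = 7, for example, the vector loop handles elements 0 to 3 and the scalar loop handles elements 4 to 6, so nothing past the end of the arrays is touched.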

23 Implementation challenges
- Compiler automatic vectorization:
  - Microsoft Visual Studio: not supported in versions before Visual Studio 2012
  - Intel compilers: supported
- Manual optimization gives the best result.
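As an illustration of what automatic vectorization works with, the sketch below is a plain C loop written so that a vectorizing compiler has a good chance of turning it into SIMD code. The function name `scale_add` is hypothetical, and whether the loop is actually vectorized depends on the compiler and its optimization flags (for example `-O3` with GCC or Clang).

```c
#include <stddef.h>

/* restrict promises the compiler that the arrays do not overlap,
   which removes one common obstacle to automatic vectorization. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = k * x[i] + y[i];
    }
}
```

The loop has a simple counted form, no data-dependent branches, and no dependency between iterations, which are the properties auto-vectorizers typically look for.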

24 The results
- Matrix multiplication: 2 times faster
- AES encryption: 23% faster

25 Conclusion
- In this project we studied the SIMD architecture.
- SIMD is very useful in some applications.
- Not all algorithms can be vectorized.
- Vectorization needs human labor.
- Modern processors will reshape the way we think about programming.

26 References
- David A. Patterson and John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface", 1998, p. 751.
- Bertil Svensson, "SIMD Processor Array Architectures".
- Jon Stokes, "SIMD Architectures", Ars Technica. N.p., Web. 15 Apr
- "SIMD", Wikipedia. Wikimedia Foundation, 15 Apr Web.
- Intel® Advanced Vector Extensions Programming Reference, 2009.
- AP-930 Streaming SIMD Extensions - Matrix Multiplication, Intel Co
- The animation of AES is borrowed from Enrique Zabala.

27 Question & answer
Q: Why is SIMD faster than a naïve implementation for vector operations?
A: Because in vector processing a single instruction is executed on multiple data elements at once, so fewer instructions are needed for the same work.

