Download presentation

Presentation is loading. Please wait.

1
**Raspberry Pi Performance Benchmarking**

Justin Moore Salish Kootenai College

2
**Overview Raspberry Pi Cluster Build Performance**

Performance Benchmark Tools Tuning Analysis Conclusion

3
**Raspberry Pi Cluster – Model B**

CPU – Broadcom ARM11 76JZF-S 700MHz Can overclock to 1000MHz RAM – 512MB 448MB/64MB – CPU/GPU Linux OS 10/100 BaseT Ethernet port Size of a credit card Price - $35 Image:

4
**Raspberry Pi Cluster - Setup**

Router – Western Digital N750 NFS mounted external hard drive – 500GB Buffalo Inc. 4 Pi cluster USB hub – Manhattan 7 port USB 2.0 SD card – Kodak 8GB ~$200 Transition to next slide with “After the cluster was set up, we wanted to measure performance”

5
**Performance – Floating Point Operations**

How we measure performance Busy CPU? Speed? What is a floating point operation (FLOP)? Arithmetic operation Formats Single Double Performance measurement by FLOPS How do we measure FLOPS? General Matrix Multiplication (GEMM) High computational intensity with an increase in matrix size

6
**Performance Benchmarking**

Single Raspberry Pi BLAS - Basic Linear Algebra Subprograms ATLAS - Automatically Tuned Linear Algebra Software Auto tunes BLAS for any system Raspberry Pi Cluster MPI - Message Passing Interface Standard API for inter-process communication Facilitates parallel programming MPICH p1 HPL - High Performance LINPACK Tuned MPI Combined with ATLAS Wrote Custom code ATLAS Added parallel capability Compared with HPL

7
**MyGEMM – Naïve Implementation**

= * C (i,j) A(i,k) B (k,j) Where N = Matrix Size For i = 1 to N For j = 1 to N For k = 1 to N C(i,j)=C(i,j)+A(i,k)*B(k,j) striding bad! ----- Meeting Notes (7/30/14 13:38) ----- Transistion to benchmarking slide

8
**MyGEMM – Naïve Pitfalls**

GEMM – Naïve method inefficient Two whole matrices are loaded in to memory Cache is not used efficiently Strides through the matrix If not Naïve, then what? Naïve

9
**MyGEMM – Software Tuning**

What is block matrix multiplication Matrix is split into smaller matrices/blocks Shrinks matrix size to allow both A & B into fast memory A11 A12 A13 A14 A21 A22 A23 A24 Reduces striding by fully utilizing memory hierarchy Transistion After we tuned for a single node we scaled it to our cluster a11 a12 A11 A31 A32 A33 A34 a21 a22 A41 A42 A43 A44

10
**MyGEMM – Cluster Software Tuning**

How do we distribute the matrix multiplication? MPI used to distribute blocks to nodes MyGEMM allows for experimentation on block size A11 A12 A13 A14 Node 1 A21 A22 A23 A24 Node 2 A31 A32 A33 A34 Start with further division of matrix Why to divide matrix? How do I reduce matrix size? Divided by number of nodes in the cluster allows matrix multiplication on higher sizes What is cache blocking? reduce the matrix to the exact size of my cache Why blocks? Row/Column wise multiplication Loop unrolling ----- Meeting Notes (7/28/14 13:54) ----- reduces matrix to fit the cache block size is reduced to fit in cache ----- Meeting Notes (7/30/14 08:29) ----- remove bullet 1 add mygemm parallel talk how to distribute the matrix A41 Node 3 A42 A43 A44 Node 4

11
Hardware Tuning Raspberry Pi allows CPU overclocking and memory sharing between CPU and GPU Memory Memory is shared between CPU and GPU 512MB total onboard memory Up to 496MB can be used for CPU More memory = larger matrices CPU Clock Up to 1000 MHz from 700MHz

18
**Conclusion Performance is dependent on key factors Top 500 – 1993**

Matrix Size Block Size RAM size CPU speed Top 500 – 1993 T#292 Tied with General Motors ----- Meeting Notes (7/30/14 16:07) ----- move after yellowstone

19
**Conclusion – Pi Cluster vs. Yellowstone**

ARM11, 32 bit, Double Precision Yellowstone Sandy Bridge Xeon, 64 bit, Double Precision Power 15.6 Watts 1.4 MWatts Performance 836 MFLOPS 1.2 PFLOPS Price $250.00 $22,500,000.00 MFLOPS/W 54.8 875.3 MFLOPS/$ 3.4 57.2

20
**Thank You Dr. Richard Loft Raghu Raj Prasanna Kumar Amogh Simha**

Stephanie Barr

21
Questions?

Similar presentations

OK

1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.

1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Animated ppt on magnetism video Ppt on swami vivekananda teachings Ppt on demographic structure of neighbourhood Ppt on dc motor working Ppt on acid-base indicators in everyday life Ppt on intellectual property rights and global business Ppt on the coordinate plane Download ppt on civil disobedience movement martin Ppt on power diode ppt Ppt on 1984 riots in india