Fast Number Crunching Fast Time to Market with Scala

1 Fast Number Crunching Fast Time to Market with Scala
2018/6/1
By Richard Gomes

2 About me
- Richard Gomes
- Brazilian living in the UK since 2006
- Passion for finance
- Special interest in High Performance Computing (HPC)
- I like photography, go-karting and table tennis
- T: frgomes

3 Objectives
High Performance Computing (HPC) with Scala
- Putting in context
- Thinking parallel
- How it works in C (one slide!)
- How it works in Scala
- Pros and cons

4 Putting in Context
What it is about
- parallelism and parallel architectures
- hundreds or thousands of processing elements (PEs)
- general-purpose GPUs
- how to use GPUs with Scala
What it is NOT about
- multithreading
- multi-core CPUs

5 Putting in Context
Applicability of High Performance Computing (HPC)
- Geology: oil and gas prospecting
- Meteorology: weather simulation
- Physics: fluid dynamics, high-energy physics, ...
- Biology: protein structure, genome sequencing
- Media: computer graphics
- Finance: price forecasting

6 Putting in Context
Scala gaining momentum
- Language maturity
- Tooling maturity
- Performance improvements
- Parallel collections
- Recent tooling support for HPC
- 260+ positions on itjobswatch.co.uk in the last 12 months

7 Thinking Parallel
Standard deviation

    // calculate mean
    float sum = 0;
    for (int i = 0; i < n; i++) sum += cells[i];
    float mean = sum / n;

    // calculate stddev
    sum = 0;  // reuse the accumulator instead of declaring it twice
    for (int i = 0; i < n; i++) sum += (cells[i] - mean) * (cells[i] - mean);
    float stddev = (float) Math.sqrt(sum / n);
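
In formula form, the two loops above compute the mean and the population standard deviation (the code divides by n, not n - 1). Writing x_i for cells[i], a notation introduced here and not taken from the slides:

```latex
\mu = \frac{1}{n}\sum_{i=0}^{n-1} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=0}^{n-1} \left(x_i - \mu\right)^2}
```

Both sums are reductions over independent terms, which is what makes them candidates for parallel execution.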

8 Thinking Parallel
- Identify sequential code → big logical blocks
- Identify loops → candidates for execution in parallel
- Turn sequential code into parallel code
- Implement using parallel primitives
- Benchmarks
- Process: Design → Develop → Test → Tune

9 Thinking Parallel
Identify sequential code
- Calculation of mean
- Calculation of stddev
Identify loops
- One loop when mean is calculated
- One loop when stddev is calculated
Turn sequential code into parallel code
- How could the loops be performed in parallel?

10 Thinking Parallel
Let's suppose we have psum(), a parallel version of summation.

It was:

    // calculate mean
    int n = cells.length;
    float mean = psum(cells) / n;
    // calculate stddev
    float sum = 0;
    for (int i = 0; i < n; i++) sum += (cells[i] - mean) * (cells[i] - mean);
    float stddev = (float) Math.sqrt(sum / n);

It now looks like:

    // calculate mean
    int n = cells.length;
    float mean = psum(cells) / cells.length;
    // calculate stddev
    // each cell is overwritten with its squared deviation; iterations are independent
    for (int i = 0; i < n; i++) cells[i] = (cells[i] - mean) * (cells[i] - mean);
    float sum = psum(cells);
    float stddev = (float) Math.sqrt(sum / n);
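
The slide leaves psum() undefined. As a rough illustration of the divide-and-reduce idea behind it, here is a minimal Scala sketch that sums chunks of the array concurrently and then adds the partial results. The object name ParallelSum and the chunks parameter are assumptions made for this example, and it runs on CPU threads through standard-library Futures, not on a GPU as the talk intends.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelSum {
  // Hypothetical helper, not from the talk: split the array into roughly
  // `chunks` slices, sum each slice in its own Future, then add the
  // partial sums together.
  def psum(cells: Array[Float], chunks: Int = 4): Float = {
    require(chunks > 0, "chunks must be positive")
    if (cells.isEmpty) 0f
    else {
      // Ceiling division, so at most `chunks` slices are produced.
      val size = math.max(1, (cells.length + chunks - 1) / chunks)
      val partials = cells.grouped(size).map(chunk => Future(chunk.sum)).toList
      Await.result(Future.sequence(partials), Duration.Inf).sum
    }
  }
}
```

Because floating-point addition is not associative, a chunked sum can differ from a sequential sum in the last digits for large inputs.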

11 Thinking Parallel in Scala

    // parallel sum (cells.sum is sequential; it stands in for a parallel version)
    def psum(cells: Array[Float]): Float = cells.sum

    def mean(cells: Array[Float]): Float = psum(cells) / cells.length

    // squared deviation of one cell from the mean
    def f(cell: Float, mean: Float): Float = {
      val x = cell - mean
      x * x
    }

    def stddev(cells: Array[Float]): Float = {
      val m = mean(cells)
      math.sqrt(psum(cells.map(x => f(x, m))) / cells.length).toFloat
    }
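
As a small worked check of the map-then-sum formulation, the following self-contained sketch runs it on eight values whose mean is 5 and whose population standard deviation is 2. The object name StdDevExample and the sample data are chosen for this example and do not come from the talk.

```scala
object StdDevExample {
  def mean(cells: Array[Float]): Float = cells.sum / cells.length

  // Step 1 (independent per element): squared deviation from the mean.
  // Step 2 (reduction): sum the squared deviations and divide by n.
  def stddev(cells: Array[Float]): Float = {
    val m = mean(cells)
    val squares = cells.map(x => (x - m) * (x - m))
    math.sqrt(squares.sum / cells.length).toFloat
  }

  def main(args: Array[String]): Unit = {
    val cells = Array(2f, 4f, 4f, 4f, 5f, 5f, 7f, 9f)
    // Squared deviations are 9, 1, 1, 1, 0, 0, 4, 16; their sum is 32.
    println(s"mean = ${mean(cells)}, stddev = ${stddev(cells)}")
    // prints "mean = 5.0, stddev = 2.0"
  }
}
```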

12 How it works in C/C++?
Function f must be
- implemented as a kernel function
- compiled by a special-purpose compiler
- uploaded into the GPU
Data must be
- moved from the CPU into the GPU
- moved from the GPU into the CPU
Code must be aware of GPU specs
More info

13 How it works in Scala?
Introducing ScalaCL
ScalaCL is a compiler plugin
- provides bytecode optimizations
- generates and compiles the kernel code for you
- handles kernel code uploading
- handles data transfers between the CPU and GPUs
ScalaCL is a GPU-aware library
- introduces CLArray
- introduces the CLCollection hierarchy

14 How it works in Scala?
ScalaCL benefits
- 100% Scala code
- Hides GPU tooling details
- Hides implementation details
- Implements sequential and parallel collection interfaces
- Works well in Eclipse and IntelliJ

15 How it works in Scala?

    package org.squantlib.math.statistics

    import scala.math._
    import scalacl._

    class Stats {

      private implicit val context = Context.best

      // run on GPU
      def $mean(v : CLArray[Float]) : Float = v.sum / v.length

      def $variance(v : CLArray[Float], m : Float) : Float = {
        v.par.map(x => { (x - m) * (x - m) } ).sum / v.length
      }

      def $stddev(v : CLArray[Float], m : Float) : Float = {
        sqrt( $variance(v, m) ).asInstanceOf[Float]
      }

16 How it works in Scala?

      // interface with regular Array type
      def mean(v : Array[Float]) : Float = { $mean(v.cl) }

      def variance(v : Array[Float], m : Float) : Float = { $variance(v.cl, m) }

      def stddev(v : Array[Float], m : Float) : Float = { $stddev(v.cl, m) }

    }

17 How it works in Scala?
Benchmarks
- Depend on CPU and GPU capabilities
- Depend on the algorithm
- Depend on implementation techniques
My benchmarks
- Easily: 10 times faster
- With refinements: around 100 to 300 times faster
- Maximum: ~500 times faster

18 How it works in Scala
Process: Design → Develop → Test → Tune
- Stress testing: high volumes, 100+ repetitions
- Build benchmarks
- Back to the design step
  - Try parallel collections
  - Try sequential collections
  - Try alternative approaches and algorithms
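
The "100+ repetitions" and "build benchmarks" steps could be supported by a small timing helper such as the sketch below. The names Bench and averageMillis, the warm-up count and the sample workload are all assumptions made for illustration. It measures wall-clock time only, and JIT compilation can still distort the numbers, so treat the results as rough guidance.

```scala
object Bench {
  // Hypothetical helper, not from the talk: run `body` `warmup` times
  // untimed, then `reps` times timed, and return the average wall-clock
  // time per timed run in milliseconds.
  def averageMillis(reps: Int, warmup: Int = 10)(body: => Unit): Double = {
    require(reps > 0, "reps must be positive")
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < reps) { body; i += 1 }
    (System.nanoTime() - start) / 1e6 / reps
  }

  def main(args: Array[String]): Unit = {
    // Sample workload: square one million floats and sum them.
    val cells = Array.tabulate(1000000)(i => (i % 100).toFloat)
    val ms = averageMillis(reps = 100) { cells.map(x => x * x).sum }
    println(f"map + sum: $ms%.3f ms per run")
  }
}
```

Timing the sequential, parallel-collection and alternative implementations with the same helper and the same input gives comparable figures to bring back to the design step.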

19 Pros and cons of ScalaCL
Pros
- 100% Scala: no low-level C or low-level tooling
- Scala-specific bytecode optimizations
- Excellent performance improvements
- Multiple approaches
… in a fraction of the time of C/C++
Cons
- Still incipient: may contain bugs
- Missing features
- Small community

20 Thanks

