1
**Parallel Computing in Matlab**

2
**PCT Parallel Computing Toolbox**

Offload work from one MATLAB session (the client) to other MATLAB sessions (the workers). Run as many as eight MATLAB workers (R2010b) on your local machine in addition to your MATLAB client session. Recommendation: run no more than one worker per core.

3
**MDCS MATLAB Distributed Computing Server**

Run as many MATLAB workers on a remote cluster of computers as your licensing allows. Run workers on your client machine if you want to run more than eight local workers (R2010b). Scheduler/job manager: dedicated to distributing tasks among the workers.

4
MDCS installing

5
**Typical Use Cases**

- Parallel for-loops: many iterations, or long iterations
- Batch jobs
- Large data sets

Batch jobs: When working interactively in a MATLAB session, you can offload work to a MATLAB worker session to run as a batch job. The command to perform this job is asynchronous, which means that your client MATLAB session is not blocked, and you can continue your own interactive session while the MATLAB worker is busy evaluating your code. The MATLAB worker can run either on the same machine as the client, or if using MATLAB Distributed Computing Server, on a remote cluster machine.
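The asynchronous behaviour described above can be sketched as follows. This is only an illustration: it assumes a default cluster profile is configured, and it uses the `fetchOutputs` and `delete` names from newer releases (the R2010b-era equivalents were `getAllOutputArguments` and `destroy`). The choice of `magic(4)` as the workload is arbitrary.

```matlab
% Submit a function call as a batch job; batch returns immediately.
job = batch(@magic, 1, {4});   % run magic(4) on a worker, expecting 1 output

% ... the client session is free to do other interactive work here ...

wait(job);                     % block only at the point the result is needed
out = fetchOutputs(job);       % cell array holding the job's outputs
M = out{1};
delete(job);                   % remove the finished job from the scheduler
```

The client only blocks at `wait`, which is what makes the batch command suitable for long-running work.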

6
**Parfor: Parallel for-loop**

parfor has the same basic concept as “for”.

The parfor body is executed on the MATLAB client and workers. The necessary data on which parfor operates is sent from the client to the workers, and the results are sent back to the client and pieced together. MATLAB workers evaluate iterations in no particular order, and independently of each other.

7
**Parfor**

Serial version:

    A = zeros(1024, 1);
    for i = 1:1024
        A(i) = sin(i*2*pi/1024);
    end
    plot(A)

Parallelization:

    A = zeros(1024, 1);
    matlabpool open local 4
    parfor i = 1:1024
        A(i) = sin(i*2*pi/1024);
    end
    matlabpool close
    plot(A)

8
Timing

    A = zeros(n, 1);
    tic
    for i = 1:n
        A(i) = sin(i);
    end
    toc

    A = zeros(n, 1);
    matlabpool open local 8
    tic
    parfor i = 1:n
        A(i) = sin(i);
    end
    toc

| n | for | parfor |
|---|---|---|
| 10000 | | |

9
**When to Use Parfor?**

Each loop iteration must be independent of the other iterations.

- Lots of iterations of simple calculations, or
- Long iterations.

A small number of simple calculations is a poor fit, because the communication overhead outweighs the gain.

10
**Classification of Variables**

- broadcast variable
- sliced input variable
- loop variable
- reduction variable
- sliced output variable
- temporary variable

Temporary variable: its data is destroyed when the parfor loop ends.

Loop variable: its value inside the loop is not carried back to the client; a client variable with the same name keeps its earlier value (0 in the example on the next slide).

Sliced variable: can be operated on in parallel.

Reduction variable: for a reduction such as z = z + i inside parfor i = 1:n, the value of z is never transmitted from client to workers or from worker to worker. Rather, additions of i are done in each worker, with i ranging over the subset of 1:n being performed on that worker. The results are then transmitted back to the client, which adds the workers' partial sums into z. Thus, workers do some of the additions, and the client does the rest.
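A minimal sketch of the reduction described above. The value `n = 100` is an arbitrary choice for illustration, and the loop also runs (serially) when no pool is open.

```matlab
n = 100;                 % illustrative size, not from the slides
z = 0;
parfor i = 1:n
    z = z + i;           % z is a reduction variable: partial sums are
                         % formed on the workers and combined on the client
end
expected = n * (n + 1) / 2;   % closed form for 1 + 2 + ... + n
fprintf('z = %d, expected = %d\n', z, expected);
```

Because addition of these integers is exact and order-independent, the result matches the serial sum regardless of how iterations are split among workers.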

11
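**More Notes**

With for:

    d = 0; i = 0;
    for i = 1:4
        b = i;
        d = i*2;
        A(i) = d;
    end

With parfor (starting from the same d = 0; i = 0):

    parfor i = 1:4
        b = i;
        d = i*2;
        A(i) = d;
    end

| Variable | After for | After parfor |
|---|---|---|
| A | [2,4,6,8] | [2,4,6,8] |
| d | 8 | 0 |
| i | 4 | 0 |
| b | 4 | / (does not exist) |

A(i): sliced output variable. d, b: temporary variables. i: loop variable.

A variable can be passed from the client to the workers for use, but the workers cannot change its value on the client; after the loop ends its value is unchanged. Temporary variables defined inside parfor disappear when the loop ends (for example, if d = 0 is not defined outside the parfor, the variable d does not exist afterward).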

12
**More Notes: How to parallelize?**

    C = 0;
    for i = 1:m
        for j = i:n
            C = C + i * j;
        end
    end

C: reduction variable
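One possible answer, sketched under assumptions: the outer loop becomes a parfor and C is accumulated as a reduction variable. The sizes `m` and `n` and the per-iteration accumulator `s` are introduced here for illustration and are not from the slides.

```matlab
m = 20; n = 30;          % hypothetical problem sizes

% Serial reference result
C_serial = 0;
for i = 1:m
    for j = i:n
        C_serial = C_serial + i * j;
    end
end

% Parallel version: parfor over the outer index, C as reduction variable
C = 0;
parfor i = 1:m
    s = 0;               % temporary variable, private to each iteration
    for j = i:n
        s = s + i * j;
    end
    C = C + s;           % the only update of C: a reduction
end
fprintf('serial = %d, parallel = %d\n', C_serial, C);
```

Accumulating into a temporary first keeps the reduction statement in one place, which makes the variable classification easy to see.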

13
**Parfor: Estimating an Integral**

14
**Parfor: Estimating an Integral**

    function q = quad_fun( m, n, x1, x2, y1, y2 )
    q = 0.0;
    u = (x2 - x1)/m;
    v = (y2 - y1)/n;
    for i = 1:m
        x = x1 + u * i;
        for j = 1:n
            y = y1 + v * j;
            fx = x^2 + y^2;
            q = q + u * v * fx;
        end
    end

15
**Parfor: Estimating an Integral**

Computational complexity: O(m*n). Each iteration is independent of the other iterations, apart from the accumulation into q, which parfor handles as a reduction variable. We can therefore replace “for” with “parfor” for either loop index i or loop index j.
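As a sanity check on what quad_fun computes (a sketch, assuming the call quad_fun(m,n,0,3,0,3) used for timing on these slides): the double loop is a right-endpoint Riemann sum, and for this integrand the limit can be worked out by hand.

```latex
q = \sum_{i=1}^{m}\sum_{j=1}^{n} u\,v\,\bigl(x_i^2 + y_j^2\bigr),
\qquad x_i = x_1 + i\,u,\quad y_j = y_1 + j\,v,
\qquad u = \frac{x_2 - x_1}{m},\quad v = \frac{y_2 - y_1}{n}

\int_0^3\!\!\int_0^3 \bigl(x^2 + y^2\bigr)\,dy\,dx = 27 + 27 = 54
```

Because the integrand increases with x and y on this domain, the right-endpoint sum slightly overestimates 54, and the gap shrinks as m and n grow.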

16
**Parfor: Estimating an Integral**

    function q = quad_fun( m, n, x1, x2, y1, y2 )
    q = 0.0;
    u = (x2 - x1)/m;
    v = (y2 - y1)/n;
    parfor i = 1:m
        x = x1 + u * i;
        for j = 1:n
            y = y1 + v * j;
            fx = x^2 + y^2;
            q = q + u * v * fx;
        end
    end

    tic
    A = quad_fun(m,n,0,3,0,3);
    toc

Elapsed time from tic/toc, in seconds:

| (m, n) | 1 + 0 | 1 + 1 | 1 + 2 | 1 + 3 | 1 + 4 |
|---|---|---|---|---|---|
| (100, 100) | 0.005 | 0.255 | 0.087 | 0.101 | 0.114 |
| (1000, 1000) | 0.035 | 0.066 | 0.046 | 0.045 | 0.053 |

(10000, 10000): 3.123, 1.626, 1.143, 0.883 (four of the five values survive; their column alignment is not recoverable)

(100000, ): 85.185

Why does (1000, 1000) take less time than (100, 100)? It doesn’t, really!

How can "1+1" take longer than "1+0"? (It does, but it's probably not as bad as it looks!)

Parallelism doesn't pay until your problem is big enough, and it doesn't pay until you have a decent number of workers.

17
**Parfor: Estimating an Integral**

    function q = quad_fun( m, n, x1, x2, y1, y2 )
    q = 0.0;
    u = (x2 - x1)/m;
    v = (y2 - y1)/n;
    for i = 1:m
        x = x1 + u * i;
        parfor j = 1:n
            y = y1 + v * j;
            fx = x^2 + y^2;
            q = q + u * v * fx;
        end
    end

    tic
    A = quad_fun(m,n,0,3,0,3);
    toc

Elapsed time from tic/toc, in seconds:

| (m, n) | 1 + 0 | 1 + 1 | 1 + 2 | 1 + 3 | 1 + 4 |
|---|---|---|---|---|---|
| (100, 100) | 0.005 | 1.754 | 1.975 | 2.126 | 2.612 |
| (1000, 1000) | 0.035 | 13.146 | 15.286 | 18.661 | 22.313 |
| (10000, 10000) | 3.123 | | | | |
| (100000, ) | | | | | |

18
**SPMD: Single Program Multiple Data**

The spmd command is like a very simplified version of MPI. The spmd statement lets you define a block of code to run simultaneously on multiple labs; each lab can have different, unique data for that code. Labs can communicate directly via messages, and they meet at synchronization points. The client program can examine or modify data on any lab.

19
SPMD Statement

20
SPMD Statement

21
SPMD

MATLAB sets up the requested number of labs, each with a copy of the program. Each lab “knows” it's a lab, and has access to two special functions: numlabs(), the number of labs; and labindex(), a unique identifier between 1 and numlabs().
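A small sketch of how `labindex` and `numlabs` are typically used inside an spmd block. It assumes a pool is available; the variable `x` and the factor 10 are illustrative only. Newer releases offer `spmdIndex` and `spmdSize` as replacements for these two functions.

```matlab
spmd
    % Every lab runs this block, each with its own value of labindex.
    fprintf('Lab %d of %d\n', labindex, numlabs);
    x = 10 * labindex;        % different data on each lab
end

% Back on the client, x is a Composite: index it by lab number.
x1 = x{1};                    % the value of x held by lab 1
```

This is the "client can examine data on any lab" point from the SPMD overview: the Composite object is the client's handle to the per-lab values.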

22
SPMD

23
**Distributed Arrays: distributed()**

You can create a distributed array in the MATLAB client, and its data is stored on the labs of the open MATLAB pool. A distributed array is distributed in one dimension, along the last nonsingleton dimension, and as evenly as possible along that dimension among the labs. You cannot control the details of distribution when creating a distributed array.

The distributed() function takes a matrix defined on the client and spreads it across the labs. The split runs along a single dimension (by columns for a 2-D matrix) and is made as even as possible; as with parfor, you cannot control the details. Logically, W is still one complete matrix, but it is actually stored in blocks on different labs.
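A minimal sketch of creating and using a distributed array from the client, assuming a pool is already open. The 8-by-8 magic square is an arbitrary stand-in for a large matrix.

```matlab
W = distributed(magic(8));    % data now lives on the labs, split by columns
colSums = sum(W, 1);          % computed on the labs; result is distributed
result = gather(colSums);     % bring the full result back to the client
disp(result);
```

From the client's point of view `W` behaves like an ordinary matrix, which is why functions such as `sum` can be applied to it directly.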

24
**Distributed Arrays: codistributed()**

You can create a codistributed array by executing on the labs themselves, either inside an spmd statement, in pmode, or inside a parallel job. When creating a codistributed array, you can control all aspects of distribution, including dimensions and partitions.

The codistributed() function takes a matrix variable that is stored identically on every lab and distributes it across the labs, which saves memory.

25
**Distributed Arrays: codistributed()**

26
Example: Trapezoid

27
Example: Trapezoid

To simplify things, we assume the interval is [0, 1], and we let each lab define a and b to mean the ends of its subinterval. If we have 4 labs, then lab number 3 will be assigned [½, ¾].
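The subinterval assignment described above might be coded along these lines. This is a sketch, not the slides' own code: the integrand `f`, the point count `n`, and the use of `gplus` to combine partial results are assumptions made for illustration. The integrand is chosen so that the exact integral over [0, 1] is pi.

```matlab
f = @(x) 4 ./ (1 + x.^2);     % hypothetical integrand; integral on [0,1] is pi
n = 1000;                     % points per lab (assumed)

spmd
    a = (labindex - 1) / numlabs;    % left end of this lab's subinterval
    b = labindex / numlabs;          % right end of this lab's subinterval
    x = linspace(a, b, n);
    fx = f(x);
    h = (b - a) / (n - 1);
    % Trapezoid rule on [a, b]: half weight on the two end points
    part = h * (0.5 * fx(1) + sum(fx(2:end-1)) + 0.5 * fx(end));
    total = gplus(part);             % sum the partial integrals over all labs
end

approx = total{1};            % every lab holds the same total; read lab 1's copy
fprintf('approx = %.8f\n', approx);
```

With 4 labs, lab 3 gets a = 2/4 and b = 3/4, matching the [½, ¾] assignment in the text.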

28
Example: Trapezoid

29
**Pmode**

pmode lets you work interactively with a parallel job running simultaneously on several labs. Commands you type at the pmode prompt in the Parallel Command Window are executed on all labs at the same time. Each lab executes the commands in its own workspace on its own variables.

In pmode each lab has its own window: you can enter commands, see the result from each lab, and work in each lab's workspace. After an spmd block ends, its data and state still exist and can be used again by a later spmd block; when pmode exits, the job is destroyed and its data is gone, so starting pmode again starts from scratch.

The labs remain synchronized because each lab becomes idle when it completes a command or statement, waiting until all the labs working on this job have completed the same statement. Only when all the labs are idle do they proceed together to the next pmode command.

| pmode | spmd |
|---|---|
| Parallel computing synchronously | Parallel computing synchronously |
| Each lab has a desktop | No desktop for labs |
| Can’t freely interleave serial and parallel work | Can freely interleave serial and parallel work |

30
Pmode

31
**Pmode**

labindex() and numlabs() still work.

Variables on different labs share only a name; they are independent of each other.

32
**Pmode: Aggregate the array segments into a coherent array**

    codist = codistributor1d(2, [], [3 8])
    whole = codistributed.build(segment, codist)

codistributor1d: 1-D distribution scheme for a codistributed array. codistributed.build is the constructor.

33
**Pmode: Aggregate the array segments into a coherent array**

    whole = whole
    section = getLocalPart(whole)

getLocalPart returns the piece of the large matrix that is stored on each lab.

34
**Pmode: Aggregate the array segments into a coherent array**

    combined = gather(whole)

gather() brings together the pieces of the distributed array held on the labs and returns the complete array to the client.

35
**Pmode: How to change distribution?**

    distobj = codistributor1d()
    I = eye(6, distobj)
    getLocalPart(I)
    distobj = codistributor1d(1);
    I = redistribute(I, distobj)
    getLocalPart(I)

36
**GPU Computing**

Capabilities

- Transferring data between the MATLAB workspace and the GPU
- Evaluating built-in functions on the GPU
- Running MATLAB code on the GPU
- Creating kernels from PTX files for execution on the GPU
- Choosing one of multiple GPU cards to use

Requirements

- NVIDIA CUDA-enabled device with compute capability of 1.3 or greater
- NVIDIA CUDA device driver 3.1 or greater
- NVIDIA CUDA Toolkit 3.1 (recommended) for compiling PTX files

37
**GPU Computing: Transferring data between workspace and GPU**

Creating GPU data:

    N = 6;
    M = magic(N);
    G = gpuArray(M);
    M2 = gather(G);

38
**GPU Computing: Executing code on the GPU**

You can transfer or create data on the GPU, and use the resulting GPUArray as input to the enhanced built-in functions that support it.

You can run your own MATLAB function file on a GPU:

    result = arrayfun(@myFunction, arg1, arg2);

If either arg1 or arg2 is a GPUArray, the function executes on the GPU and returns a GPUArray. If none of the input arguments is a GPUArray, arrayfun executes on the CPU. Only element-wise operations are supported.

arrayfun applies a function to each element of an array; it is a general MATLAB function, not one specific to the GPU.

39
**Review**

- What are the typical use cases of parallel MATLAB?
- When to use parfor?
- What’s the difference between a worker (parfor) and a lab (spmd)?
- What’s the difference between spmd and pmode?
- How to build a distributed array?
- How to use the GPU for MATLAB parallel computing?
