1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5.

1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5

2 Partitioning 划分就是简单地将问题分成几部分，然后计算每部分，最后组合各部分得结果。

3 Divide and Conquer 分治 Characterized by dividing problem into sub-problems of same form as larger problem. Further divisions into still smaller sub-problems, usually done by recursion. 通过递归，将问题分成与该问题形式一样的子问题，然后继续将子问题继续分成更小规模的子问题。递归的分治可以并行，因为每个进程可以对子问题进行处理。

4 Partitioning/Divide and Conquer Examples Many possibilities. Operations on sequences of number such as simply adding them together Several sorting algorithms can often be partitioned or constructed in a recursive fashion Numerical integration N-body problem

5 Partitioning a sequence of numbers into parts and adding the parts

6 方法 Send+recv (for loop send, for loop recv) Bcast+recv (master: bcast all to slaves + for loop recv, slaves ： receive all from master+ send result to master) Scatter+reduce

7 Tree construction 二叉树

8 Dividing a list into parts send recv

9 Partial summation

10 进程 P 0 /*division phase*/ Divide(s1,s1,s2);/*divide s1 into two, s1 and s2*/ Send(s2,P4); Divide(s1,s1,s2); Send(s2,P2); Divide(s1,s1,s2); Send(s2,P1); /*combining phase*/ part_sum=*s1; Recv(&part_sum1,P1); part_sum=part_sum+part_sum1; Recv(&part_sum1,P2); part_sum=part_sum+part_sum1; Recv(&part_sum1,P4); part_sum=part_sum+part_sum1;

11 进程 P 4 /*division phase*/ Recv(s1,p0); Divide(s1,s1,s2); Send(s2,P6); Divide(s1,s1,s2); Send(s2,P5); /*combining phase*/ part_sum=*s1; Recv(&part_sum1,P5); part_sum=part_sum+part_sum1; Recv(&part_sum1,P6); part_sum=part_sum+part_sum1; Send(&part_sum1,P0);

12 M 路分治

13 Many Sorting algorithms can be parallelized by partitioning and by divide and conquer. Example Bucket sort （桶排序）

14 Bucket sort （桶排序） One “bucket” assigned to hold numbers that fall within each region. Numbers in each bucket sorted using a sequential sorting algorithm. Sequential sorting time complexity: O(nlog(n/m). Works well if the original numbers uniformly distributed across a known interval, say 0 to a - 1.

15 Parallel version of bucket sort Simple approach Assign one processor for each bucket.

16 big_array = malloc(n*sizeof(long int)); Make_numbers(big_array, n, n/p, p); n_bar = n/p; local_array = malloc(n_bar*sizeof(long int)); MPI_Scatter(big_array, n_bar, MPI_LONG, local_array, n_bar, MPI_LONG, 0, MPI_COMM_WORLD); Sequential_sort(local_array, n_bar); MPI_Gather(local_array, n_bar, MPI_LONG, big_array, n_bar, MPI_LONG, 0, MPI_COMM_WORLD);

19 Further Parallelization Partition sequence into m regions. Each processor assigned one region. (Hence number of processors, p, equals m.) Each processor maintains one “big” bucket for its region. Each processor maintains m “small” buckets, one for each region. Each processor separates numbers in its region into its own small buckets. All small buckets emptied into p big buckets Each big bucket sorted by its processor

20 Another parallel version of bucket sort

21 An MPI function that can be used to advantage here. Sends one data element from each process to every other process. Corresponds to multiple scatter operations, but implemented together. Should be called all-to-all scatter? all-to-all broadcast

22 From: http://www.mhpcc.edu/training/workshop/parallel_intro/MAIN.html

23 4.13 Applying “all-to-all” broadcast to emptying small buckets into big buckets Suppose 4 regions and 4 processors Small bucket Big bucket

24 “all-to-all” routine actually transfers rows of an array to columns: Transposes a matrix.

25 Numerical Integration Computing the “area under the curve.” Parallelized by divided area into smaller areas (partitioning). Can also apply divide and conquer -- repeatedly divide the areas recursively.

26 Numerical integration using rectangles Each region calculated using an approximation given by rectangles: Aligning the rectangles:

27 Numerical integration using trapezoidal method May not be better!

28 Applying Divide and Conquer

29 Adaptive Quadrature Solution adapts to shape of curve. Use three areas, A, B, and C. Computation terminated when largest of A and B sufficiently close to sum of remain two areas.

30 Adaptive quadrature with false termination. Some care might be needed in choosing when to terminate. Might cause us to terminate early, as two large regions are the same (i.e., C = 0).

31 Barnes-Hut Algorithm Start with whole space in which one cube contains the bodies (or particles). First, this cube is divided into eight subcubes. If a subcube contains no particles, subcube deleted from further consideration. If a subcube contains one body, subcube retained. If a subcube contains more than one body, it is recursively divided until every subcube contains one body.

32 Creates an octtree - a tree with up to eight edges from each node. The leaves represent cells each containing one body. After the tree has been constructed, the total mass and center of mass of the subcube is stored at each node.

33 Constructing tree requires a time of O(nlogn), and so does computing all the forces, so that overall time complexity of method is O(nlogn).

34 Recursive division of 2-dimensional space

35 (For 2-dimensional area) First, a vertical line found that divides area into two areas each with equal number of bodies. For each area, a horizontal line found that divides it into two areas each with equal number of bodies. Repeated as required. Orthogonal Recursive Bisection

1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5.

Similar presentations

Presentation on theme: "1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5.

Similar presentations

Presentation on theme: "1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5."— Presentation transcript:

Similar presentations

About project

Feedback