Prepared 8/19/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Slides:



Advertisements
Similar presentations
Numbers Treasure Hunt Following each question, click on the answer. If correct, the next page will load with a graphic first – these can be used to check.
Advertisements

Mathematical Preliminaries
1
1 Vorlesung Informatik 2 Algorithmen und Datenstrukturen (Parallel Algorithms) Robin Pomplun.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
Author: Julia Richards and R. Scott Hawley
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
Dynamic Programming Introduction Prof. Muhammad Saeed.
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×
Custom Statutory Programs Chapter 3. Customary Statutory Programs and Titles 3-2 Objectives Add Local Statutory Programs Create Customer Application For.
Polygon Scan Conversion – 11b
Break Time Remaining 10:00.
The basics for simulations
Discrete Math Recurrence Relations 1.
Table 12.1: Cash Flows to a Cash and Carry Trading Strategy.
PP Test Review Sections 6-1 to 6-6
Chapter 17 Linked Lists.
MAT 205 F08 Chapter 12 Complex Numbers.
LIAL HORNSBY SCHNEIDER
Bellwork Do the following problem on a ½ sheet of paper and turn in.
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
1 RA III - Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Buenos Aires, Argentina, 25 – 27 October 2006 Status of observing programmes in RA.
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
Motion in Two and Three Dimensions
1 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt Synthetic.
When you see… Find the zeros You think….
Subtraction: Adding UP
Factors Terminology: 3  4 =12
1 hi at no doifpi me be go we of at be do go hi if me no of pi we Inorder Traversal Inorder traversal. n Visit the left subtree. n Visit the node. n Visit.
Analyzing Genes and Genomes
Copyright © 2014 John Wiley & Sons, Inc. All rights reserved.
Essential Cell Biology
Chapter 8 Estimation Understandable Statistics Ninth Edition
Exponents and Radicals
Clock will move after 1 minute
PSSA Preparation.
Essential Cell Biology
Immunobiology: The Immune System in Health & Disease Sixth Edition
Topic 16 Sorting Using ADTs to Implement Sorting Algorithms.
Energy Generation in Mitochondria and Chlorplasts
Select a time to count down from the clock above
Murach’s OS/390 and z/OS JCLChapter 16, Slide 1 © 2002, Mike Murach & Associates, Inc.
Copyright Tim Morris/St Stephen's School
9. Two Functions of Two Random Variables
User Defined Functions Lesson 1 CS1313 Fall User Defined Functions 1 Outline 1.User Defined Functions 1 Outline 2.Standard Library Not Enough #1.
1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.
Copyright © Cengage Learning. All rights reserved.
CSCI-455/552 Introduction to High Performance Computing Lecture 11.
Partitioning and Divide-and-Conquer Strategies ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, Jan 23, 2013.
Parallel Strategies Partitioning consists of the following steps –Divide the problem into parts –Compute each part separately –Merge the results Divide.
Partitioning and Divide-and-Conquer Strategies ITCS 4/5145 Cluster Computing, UNC-Charlotte, B. Wilkinson, 2007.
Partitioning and Divide-and-Conquer Strategies ITCS 4/5145 Cluster Computing, UNC-Charlotte, B. Wilkinson, 2007.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CSCI-455/552 Introduction to High Performance Computing Lecture 11.5.
1 Chapter4 Partitioning and Divide-and-Conquer Strategies 划分和分治的并行技术 Lecture 5.
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! 1 ITCS 4/5145 Parallel Computing,
Divide-and-Conquer Pattern ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, June 25, 2012 slides4.ppt. 4.1.
Partitioning and Divide-and-Conquer Strategies
and Divide-and-Conquer Strategies
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
Presentation transcript:

Prepared 8/19/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Partitioning: simply divides the problem into parts Divide-and-Conquer: Characterized by dividing the problem into sub-problems of same form as larger problem. Further divisions into still smaller sub-problems, usually done by recursion. Recursive divide-and-conquer amenable to parallelization because separate processes can be used for divided parts. Also usually data is naturally localized. Partitioning and Divide-and-Conquer Strategies – Slide 2

Data partitioning/domain decomposition Independent tasks apply same operation to different elements of a data set Okay to perform operations concurrently Functional decomposition Independent tasks apply different operations to different data elements Statements on each line can be performed concurrently Partitioning and Divide-and-Conquer Strategies – Slide 3 for (i=0; i<99; i++) a[i]=b[i]+c[i]; a = 2; b = 3; m = (a+b)/2; s = (a*a+b*b)/2; v = s*m*m;

Data mining: looking for meaningful patterns in large data sets Data clustering: organizing a data set into clusters of “similar” items Data clustering can speed retrieval of related items Partitioning and Divide-and-Conquer Strategies – Slide 4

1. Compute document vectors 2. Choose initial cluster centers 3. Repeat a. Compute performance function b. Adjust centers until function value converges or the maximum number of iterations have elapsed 1. Output cluster centers Partitioning and Divide-and-Conquer Strategies – Slide 5

Operations being applied to a data set Examples Generating document vectors Finding closest center to each vector Picking initial values of cluster centers Partitioning and Divide-and-Conquer Strategies – Slide 6

Partitioning and Divide-and-Conquer Strategies – Slide 7 Build document vectors Compute function value Choose cluster centers Adjust cluster centersOutput cluster centers Do in parallel

Many possibilities: Operations on sequences of numbers such as simply adding them together. Several sorting algorithms can often be partitioned or constructed in a recursive fashion. Numerical integration N-body problem Partitioning and Divide-and-Conquer Strategies – Slide 8

Partition sequence into parts and add them. Partitioning and Divide-and-Conquer Strategies – Slide 9

Partitioning and Divide-and-Conquer Strategies – Slide 10 __global__ void add (int *numbers, int *part_sum) { int partialSum = 0, tid = threadIdx.x, s = n / blockDim.x; for (int i = tid * s; i < (tid + 1) * s; i++) partialSum += numbers[i]; part_sum[tid] = partialSum; __syncthreads(); } int main(void) { int numbers[n], part_sum[m], *dev_numbers, *dev_part_sum; cudaMalloc((void**)&dev_numbers, n * sizeof(int)); cudaMalloc((void**)&dev_part_sum, m * sizeof(int)); cudaMemcpy(dev_numbers, numbers, n * sizeof(int), cudaMemcpyHostToDevice); add >>(dev_numbers, dev_part_sum); // 1 block, m threads cudaMemcpy(part_sum, dev_part_sum, m * sizeof(int), cudaMemcpyDeviceToHost); int sum = 0; for (int i = 0; i < m; i++) sum += part_sum[i]; cudaFree(dev_numbers); cudaFree(dev_part_sum); free(part_sum); }

One “bucket” assigned to hold numbers that fall within each region. Numbers in each bucket sorted using a sequential sorting algorithm. Partitioning and Divide-and-Conquer Strategies – Slide 11

Sequential sorting time complexity: O(n log n / m ) for n numbers divided into m parts. Works well if the original numbers uniformly distributed across a known interval, say 0 to a-1. Simple approach to parallelization: assign one processor for each bucket. Partitioning and Divide-and-Conquer Strategies – Slide 12

Finding positions and movements of bodies in space subject to gravitational forces from other bodies using Newtonian laws of physics. Partitioning and Divide-and-Conquer Strategies – Slide 13

Gravitational force F between two bodies of masses m a and m b is G is the gravitational constant and r the distance between the bodies. Partitioning and Divide-and-Conquer Strategies – Slide 14

Subject to forces, body accelerates according to Newton’s second law: F = ma where m is mass of the body, F is force it experiences and a is the resultant acceleration. Let the time interval be  t. Let v t be the velocity at time t. For a body of mass m the force is Partitioning and Divide-and-Conquer Strategies – Slide 15

New velocity then is Over time interval  t position changes by where x t is its position at time t. Once bodies move to new positions, forces change and computation has to be repeated. Partitioning and Divide-and-Conquer Strategies – Slide 16

Overall gravitational N-body computation can be described as Partitioning and Divide-and-Conquer Strategies – Slide 17 for (t = 0; t < tmax; t++) {/*  time periods */ for (i = 0; i < N; i++) {/* for each body */ F = Force_routine(i);/* force on body i */ v[i] new = v[i] + F * dt / m;/* new velocity */ x[i] new = x[i] + v[i] new * dt;/* new position */ } for (i = 0; i < N; i++) {/* for each body */ x[i] = x[i] new ;/* update velocity */ v[i] = v[i] new ;/* and position */ }

The sequential algorithm is an O(N²) algorithm (for one iteration) as each of the N bodies is influenced by each of the other N – 1 bodies. Not feasible to use this direct algorithm for most interesting N-body problems where N is very large. Time complexity can be reduced using observation that a cluster of distant bodies can be approximated as a single distant body of the total mass of the cluster sited at the center of mass of the cluster. Partitioning and Divide-and-Conquer Strategies – Slide 18

Start with whole space in which one cube contains the bodies (or particles). First this cube is divided into eight subcubes. If a subcube contains no particles, the subcube is deleted from further consideration. If a subcube contains one body, subcube is retained. If a subcube contains more than one body, it is recursively divided until every subcube contains one body. Partitioning and Divide-and-Conquer Strategies – Slide 19

Creates an octtree – a tree with up to eight edges from each node. The leaves represent cells each containing one body. After the tree has been constructed, the total mass and center of mass of the subcube is stored at each node. Force on each body obtained by traversing tree starting at root, stopping at a node when the clustering approximation can be used, e.g. when r  d/  where  is a constant typically 1.0 or less. Constructing tree requires a time of O(n log n), and so does computing all the forces, so that the overall time complexity of the method is O(n log n). Partitioning and Divide-and-Conquer Strategies – Slide 20

Partitioning and Divide-and-Conquer Strategies – Slide 21

(For 2-dimensional area) First a vertical line is found that divides area into two areas each with an equal number of bodies. For each area a horizontal line is found that divides it into two areas each with an equal number of bodies. Repeated as required. Partitioning and Divide-and-Conquer Strategies – Slide 22

Assume one task per particle Task has particle’s position, velocity vector Iteration Get positions of all other particles Compute new position, velocity Partitioning and Divide-and-Conquer Strategies – Slide 23

Suppose we have a function ƒ which is continuous on [ ,b] and differentiable on ( ,b). We wish to approximate  ƒ(x)dx on [ ,b]. This is a definite integral and so is the area under the curve of the function. We simply estimate this area by simpler geometric objects. The process is called numerical integration or numerical quadrature. Partitioning and Divide-and-Conquer Strategies – Slide 24

Each region calculated using an approximation given by rectangles; aligning the rectangles: Partitioning and Divide-and-Conquer Strategies – Slide 25

The area of the rectangles is the length of the base times the height. As we can see by the figure base = , while the height is the value of the function at the midpoint of p and q, i.e. height = ƒ(½(p+q)). Since there are multiple rectangles, designate the endpoints by x 0 = , x 1 = p, x 2 = q, x 3, …, x n = b; Thus Partitioning and Divide-and-Conquer Strategies – Slide 26

Can show that Divide the interval [0,1] into the N subintervals [ i-1 / N, i / N ] for i=1,2,3,…,N. Then Partitioning and Divide-and-Conquer Strategies – Slide 27

Partitioning and Divide-and-Conquer Strategies – Slide 28 #include __global__ void term (int *part_sum) { int n = blockDim.x; double int_size = 1.0/(double)n; int tid = threadIdx.x; double x = int_size * ((double)tid – 0.5); double partialSum = 4.0 / (1.0 + x * x); double temp_pi = int_size * part_sum; part_sum[tid] = temp_pi; __syncthreads(); }

Partitioning and Divide-and-Conquer Strategies – Slide 29 int main(void) { double actual_pi = ; int n; double calc_pi = 0.0, *part_sum, *dev_part_sum; printf(“The pi calculator.\n”); printf(“No. intervals ”); scanf(“%d”, &n); if (n == 0) break; malloc((void**)&part_sum, n * sizeof(double)); cudaMalloc((void**)&dev_part_sum, n * sizeof(double)); term >>(dev_part_sum); // 1 block, n threads cudaMemcpy(part_sum, dev_part_sum, n * sizeof(double), cudaMemcpyDeviceToHost); for (int i = 0; i < n; i++) calc_pi += part_sum[i]; cudaFree(dev_part_sum); free(part_sum); printf(“pi = %f\n”, calc_pi); printf(“Error = %f\n”, fabs(calc_pi – actual_pi)); }

May not be better! Partitioning and Divide-and-Conquer Strategies – Slide 30

The area of the trapezoid is the area of the triangle on top plus the area of the rectangle below. For the rectangle, we can see by the figure that base = , while the height = ƒ(p); thus area =  ·ƒ(p). For the triangle, base =  while the height = ƒ(q) – ƒ(p), so area = ½·  (ƒ(q) – ƒ(p)). Partitioning and Divide-and-Conquer Strategies – Slide 31 ƒ(p) ƒ(q)  = q-p

Thus the total area of the trapezoid is ½·  (ƒ(p)+ƒ(q)). As before there are multiple trapezoids so designate the endpoints by x 0 = , x 1 = p, x 2 = q, x 3, …, x n = b. Thus Partitioning and Divide-and-Conquer Strategies – Slide 32

Returning to our previous example we see that Partitioning and Divide-and-Conquer Strategies – Slide 33

Comparing our methods Partitioning and Divide-and-Conquer Strategies – Slide 34 NRectangle Estimate Trapezoid Estimate ,

Solution adapts to shape of curve. Use three areas A, B and C. Computation terminated when largest of A and B sufficiently close to sum of remaining two areas. Partitioning and Divide-and-Conquer Strategies – Slide 35

Some care might be needed in choosing when to terminate. Might cause us to terminate early, as two large regions are the same (i.e. C=0). Partitioning and Divide-and-Conquer Strategies – Slide 36

For this example we consider an adaptive trapezoid method. Let T( ,b) be the trapezoid calculation on [ ,b], i.e. T( ,b) = ½(b-  )(ƒ(  )+ƒ(b)). Specify a level of tolerance  > 0. Our algorithm is then: 1. Compute T( ,b) and T( ,m)+T(m,b) where m is the midpoint of [ ,b], i.e. m = ½(  +b). 2. If | T( ,b) – [T( ,m)+T(m,b)] | <  then use T( ,m)+T(m,b) as our estimate and stop. 3. Otherwise separately approximate T( ,m) and T(m,b) inductively with a tolerance of ½ . Partitioning and Divide-and-Conquer Strategies – Slide 37

Clearly  x dx over [0,1] is 2/3. Try to approximate this with a tolerance of In this case T( ,b) = ½(b –  )(  +  b). 1. T(0,1) = 0.5, tolerance is T(0,½) + T(½,1) = = |0.5 – | = ; try again. 2. Estimate T(½,1) with tolerance T(½,¾) + T(¾,1) = = | – | = ; try again. Partitioning and Divide-and-Conquer Strategies – Slide 38

3. Estimate T(½, ¾) and T(¾,1) each with tolerance a. T(½, ¾) = T(½, ⁵⁄₈) + T(⁵⁄₈, ¾) = = | – | = ; done. b. T(¾, 1) = T(¾, ⁷⁄₈) + T(⁷⁄₈, 1) = = | – | = ; done. Our revised estimate for T(½,1) is the sum of the revised estimates for T(½, ¾) and T(¾, 1). Thus T(½,1) = = Partitioning and Divide-and-Conquer Strategies – Slide 39

Now for T(0,½). Partitioning and Divide-and-Conquer Strategies – Slide 40 ab  mT(a,b)T(a,b)T(a,m) + T(m,b) |diff| 1/41/ * 1/81/ * 1/161/ * 1/321/ * 1/641/ * Subtotal

Still more for T(0,½). Partitioning and Divide-and-Conquer Strategies – Slide 41 ab  ≈ m ≈T(a,b)T(a,b)T(a,m) + T(m,b) |diff| 1/1281/643.91E E-06* 1/2561/ E E-06* 1/5121/ E E-07* 01/ E E-06* Subtotal Total

So our final estimate for T(0,½) is Our previous final estimate for T(½,1) was Thus the final estimate for T(0,1) is the sum of those for T(0,½) and T(½,1) which is The actual answer was 2/3 for an error of , well below our tolerance of Partitioning and Divide-and-Conquer Strategies – Slide 42

Two strategies Partitioning: simply divides the problem into parts Divide-and-Conquer: divide the problem into sub- problems of same form as larger problem Examples Operations on sequences of numbers such as simply adding them together. Several sorting algorithms can often be partitioned or constructed in a recursive fashion. Numerical integration N-body problem Partitioning and Divide-and-Conquer Strategies – Slide 43

Based on original material from The University of Akron: Tim O’Neil The University of North Carolina at Charlotte Barry Wilkinson, Michael Allen Oregon State University: Michael Quinn Revision history: last updated 8/19/2011. Partitioning and Divide-and-Conquer Strategies – Slide 44