Presentation transcript: "Given UPC algorithm – Cyclic Distribution"

1 Given UPC algorithm – Cyclic Distribution. The simple algorithm uses a cyclic distribution, which means the data it reads is not local unless the item weight is a multiple of THREADS, and it is worse still if CAPACITY+1 is not divisible by THREADS (a checkerboard ownership pattern). NOTE: in table T, capacity runs horizontally and items run vertically. (Figure: table cells labelled 1 2 3 4, 1 2 3 4, ... showing the cyclic assignment of cells to the four processors.)
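As a concrete illustration of this layout, here is a minimal UPC sketch (ITEMS, CAPACITY, w and v are assumed names, not taken from the presentation, and a fixed thread count at compile time is assumed); it only declares the cyclically distributed table and records the locality argument in comments.

#include <upc.h>

#define ITEMS    8             /* assumed problem size, for illustration   */
#define CAPACITY 19            /* CAPACITY+1 = 20 capacity columns         */

int w[ITEMS + 1], v[ITEMS + 1];   /* private (per-thread) weights / values */

/* No layout qualifier means block size 1, i.e. a cyclic distribution:
 * the owner of T[i][c] is (i*(CAPACITY+1) + c) % THREADS.                  */
shared int T[ITEMS + 1][CAPACITY + 1];

/* Knapsack recurrence, evaluated by the owner of T[i][c]:
 *     T[i][c] = max(T[i-1][c], T[i-1][c - w[i]] + v[i])
 * T[i-1][c]        is local only if (CAPACITY+1)        % THREADS == 0;
 * T[i-1][c - w[i]] is local only if (CAPACITY+1 + w[i]) % THREADS == 0,
 * i.e. (given the first condition) only if w[i] is a multiple of THREADS,
 * which is the non-locality / checkerboard problem the slide points out.  */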

2 Possible algorithm – Block-Cyclic Distribution. A simple fix is a block-cyclic layout. Its benefit is that the cell for the previous item at the same capacity has the same affinity, so more purely local computation is possible; if the items are sorted by weight beforehand, processors generally only need local data at first. (Figure: each row split into column blocks owned by processors 1 2 3 4.)
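A hedged sketch of the block-cyclic layout (the block size, table dimensions and four-thread configuration are assumptions chosen so that the blocks line up column-wise; this is not the presenters' code):

#include <upc.h>

#define ITEMS    8
#define CAPACITY 39            /* 40 capacity columns, illustrative         */
#define B        5             /* assumed block size                        */

/* Block-cyclic layout: blocks of B consecutive cells are dealt out
 * round-robin.  With 4 threads, 40 columns and B = 5 the row length is a
 * multiple of B*THREADS, so the owner of T[i][c] is (c / B) % THREADS,
 * independent of i: the cell for the previous item at the same capacity
 * has the same affinity, which is the benefit described on this slide.     */
shared [B] int T[ITEMS + 1][CAPACITY + 1];

/* T[i-1][c] is now always local; T[i-1][c - w[i]] is local only when
 * c - w[i] happens to fall in one of this thread's own column blocks.      */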

3 Possible algorithm – Block-Cyclic Distribution. Looking at the communication for one processor: the algorithm communicates a lot of data for every item, with the amount depending on the item's weight, and the data comes from two other processors in an unpredictable communication pattern. (Figure: column blocks owned by processors 1 2 3 4.)
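To make the pattern concrete (numbers assumed for illustration, not taken from the slides): with four threads and a block size of 3 columns, processor 3 owns columns 6–8 of every row. For an item of weight 5, its cells (i,6), (i,7), (i,8) read cells (i-1,1), (i-1,2), (i-1,3), which straddle the blocks of processors 1 and 2; a different weight shifts those reads onto a different pair of neighbours, so the set of communication partners changes from item to item.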

4 Possible algorithm – Block-Cyclic Distribution. A more detailed look at communication: since communication will be the most important cost, focus on a subset of processor 3's data and what it needs. Almost all of the data this processor requires lies horizontally (within the same rows), with very little needed vertically. (Figure: column blocks owned by processors 1 2 3 4.)

5 New algorithm – Blocked Distribution. A more detailed look at communication: changing the layout to fully blocked makes most of the needed data local; the only communication now required for the subset is the part coming from processor 2. (Figure: processors 1 2 3 4 each owning one contiguous block of columns.)
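A sketch of the fully blocked layout under the same assumptions as above (a fixed thread count that evenly divides the number of columns):

#include <upc.h>

#define ITEMS    8
#define CAPACITY 39                       /* 40 capacity columns            */
#define COLS     (CAPACITY + 1)

/* One block of COLS/THREADS consecutive columns per thread, identical in
 * every row: processor p owns columns [p*COLS/THREADS, (p+1)*COLS/THREADS).
 * T[i-1][c] is always local, and T[i-1][c - w[i]] is local as long as the
 * item weight does not reach past the left edge of the band, so the only
 * remote reads come from the left-hand neighbour (processor 2, for the
 * subset of processor 3 discussed on the slide).                           */
shared [COLS / THREADS] int T[ITEMS + 1][COLS];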

6–10 New algorithm – Blocked Distribution. BIG PROBLEM: the algorithm becomes serial. Processor 3 needs data from processor 2 before it can continue if it tries to compute its entire block of data at once. IDEA: run subsets of the data at a time while keeping the blocked distribution. (Slides 6 through 10 repeat this text while the figure steps through the blocked layout for processors 1 2 3 4.)

11–17 Pipeline algorithm – Blocked Distribution. New pipelined algorithm: the processors run in parallel along a diagonal, with processor 1 starting the work and processor 4 finishing it; full parallelism is achieved only once the pipeline is full. (Slides 11 through 17 repeat this text while the figure animates the diagonal wavefront filling and then draining across processors 1 2 3 4.)
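A minimal sketch of this pipelined schedule (not the presenters' code): compute_tile is a hypothetical local kernel that fills the calling thread's column band for one chunk of rows, and ROW_CHUNKS is an assumed tuning parameter. At step s, thread t works on row chunk s - t, so thread 0 leads and thread THREADS-1 drains the pipeline, matching the diagonal wavefront on the slides.

#include <upc.h>

#define ROW_CHUNKS 16          /* assumed number of row subsets             */

extern void compute_tile(int row_chunk);   /* hypothetical local kernel     */

void pipeline(void)
{
    for (int s = 0; s < ROW_CHUNKS + THREADS - 1; s++) {
        int rc = s - MYTHREAD;             /* my row chunk for this step    */
        if (rc >= 0 && rc < ROW_CHUNKS)
            compute_tile(rc);              /* left neighbour finished chunk */
                                           /* rc at step s-1, so its cells  */
                                           /* are already available         */
        upc_barrier;                       /* make this step's results      */
                                           /* visible before the next step  */
    }
}

Full parallelism is reached after THREADS-1 steps, once every thread has a chunk to work on, which is the "pipeline full" condition mentioned above.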

18 Pipeline algorithm – Other considerations. Different layouts for different problem sizes: if the table is far from square, consider changing the layout so that the pipeline fills earlier; the optimal choices are a matter of tuning. Other optimizations: sorting the items in decreasing order of weight helps fill the pipeline earlier, because the top-left corner of the table is then filled with zeros; sorting costs O(n log n). Also, most of the table is really local, so the entire T table need not be kept shared: keep only the last row exchanged between processors in shared memory and move it with upc_get/upc_put. (Figure: alternative assignments of blocks 1 2 3 4 to the table, with the zero-filled top-left region marked.)
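One way the last optimisation could be realised (a sketch under assumptions: the standard UPC library calls corresponding to the slide's upc_get/upc_put are upc_memget and upc_memput; the overlap width MAXW and the helper functions are invented here; each thread is assumed to keep its own band of T in private memory and to publish only the cells its right-hand neighbour may read):

#include <upc.h>

#define MAXW 32        /* assumed upper bound on item weight (overlap width) */

/* Only the band edges live in shared memory: strip t holds the rightmost
 * MAXW cells of thread t's most recently finished row.                      */
shared [MAXW] int edge[THREADS * MAXW];

/* Publish the right edge of my band for the row just computed.              */
static void put_edge(const int *row_right_edge)
{
    upc_memput(&edge[MYTHREAD * MAXW], row_right_edge, MAXW * sizeof(int));
}

/* Fetch the left neighbour's published edge into private memory.            */
static void get_left_edge(int *dst)
{
    if (MYTHREAD > 0)
        upc_memget(dst, &edge[(MYTHREAD - 1) * MAXW], MAXW * sizeof(int));
}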

