ICPADS '12 Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, Pages 408-415 Tianyi Wang, Gang Quan, Shangping.

ICPADS '12 Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, Pages 408-415. Tianyi Wang, Gang Quan, Shangping Ren, Meikang Qiu. 曾冠維, 2013.09.05

 Introduction  Preliminary  Performance evaluation  Experimental results  Conclusions 2

• Manufacturing variations can cause significant discrepancies in IC chip performance.
• One major problem caused by manufacturing variations is reduced fabrication yield.

• Therefore, micro-architecture-level and core-level redundancies are employed to improve the fabrication yield.
• According to "Exploiting micro-architectural redundancy for defect tolerance", core-level redundancy achieves better yield.

• Another problem caused by manufacturing variations is performance variation among cores.

• How can the total schedule length of a task graph be reduced when realizing its nominal design?
• Approach: develop a performance metric based on opportunity cost.

• Use the Row Rippling Column Stealing algorithm (RRCS) to replace faulty cores with redundant cores.

• Task graph $G = \{V, E\}$, where $V = \{v_1, v_2, \dots, v_k\}$ and $E = \{e(i,j) = (v_i, v_j) \mid v_i \text{ communicates with } v_j\}$; $|v_i|$ represents the execution time of task node $v_i$.
• The logical architecture is denoted $\mathcal{L}$ and is assumed to consist of $r \times c$ cores: $\mathcal{L} = \{L_{i,j} \mid i = 0, \dots, r-1;\ j = 0, \dots, c-1\}$.

• The nominal design of application $G$ based on the logical architecture $\mathcal{L}$ is denoted $N(G, \mathcal{L})$.
• The physical architecture is denoted $\mathcal{P}$ and is assumed to consist of $m \times n$ cores: $\mathcal{P} = \{P_{i,j} \mid i = 0, \dots, m-1;\ j = 0, \dots, n-1\}$.
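To make the notation concrete, here is a minimal Python sketch built on assumptions that are not taken from the slides: the nominal design $N(G, \mathcal{L})$ is reduced to a task-to-logical-core assignment, each physical core has a speed factor (1.0 = nominal), a task's execution time is $|v_i|$ divided by the speed of the physical core it lands on, and latency is the critical-path length of $G$ (contention between tasks sharing a core and communication are ignored). The names `latency`, `work`, `assign`, and `speed` and the toy instance are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical model (not from the paper): node weights are execution times
# |v_i| on a nominal-speed core; latency = length of the critical path.
def latency(work, edges, assign, mapping=None, speed=None):
    """work:    task -> |v_i| (execution time at nominal speed)
    edges:   (u, v) pairs: v may start only after u finishes
    assign:  task -> logical core (i, j), standing in for N(G, L)
    mapping: logical core -> physical core (x, y); None = nominal speeds
    speed:   physical core -> speed factor (1.0 = nominal)"""
    preds = {t: [] for t in work}
    for u, v in edges:
        preds[v].append(u)
    finish = {}
    for t in TopologicalSorter(preds).static_order():
        s = 1.0
        if mapping is not None and assign[t] in mapping:
            s = speed[mapping[assign[t]]]
        start = max((finish[p] for p in preds[t]), default=0.0)
        finish[t] = start + work[t] / s
    return max(finish.values())

# Toy instance: 4 tasks on a 2x2 logical architecture (r = c = 2) and a
# 2x2 physical architecture (m = n = 2) whose cores run at different speeds.
work = {"a": 4, "b": 6, "c": 3, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
assign = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
speed = {(0, 0): 1.0, (0, 1): 1.5, (1, 0): 0.75, (1, 1): 1.0}
identity = {core: core for core in speed}

nominal = latency(work, edges, assign)                    # 12.0
actual = latency(work, edges, assign, identity, speed)    # 10.0
```

Here the identity mapping happens to help because the slow core (speed 0.75) hosts task c, which is off the critical path.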

• Problem: Given an application $G$, a logical architecture $\mathcal{L}$, the nominal design of $G$ on $\mathcal{L}$, i.e., $N(G, \mathcal{L})$, and the physical architecture $\mathcal{P}$,

• find the mapping $M = \{L_{i,j} \mapsto P_{x,y} \mid i = 0, \dots, r-1;\ j = 0, \dots, c-1;\ 0 \le x \le m-1;\ 0 \le y \le n-1\}$ from logical cores to physical cores such that the maximum latency to execute $G$ based on $N(G, \mathcal{L})$ is minimized.
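For tiny instances, the problem can be solved exactly by exhaustive search over all injective mappings, which gives a reference point for checking heuristics. The sketch below assumes the same hypothetical latency model (per-core speed factors, critical-path latency, no communication) and a hypothetical 2×2 logical / 2×3 physical instance. It enumerates $mn \cdot (mn-1) \cdots (mn-rc+1)$ mappings, so it is only usable for very small $r, c, m, n$.

```python
from graphlib import TopologicalSorter
from itertools import permutations

def latency(work, preds, assign, mapping, speed):
    # Critical-path length; task time = |v_i| / speed of its physical core.
    finish = {}
    for t in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[t]), default=0.0)
        finish[t] = start + work[t] / speed[mapping[assign[t]]]
    return max(finish.values())

def optimal_mapping(work, edges, assign, speed):
    preds = {t: [] for t in work}
    for u, v in edges:
        preds[v].append(u)
    logical = sorted(set(assign.values()))
    best_map, best_lat = None, float("inf")
    # Each permutation picks len(logical) distinct physical cores, in order.
    for cores in permutations(speed, len(logical)):
        mapping = dict(zip(logical, cores))
        lat = latency(work, preds, assign, mapping, speed)
        if lat < best_lat:
            best_map, best_lat = mapping, lat
    return best_map, best_lat

# Hypothetical instance: 2x2 logical cores, 2x3 physical cores.
work = {"a": 4, "b": 6, "c": 3, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
assign = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
speed = {(0, 0): 1.0, (0, 1): 1.5, (0, 2): 0.75,
         (1, 0): 1.0, (1, 1): 0.5, (1, 2): 1.25}

best_map, best_lat = optimal_mapping(work, edges, assign, speed)
```

On this instance the optimum places task b on the fastest core and task a on the second fastest, for a latency of about 9.2.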

 Introduction  NoC virtualization  Performance evaluation  Experimental results  Conclusions 15

1. A simple workload/performance matching heuristic
2. Opportunity-cost-based workload/performance mapping
3. Logical/physical topology mapping with communication awareness

Time complexity =
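The slides do not spell out the simple workload/performance matching heuristic (Algorithm 1, later called SWPM). A natural reading, assumed here, is: compute each logical core's workload as the total execution time of the tasks assigned to it, then pair the heaviest logical core with the fastest physical core, the second heaviest with the second fastest, and so on. The function name and toy instance below are hypothetical; this sketch's cost is dominated by the two sorts.

```python
from collections import defaultdict

def simple_matching(work, assign, speed):
    """Hypothetical SWPM-style greedy: heaviest logical core -> fastest core."""
    load = defaultdict(float)
    for task, core in assign.items():
        load[core] += work[task]          # total workload of each logical core
    logical = sorted(load, key=load.get, reverse=True)
    # sorted() stays stable with reverse=True, so equal speeds keep dict order.
    physical = sorted(speed, key=speed.get, reverse=True)
    return dict(zip(logical, physical))   # zip stops after r*c pairs

work = {"a": 4, "b": 6, "c": 3, "d": 2}
assign = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
speed = {(0, 0): 1.0, (0, 1): 1.5, (0, 2): 0.75,
         (1, 0): 1.0, (1, 1): 0.5, (1, 2): 1.25}

mapping = simple_matching(work, assign, speed)
```

Note that the ranking looks only at workload totals, not at whether a logical core's tasks lie on the critical path.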

• While Algorithm 1 is fast and intuitive, it has several issues.
• Problem 1: larger workloads do not necessarily lie on the critical path.
• Problem 2: it does not take the cores' locations into consideration.

• The opportunity cost of an activity is measured in terms of the value of the next best alternative forgone (i.e., not chosen).
• It is the sacrifice related to the second-best choice available to someone, or a group, who has picked among several mutually exclusive choices.

 Mapping to  The task graph of this mapping is 51.67  Since the lantency of nominal design is 55,we define that the profit of the decision is 55- 51.67 = 3.33  For the rest of the alternatives to map,the best choice is to map it to,with latency of 53.18. The profit is 55-53.18 = 1.82 22

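• Definition 1: For a mapping decision $d$, let its profit be denoted $\pi(d)$ and its opportunity cost $\mathrm{OC}(d)$, i.e., the profit of the best alternative decision forgone. Then the performance of the decision is $\rho(d) = \pi(d) - \mathrm{OC}(d)$; in the example above, $3.33 - 1.82 = 1.51$.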
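The arithmetic of Definition 1 can be checked in a few lines. The function name and argument layout are hypothetical; only the numbers (nominal latency 55, latency 51.67 for the chosen mapping, 53.18 for the best alternative) come from the example.

```python
def decision_performance(nominal, chosen, alternatives):
    """Performance of a mapping decision = profit - opportunity cost.

    nominal:      latency of the nominal design
    chosen:       latency after the decision under evaluation
    alternatives: latencies of the other ways to map the same logical core
    """
    profit = nominal - chosen
    # Opportunity cost: profit of the best alternative that is forgone.
    opportunity_cost = max((nominal - lat for lat in alternatives), default=0.0)
    return profit, opportunity_cost, profit - opportunity_cost

profit, oc, perf = decision_performance(55, 51.67, [53.18])
print(f"{profit:.2f} {oc:.2f} {perf:.2f}")  # 3.33 1.82 1.51
```

Since the nominal latency appears in both terms, it cancels: the performance equals the best alternative's latency minus the chosen latency.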

• For the example, the performance values of the candidate decisions are 1.51, 0, 1.9, and 0.76. According to Definition 1, mapping the logical core with the largest workload assignment to the fastest core does not reduce the critical-path latency and thus has the lowest performance.

• In the worst case, the complexity of one iteration of the while loop is O(kmn), since m × n different mappings need to be checked, where k is the number of task nodes.
• The while loop is executed r × c times.
• Therefore, the overall complexity of Algorithm 2 is O(krcmn).
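The slides give only the complexity analysis of Algorithm 2, so the loop structure below is an assumption: each iteration scores every (unmapped logical core, free physical core) pair with Definition 1's performance (profit minus opportunity cost) and commits the best one, with not-yet-mapped logical cores treated as running at nominal speed. Because it scores all pairs per iteration, this sketch is more expensive than the O(krcmn) bound stated above; it illustrates the selection rule, not the paper's exact loop. Names and the toy instance are hypothetical.

```python
from graphlib import TopologicalSorter

def latency(work, preds, assign, mapping, speed):
    # Critical-path length; logical cores not yet in `mapping` are assumed
    # to run at nominal speed (factor 1.0).
    finish = {}
    for t in TopologicalSorter(preds).static_order():
        core = mapping.get(assign[t])
        s = speed[core] if core is not None else 1.0
        start = max((finish[p] for p in preds[t]), default=0.0)
        finish[t] = start + work[t] / s
    return max(finish.values())

def oc_mapping(work, edges, assign, speed):
    """Greedy mapping driven by performance = profit - opportunity cost."""
    preds = {t: [] for t in work}
    for u, v in edges:
        preds[v].append(u)
    logical = sorted(set(assign.values()))
    nominal = latency(work, preds, assign, {}, speed)
    mapping, free = {}, list(speed)
    while len(mapping) < len(logical):        # one commitment per logical core
        best = None                           # (performance, logical, physical)
        for l in logical:
            if l in mapping:
                continue
            # Profit of each candidate physical core for logical core l.
            profit = {p: nominal - latency(work, preds, assign,
                                           {**mapping, l: p}, speed)
                      for p in free}
            for p in free:
                # Opportunity cost: best profit among the forgone choices for l.
                oc = max((profit[q] for q in free if q != p), default=0.0)
                perf = profit[p] - oc
                if best is None or perf > best[0]:
                    best = (perf, l, p)
        _, l, p = best
        mapping[l] = p
        free.remove(p)
    return mapping, latency(work, preds, assign, mapping, speed)

work = {"a": 4, "b": 6, "c": 3, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
assign = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
speed = {(0, 0): 1.0, (0, 1): 1.5, (0, 2): 0.75,
         (1, 0): 1.0, (1, 1): 0.5, (1, 2): 1.25}

mapping, final = oc_mapping(work, edges, assign, speed)
```

On this small instance the greedy choice first sends logical core (0, 1), which holds the critical task b, to the fastest core, because that decision has the largest margin over its next best alternative.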

• Neither Algorithm 1 nor Algorithm 2 takes communication cost into consideration.
• When communication cost becomes significant, especially on many-core platforms, the quality of the mapping results produced by Algorithms 1 and 2 can be severely compromised.
• We propose an iterative algorithm (Algorithm 3) that improves existing mapping results while taking communication into consideration.

• When calculating the latency of the task graph, the communication cost can be incorporated into the calculation of the performance of a mapping decision.
• Algorithm 3 iteratively improves the mapping solution until the user-defined improvement threshold ε is satisfied.
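The slides do not give Algorithm 3's pseudocode. The sketch below assumes (i) a 2-D mesh where an edge's communication delay is its cost times the Manhattan hop distance between the two tasks' physical cores (zero on the same core), and (ii) a local search that repeatedly applies the single best move or swap of one logical core's placement, stopping when the gain drops below ε or after `max_iter` rounds (200, matching the iteration cap in the experimental setup). Both the delay model and the move set are assumptions; all names are hypothetical.

```python
from graphlib import TopologicalSorter

def comm_latency(work, comm, assign, mapping, speed):
    """Critical path where edge (u, v) adds comm[(u, v)] * hop distance."""
    preds = {t: [] for t in work}
    for u, v in comm:
        preds[v].append(u)
    finish = {}
    for t in TopologicalSorter(preds).static_order():
        x, y = mapping[assign[t]]
        start = 0.0
        for u in preds[t]:
            ux, uy = mapping[assign[u]]
            hops = abs(x - ux) + abs(y - uy)      # Manhattan distance on a mesh
            start = max(start, finish[u] + comm[(u, t)] * hops)
        finish[t] = start + work[t] / speed[(x, y)]
    return max(finish.values())

def improve_mapping(work, comm, assign, mapping, speed, eps=1e-6, max_iter=200):
    mapping = dict(mapping)
    current = comm_latency(work, comm, assign, mapping, speed)
    for _ in range(max_iter):
        best_gain, best_map = 0.0, None
        for l in mapping:
            for p in speed:
                if p == mapping[l]:
                    continue
                trial = dict(mapping)
                for other, q in mapping.items():
                    if q == p:                    # p is occupied: swap cores
                        trial[other] = mapping[l]
                trial[l] = p
                gain = current - comm_latency(work, comm, assign, trial, speed)
                if gain > best_gain:
                    best_gain, best_map = gain, trial
        if best_map is None or best_gain < eps:
            break                                 # improvement below threshold
        mapping = best_map
        current = comm_latency(work, comm, assign, mapping, speed)
    return mapping, current

# Hypothetical case: two communicating tasks start two hops apart on a 1x3 mesh.
work = {"a": 2, "b": 3}
comm = {("a", "b"): 5}
assign = {"a": (0, 0), "b": (0, 1)}
speed = {(0, 0): 1.0, (0, 1): 1.0, (0, 2): 1.0}
initial = {(0, 0): (0, 0), (0, 1): (0, 2)}
result, lat = improve_mapping(work, comm, assign, initial, speed)
```

Starting from latency 15 (two hops of communication), one move makes the two logical cores neighbors and brings the latency to 10, after which no move improves it and the loop stops.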

• TGFF is used to randomly generate task graphs (60 nodes).
• The communication cost of each edge and the execution time of each task are randomly generated.
• The P&C_OC algorithm is assumed to stop after 200 iterations.
• Experiments were run on a Windows XP SP3 platform with an Intel(R) Core(TM)2 Duo CPU @ 2.93 GHz and 3.21 GB of RAM.

• SWPM denotes Algorithm 1, P_Only_OC denotes Algorithm 2, and P&C_OC denotes Algorithm 3.
• We also compare with two previous works: the RRCS algorithm and the Hungarian algorithm.


• Performance vs. different communication/execution (C/E) ratios.
• The communication cost is generated within the interval [a, b].
• The execution time of each task node is generated within the interval [c, d].
• The C/E ratio is determined by these two intervals.


• We introduce a framework to maximize the performance of the nominal design.
• The heuristics are based on the concept of opportunity cost.
• The proposed approach achieves up to 30%, and on average 15%, performance improvement.
