Presentation is loading. Please wait.

Presentation is loading. Please wait.

Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi)

Similar presentations


Presentation on theme: "Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi)"— Presentation transcript:

1 Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi) Walter, Israel Cidon, Avinoam Kolodny, Daniel Sigalov December, 2009

2 SoC Revolution PE1 R PE2PE3 RR PE4 R PE5PE6 RR PE1PE2PE3 PE4PE5PE6 2 Bus-based system NoC-based system

3 SoC Evolution RR R R R R R R R 3

4 Processor Evolution CPU Cache Single Core CPU1 Cache Dual Core CPU2 Cache CPU1 Cache Quad Core CPU3 Cache CPU2 Cache CPU4 Cache 4

5 How would such chips be like? Most likely  Power still important  Highly parallel  IP reuse  Ease of design and verification The Era of Many-Module SoC High Certainty Totally unknown 5 Large number of modules NoC Interconnect Applications

6 Special purpose cores replace general purpose processors  Power considerations Future SoCs - Observation#1 General Purpose CPU Pre. Proc. DSP CPU Task 1 Task 2 MEM Task 3 Task 4 MEM Memory Task 1 Task 2 MEM Task 3 Task 4 MEM Memory Task 5 Task 5 GPU 6 Processing pipes are getting longer

7 Future SoCs - Observation#2 ? Large diversity All modules are unique Highly regular Classes of Replicated cores standard modules (DSP, HW accelerators, Cache banks, etc.) 7

8 Increased use of specialized cores  Pipes are getting longer Replication of processing elements How is the design flow affected?  This work – mapping of the NoC The Era of Many-Module SoC Observation#1 Observation#2 8

9 The Era of Many Module SoC Revisiting the Mapping Problem Cross-Entropy Optimization Evaluation Outline 9

10 Given  Traffic pattern(s) a set (or sets) of pair-wise bandwidth requirements and timing constraints  Routing  Topology Goal  Find efficient mapping of cores to tiles NoC Mapping PE4 PE1PE2 PE5 PE3 PE6 PE7PE8PE9PE4 PE1 PE2PE5 PE3 PE6PE7 PE8 PE9 PE4PE1 PE2 PE5 PE3PE6 PE7PE8 PE9 PE4 PE1 PE2 PE5 PE3 PE6 PE7 PE8 PE9 PE4 PE1PE2 PE5 PE3 PE6 PE7 PE8PE9 10

11 An important design step  Mapping affects power and performance! A difficult problem!  Often heuristic algorithms are used Common optimization goals  Minimize (dynamic) power  Minimize power + maximize performance  Minimize power subject to performance constraints Mapping Optimization 11

12 Typical modeling  Power and latency proportional to distance  Cost function: Modeling 12

13 Calculating Mapping Cost PE1PE2 PE4PE5 PE3 PE6 100 30 Mapping π 1 Mapping π 2 13

14 Motivation - Example #1 Optimal mapping ( π 1 ): PE1PE2PE3PE4 MEM 1 MEM 2 PE1MEM2 PE2PE3 PE4 MEM1 111 1 1 1 1 14

15 Optimal mapping ( π 2 ): Let the mapping algorithm assign the flows! Motivation - Example #1 (cont.) PE1PE2PE3PE4 2*MEM PE1 MEM1 PE2 PE3PE4 MEM2PE1MEM1 PE2PE3 PE4 MEM2 Cost( π 1 )=9Cost( π 2 )=7 15

16 Motivation - Example #1 (cont.) PE1PE2PE3PE4 MEM 1 MEM 2 111 1 1 1 1 PE1 MEM1 PE2 PE3PE4 MEM2 Cost( π 2 )=7 16 The mapping algorithm should be aware of replicated modules!

17 Pair-wise point-to-point requirements For example, in a 4-module system: Classic Performance Constraints PE1 2 PE2 1 PE3 11 PE4 PE3PE2PE1 17

18 PE1PE2PE3 PE4 Motivation - Example #2 Timing Requirement PEsStream ID 4PE1  PE2  PE3  PE4Stream 1 1PE2  PE4Stream 2 Stream 1 Stream 2 18

19 Example #2 – Pair-wise req. No feasible mapping! PE1PE2PE3 PE4 PE2. PE4. PE1 PE3PE2. PE4. PE1PE3 PE2. PE4 PE1PE3 2PE2 1PE3. 11PE4 PE3PE2PE1 19 Req=2Req=1

20 PE1PE2PE3 PE4 Application-Level Requirements RequirementPEsStream ID 4PE1  PE2  PE3  PE4Stream 1 1PE2  PE4Stream 2 PE2 PE4 PE1 PE3 Stream 1 Stream 2 A feasible mapping does exist! 20 Req=1Req=2 Req=1 It’s better to work with the application level requirements

21 Find efficient mappings by extending the formulation of the mapping problem  Adding degrees of freedom Degree of freedom #1  Leverage existence of replicated modules Degree of freedom #2  Replace p2p constraints with end-to-end, application-level requirements This Work 21

22 Modifying the Formulation (1) 22 Time Req. BWFlow 3100PE 1  DSP 3 12200PE 2  DSP 4 15100PE 2  SRAM 1 5100PE 3  SRAM 2 ……… Time Req. BWFlow 3100PE 1  12200PE 2  15100PE 2  5100PE 3  ……… Leverage existence of replicated modules  Allow the mapping algorithm to allocate flows to the best replicated module

23 ∞ ∞2 443 ∞313 ∞4773 ∞3∞24∞ 4∞723∞5 ∞∞2∞156∞ 773∞∞53∞1 ∞3∞3212∞31 Modifying the Formulation (2) 23 E2E Req. Stream’s PEsStream ID 23PE 1  PE 3  PE 9  PE 4  PE 10 1 12PE 5  PE 2  PE 3  PE 8  PE 7  PE 6  PE 10 2 15PE 5  PE 3  PE 9 3 20PE 7  PE 8  PE 2  PE 3 4 2PE 1  PE 2 5 ……… In this paper, for synthetic task graphs  Did so for a real application too P2P timing req. E2E timing req. Replace p2p constraints with end-to-end, application-level requirements

24 The Era of Many Module SoC Revisiting the Mapping Problem Cross-Entropy Optimization Evaluation Outline 24

25 Modern optimization heuristic  Good at combinatorial optimization problems Akin to evolutionary algorithms  Generation of new solutions is based on sampling and estimation Inherently a global search method  Reduced risk of getting trapped in a local minimum Cross Entropy Optimization 25

26 Given an initial parameter vector v=v 0, sample a random population of K solutions x 1,x 2,…,x k from the distribution given by f(x;v). Evaluate the costs S(xi),i=1,…,K. Using the ρK (0<ρ<1) elite (lowest cost) samples, obtain a new density function f(x;v) by calculating a new vector v via Maximum Likelihood (ML) estimation. Repeat steps 1-3 with the new vector v unless maximum number of iterations is reached or no improvement is obtained for a predefined number of iterations. Cross Entropy Optimization 1. Generate 10 random mappings: π 1, π 2, …, π 10 2. Find 3 lowest cost mappings: π 2, π 5, π 7 3. Examine those 3 best mappings: A. For each tile, calculate the probability core PE i is mapped to that tile B. Update probabilities accordingly For example: 26

27 Prob(TileA  PE 1 )= Prob(TileA  PE 2 )= Prob(TileA  PE 3 )=Prob(TileA  PE 4 )=0.25 Prob(TileB  PE 1 )= Prob(TileB  PE 2 )= Prob(TileB  PE 3 )=Prob(TileB  PE 4 )=0.25 Prob(TileC  PE 1 )= Prob(TileC  PE 2 )= Prob(TileC  PE 3 )=Prob(TileC  PE 4 )=0.25 Prob(TileD  PE 1 )= Prob(TileD  PE 2 )= Prob(TileD  PE 3 )=Prob(TileD  PE 4 )=0.25 CE Example PE3 PE1PE2 PE4 Tile C Tile A Tile D Tile B π1π1 PE3 PE1 PE2 PE4 π2π2 PE3 PE1PE2 PE4 π3π3 PE3 PE1PE2 PE4 π4π4 PE3 PE1PE2 PE4 π5π5 PE3 PE1 PE2 PE4 π6π6 PE3PE1 PE2PE4 π7π7 PE3 PE1 PE2 PE4 π8π8 PE3 PE1 PE2 PE4 π9π9 PE3 PE1PE2 PE4 π 10 27

28 Prob(TileA  PE 1 )=1 Updating Probabilities PE1PE2 PE4PE3 PE1 PE2 PE4PE1PE2 PE4PE3 Prob(TileB  PE2)=2/3 Prob(TileB  PE4)=1/3 Prob(TileD  PE2)=1/3 Prob(TileD  PE3)=1/3 Prob(TileD  PE4)=1/3 Prob(TileC  PE3)=2/3 Prob(TileC  PE4)=1/3 π2π2 π5π5 π3π3 Following iteration uses these updates probabilities Gradually, probabilities converge to 0/1 Tile C Tile A Tile D Tile B 28

29 The Era of Many Module SoC Revisiting the Mapping Problem Cross-Entropy Optimization Evaluation Outline 29

30 Scenario  6x6 mesh NoC  Synthetic, randomized SoC Task graphs (and task-to-core mapping) Varying number of replicated modules Varying timing constraints (Real application in DATE10 paper) Compare with best cost of classic mapping  Averaging multiple runs Evaluation 30

31 “Class”: a group of identical PEs  Total number of replicated cores= {Number of classes}*{class size} Accounting for Replication 31 Cost Reduction [%] 1020 Total number of Replicated Modules [%] 304050

32 SoCs with a pipeline data path and background P2P traffic  Varying pipeline slack  Different amounts of background constraints Application-Level Requirements 32 Cost Reduction [%] Pipeline Slack

33 We are going into the era of “Many module SoC” Extend the mapping to account for  Classes of replicated modules  Application-level requirements Meaningful power savings But mapping is an example  Routing? Task assignment? Link design? Topology selection? Conclusions and Future Work 33

34 Thank you! Questions? zigi@tx.technion.ac.il The Era of Many-Module SoC QNoC Research Group Group Research QNoC 34


Download ppt "Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi)"

Similar presentations


Ads by Google