Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi)

Slides:



Advertisements
Similar presentations
Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC) Ran Manevich, Isask har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny Technion – Israel.
Advertisements

Population-based metaheuristics Nature-inspired Initialize a population A new population of solutions is generated Integrate the new population into the.
ECE 667 Synthesis and Verification of Digital Circuits
Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis.
1 Enhanced EDF Scheduling Algorithms for Orchestrating Network-wide Active Measurements Prasad Calyam, Chang-Gun Lee Phani Kumar Arava, Dima Krymskiy OARnet,
REAL-TIME COMMUNICATION ANALYSIS FOR NOCS WITH WORMHOLE SWITCHING Presented by Sina Gholamian, 1 09/11/2011.
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
Generated Waypoint Efficiency: The efficiency considered here is defined as follows: As can be seen from the graph, for the obstruction radius values (200,
Resource Management of Highly Configurable Tasks April 26, 2004 Jeffery P. HansenSourav Ghosh Raj RajkumarJohn P. Lehoczky Carnegie Mellon University.
LightFlood: An Optimal Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
Technion – Israel Institute of Technology Qualcomm Corp. Research and Development, San Diego, California Leveraging Application-Level Requirements in the.
Multiobjective VLSI Cell Placement Using Distributed Simulated Evolution Algorithm Sadiq M. Sait, Mustafa I. Ali, Ali Zaidi.
Module R R RRR R RRRRR RR R R R R Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip Zvika Guz, Isask ’ har Walter, Evgeny Bolotin, Israel.
Localized Techniques for Power Minimization and Information Gathering in Sensor Networks EE249 Final Presentation David Tong Nguyen Abhijit Davare Mentor:
A New Evolutionary Algorithm for Multi-objective Optimization Problems Multi-objective Optimization Problems (MOP) –Definition –NP hard By Zhi Wei.
1 Evgeny Bolotin – Efficient Routing, DATE 2007 Routing Table Minimization for Irregular Mesh NoCs Evgeny Bolotin, Israel Cidon, Ran Ginosar, Avinoam Kolodny.
1 A Shifting Strategy for Dynamic Channel Assignment under Spatially Varying Demand Harish Rathi Advisors: Prof. Karen Daniels, Prof. Kavitha Chandra Center.
1 Evgeny Bolotin – ClubNet Nov 2003 Network on Chip (NoC) Evgeny Bolotin Supervisors: Israel Cidon, Ran Ginosar and Avinoam Kolodny ClubNet - November.
EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley Asynchronous Distributed Algorithm Proof.
Effective Gaussian mixture learning for video background subtraction Dar-Shyang Lee, Member, IEEE.
Power Aware Solutions for NoC Architecture Yaniv Ben-Itzhak Noc Seminar Winter 08.
1 Evgeny Bolotin – ICECS 2004 Automatic Hardware-Efficient SoC Integration by QoS Network on Chip Electrical Engineering Department, Technion, Haifa, Israel.
Architecture and Routing for NoC-based FPGA Israel Cidon* *joint work with Roman Gindin and Idit Keidar.
On the Task Assignment Problem : Two New Efficient Heuristic Algorithms.
UNIVERSITY OF JYVÄSKYLÄ Resource Discovery Using NeuroSearch Presentation for the Agora Center InBCT-seminar Mikko Vapa, researcher InBCT 3.2.
1 Algorithms for Bandwidth Efficient Multicast Routing in Multi-channel Multi-radio Wireless Mesh Networks Hoang Lan Nguyen and Uyen Trang Nguyen Presenter:
IETF 90: VNF PERFORMANCE BENCHMARKING METHODOLOGY Contributors: Sarah Muhammad Durrani: Mike Chen:
Steady and Fair Rate Allocation for Rechargeable Sensors in Perpetual Sensor Networks Zizhan Zheng Authors: Kai-Wei Fan, Zizhan Zheng and Prasun Sinha.
Efficient Model Selection for Support Vector Machines
Particle Filtering in Network Tomography
Internet Traffic Engineering by Optimizing OSPF Weights Bernard Fortz (Universit é Libre de Bruxelles) Mikkel Thorup (AT&T Labs-Research) Presented by.
CHALLENGING SCHEDULING PROBLEM IN THE FIELD OF SYSTEM DESIGN Alessio Guerri Michele Lombardi * Michela Milano DEIS, University of Bologna.
Network Aware Resource Allocation in Distributed Clouds.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.
Optimization Problems - Optimization: In the real world, there are many problems (e.g. Traveling Salesman Problem, Playing Chess ) that have numerous possible.
EE 685 presentation Utility-Optimal Random-Access Control By Jang-Won Lee, Mung Chiang and A. Robert Calderbank.
Towards Efficient Large-Scale VPN Monitoring and Diagnosis under Operational Constraints Yao Zhao, Zhaosheng Zhu, Yan Chen, Northwestern University Dan.
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.
EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley.
1 Iterative Integer Programming Formulation for Robust Resource Allocation in Dynamic Real-Time Systems Sethavidh Gertphol and Viktor K. Prasanna University.
LightFlood: An Efficient Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
Networks-on-Chip (NoC) Suleyman TOSUN Computer Engineering Deptartment Hacettepe University, Turkey.
Module R R RRR R RRRRR RR R R R R Access Regulation to Hot-Modules in Wormhole NoCs Isask’har (Zigi) Walter Supervised by: Israel Cidon, Ran Ginosar and.
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke
Multi-objective Topology Synthesis and FPGA Prototyping Framework of Application Specific Network-on-Chip m Akram Ben Ahmed Xinyu LI, Omar Hammami.
Thermal Management in Datacenters Ayan Banerjee. Thermal Management using task placement Tasks: Requires a certain number of servers (cores) for a specified.
Incremental Run-time Application Mapping for Heterogeneous Network on Chip 2012 IEEE 14th International Conference on High Performance Computing and Communications.
Genetic algorithms: A Stochastic Approach for Improving the Current Cadastre Accuracies Anna Shnaidman Uri Shoshani Yerach Doytsher Mapping and Geo-Information.
DECOR: A Distributed Coordinated Resource Monitoring System Shan-Hsiang Shen Aditya Akella.
1 Low Latency Multimedia Broadcast in Multi-Rate Wireless Meshes Chun Tung Chou, Archan Misra Proc. 1st IEEE Workshop on Wireless Mesh Networks (WIMESH),
Task Mapping and Partition Allocation for Mixed-Criticality Real-Time Systems Domițian Tămaș-Selicean and Paul Pop Technical University of Denmark.
1 Comparative Study of two Genetic Algorithms Based Task Allocation Models in Distributed Computing System Oğuzhan TAŞ 2005.
Incrementally Improving Lookup Latency in Distributed Hash Table Systems Hui Zhang 1, Ashish Goel 2, Ramesh Govindan 1 1 University of Southern California.
Resource Optimization for Publisher/Subscriber-based Avionics Systems Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee.
Genetic Algorithm(GA)
A MapReduced Based Hybrid Genetic Algorithm Using Island Approach for Solving Large Scale Time Dependent Vehicle Routing Problem Rohit Kondekar BT08CSE053.
MathematicalMarketing Slide 3c.1 Mathematical Tools Chapter 3: Part c – Parameter Estimation We will be discussing  Nonlinear Parameter Estimation  Maximum.
Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao.
Balancing of Parallel Two-Sided Assembly Lines via a GA based Approach
A Study of Group-Tree Matching in Large Scale Group Communications
ISP and Egress Path Selection for Multihomed Networks
Spare Register Aware Prefetching for Graph Algorithms on GPUs
Israel Cidon, Ran Ginosar and Avinoam Kolodny
Networked Real-Time Systems: Routing and Scheduling
Yiannis Andreopoulos et al. IEEE JSAC’06 November 2006
Boltzmann Machine (BM) (§6.4)
Parallel Programming in C with MPI and OpenMP
Communication Driven Remapping of Processing Element (PE) in Fault-tolerant NoC-based MPSoCs Chia-Ling Chen, Yen-Hao Chen and TingTing Hwang Department.
Stochastic Methods.
Presentation transcript:

Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi) Walter, Israel Cidon, Avinoam Kolodny, Daniel Sigalov December, 2009

SoC Revolution PE1 R PE2PE3 RR PE4 R PE5PE6 RR PE1PE2PE3 PE4PE5PE6 2 Bus-based system NoC-based system

SoC Evolution RR R R R R R R R 3

Processor Evolution CPU Cache Single Core CPU1 Cache Dual Core CPU2 Cache CPU1 Cache Quad Core CPU3 Cache CPU2 Cache CPU4 Cache 4

How would such chips be like? Most likely  Power still important  Highly parallel  IP reuse  Ease of design and verification The Era of Many-Module SoC High Certainty Totally unknown 5 Large number of modules NoC Interconnect Applications

Special purpose cores replace general purpose processors  Power considerations Future SoCs - Observation#1 General Purpose CPU Pre. Proc. DSP CPU Task 1 Task 2 MEM Task 3 Task 4 MEM Memory Task 1 Task 2 MEM Task 3 Task 4 MEM Memory Task 5 Task 5 GPU 6 Processing pipes are getting longer

Future SoCs - Observation#2 ? Large diversity All modules are unique Highly regular Classes of Replicated cores standard modules (DSP, HW accelerators, Cache banks, etc.) 7

Increased use of specialized cores  Pipes are getting longer Replication of processing elements How is the design flow affected?  This work – mapping of the NoC The Era of Many-Module SoC Observation#1 Observation#2 8

The Era of Many Module SoC Revisiting the Mapping Problem Cross-Entropy Optimization Evaluation Outline 9

Given  Traffic pattern(s) a set (or sets) of pair-wise bandwidth requirements and timing constraints  Routing  Topology Goal  Find efficient mapping of cores to tiles NoC Mapping PE4 PE1PE2 PE5 PE3 PE6 PE7PE8PE9PE4 PE1 PE2PE5 PE3 PE6PE7 PE8 PE9 PE4PE1 PE2 PE5 PE3PE6 PE7PE8 PE9 PE4 PE1 PE2 PE5 PE3 PE6 PE7 PE8 PE9 PE4 PE1PE2 PE5 PE3 PE6 PE7 PE8PE9 10

An important design step  Mapping affects power and performance! A difficult problem!  Often heuristic algorithms are used Common optimization goals  Minimize (dynamic) power  Minimize power + maximize performance  Minimize power subject to performance constraints Mapping Optimization 11

Typical modeling  Power and latency proportional to distance  Cost function: Modeling 12

Calculating Mapping Cost PE1PE2 PE4PE5 PE3 PE Mapping π 1 Mapping π 2 13

Motivation - Example #1 Optimal mapping ( π 1 ): PE1PE2PE3PE4 MEM 1 MEM 2 PE1MEM2 PE2PE3 PE4 MEM

Optimal mapping ( π 2 ): Let the mapping algorithm assign the flows! Motivation - Example #1 (cont.) PE1PE2PE3PE4 2*MEM PE1 MEM1 PE2 PE3PE4 MEM2PE1MEM1 PE2PE3 PE4 MEM2 Cost( π 1 )=9Cost( π 2 )=7 15

Motivation - Example #1 (cont.) PE1PE2PE3PE4 MEM 1 MEM PE1 MEM1 PE2 PE3PE4 MEM2 Cost( π 2 )=7 16 The mapping algorithm should be aware of replicated modules!

Pair-wise point-to-point requirements For example, in a 4-module system: Classic Performance Constraints PE1 2 PE2 1 PE3 11 PE4 PE3PE2PE1 17

PE1PE2PE3 PE4 Motivation - Example #2 Timing Requirement PEsStream ID 4PE1  PE2  PE3  PE4Stream 1 1PE2  PE4Stream 2 Stream 1 Stream 2 18

Example #2 – Pair-wise req. No feasible mapping! PE1PE2PE3 PE4 PE2. PE4. PE1 PE3PE2. PE4. PE1PE3 PE2. PE4 PE1PE3 2PE2 1PE3. 11PE4 PE3PE2PE1 19 Req=2Req=1

PE1PE2PE3 PE4 Application-Level Requirements RequirementPEsStream ID 4PE1  PE2  PE3  PE4Stream 1 1PE2  PE4Stream 2 PE2 PE4 PE1 PE3 Stream 1 Stream 2 A feasible mapping does exist! 20 Req=1Req=2 Req=1 It’s better to work with the application level requirements

Find efficient mappings by extending the formulation of the mapping problem  Adding degrees of freedom Degree of freedom #1  Leverage existence of replicated modules Degree of freedom #2  Replace p2p constraints with end-to-end, application-level requirements This Work 21

Modifying the Formulation (1) 22 Time Req. BWFlow 3100PE 1  DSP PE 2  DSP PE 2  SRAM PE 3  SRAM 2 ……… Time Req. BWFlow 3100PE 1  12200PE 2  15100PE 2  5100PE 3  ……… Leverage existence of replicated modules  Allow the mapping algorithm to allocate flows to the best replicated module

∞ ∞2 443 ∞313 ∞4773 ∞3∞24∞ 4∞723∞5 ∞∞2∞156∞ 773∞∞53∞1 ∞3∞3212∞31 Modifying the Formulation (2) 23 E2E Req. Stream’s PEsStream ID 23PE 1  PE 3  PE 9  PE 4  PE PE 5  PE 2  PE 3  PE 8  PE 7  PE 6  PE PE 5  PE 3  PE PE 7  PE 8  PE 2  PE 3 4 2PE 1  PE 2 5 ……… In this paper, for synthetic task graphs  Did so for a real application too P2P timing req. E2E timing req. Replace p2p constraints with end-to-end, application-level requirements

The Era of Many Module SoC Revisiting the Mapping Problem Cross-Entropy Optimization Evaluation Outline 24

Modern optimization heuristic  Good at combinatorial optimization problems Akin to evolutionary algorithms  Generation of new solutions is based on sampling and estimation Inherently a global search method  Reduced risk of getting trapped in a local minimum Cross Entropy Optimization 25

Given an initial parameter vector v=v 0, sample a random population of K solutions x 1,x 2,…,x k from the distribution given by f(x;v). Evaluate the costs S(xi),i=1,…,K. Using the ρK (0<ρ<1) elite (lowest cost) samples, obtain a new density function f(x;v) by calculating a new vector v via Maximum Likelihood (ML) estimation. Repeat steps 1-3 with the new vector v unless maximum number of iterations is reached or no improvement is obtained for a predefined number of iterations. Cross Entropy Optimization 1. Generate 10 random mappings: π 1, π 2, …, π Find 3 lowest cost mappings: π 2, π 5, π 7 3. Examine those 3 best mappings: A. For each tile, calculate the probability core PE i is mapped to that tile B. Update probabilities accordingly For example: 26

Prob(TileA  PE 1 )= Prob(TileA  PE 2 )= Prob(TileA  PE 3 )=Prob(TileA  PE 4 )=0.25 Prob(TileB  PE 1 )= Prob(TileB  PE 2 )= Prob(TileB  PE 3 )=Prob(TileB  PE 4 )=0.25 Prob(TileC  PE 1 )= Prob(TileC  PE 2 )= Prob(TileC  PE 3 )=Prob(TileC  PE 4 )=0.25 Prob(TileD  PE 1 )= Prob(TileD  PE 2 )= Prob(TileD  PE 3 )=Prob(TileD  PE 4 )=0.25 CE Example PE3 PE1PE2 PE4 Tile C Tile A Tile D Tile B π1π1 PE3 PE1 PE2 PE4 π2π2 PE3 PE1PE2 PE4 π3π3 PE3 PE1PE2 PE4 π4π4 PE3 PE1PE2 PE4 π5π5 PE3 PE1 PE2 PE4 π6π6 PE3PE1 PE2PE4 π7π7 PE3 PE1 PE2 PE4 π8π8 PE3 PE1 PE2 PE4 π9π9 PE3 PE1PE2 PE4 π 10 27

Prob(TileA  PE 1 )=1 Updating Probabilities PE1PE2 PE4PE3 PE1 PE2 PE4PE1PE2 PE4PE3 Prob(TileB  PE2)=2/3 Prob(TileB  PE4)=1/3 Prob(TileD  PE2)=1/3 Prob(TileD  PE3)=1/3 Prob(TileD  PE4)=1/3 Prob(TileC  PE3)=2/3 Prob(TileC  PE4)=1/3 π2π2 π5π5 π3π3 Following iteration uses these updates probabilities Gradually, probabilities converge to 0/1 Tile C Tile A Tile D Tile B 28

The Era of Many Module SoC Revisiting the Mapping Problem Cross-Entropy Optimization Evaluation Outline 29

Scenario  6x6 mesh NoC  Synthetic, randomized SoC Task graphs (and task-to-core mapping) Varying number of replicated modules Varying timing constraints (Real application in DATE10 paper) Compare with best cost of classic mapping  Averaging multiple runs Evaluation 30

“Class”: a group of identical PEs  Total number of replicated cores= {Number of classes}*{class size} Accounting for Replication 31 Cost Reduction [%] 1020 Total number of Replicated Modules [%]

SoCs with a pipeline data path and background P2P traffic  Varying pipeline slack  Different amounts of background constraints Application-Level Requirements 32 Cost Reduction [%] Pipeline Slack

We are going into the era of “Many module SoC” Extend the mapping to account for  Classes of replicated modules  Application-level requirements Meaningful power savings But mapping is an example  Routing? Task assignment? Link design? Topology selection? Conclusions and Future Work 33

Thank you! Questions? The Era of Many-Module SoC QNoC Research Group Group Research QNoC 34