CoNA : Dynamic Application Mapping for Congestion Reduction in Many-Core Systems 2012 IEEE 30th International Conference on Computer Design (ICCD) M. Fattah,

Presentation transcript:

CoNA: Dynamic Application Mapping for Congestion Reduction in Many-Core Systems
2012 IEEE 30th International Conference on Computer Design (ICCD)
M. Fattah, M. Ramirez, M. Daneshtalab, P. Liljeberg, J. Plosila

Outline
• Introduction
• Mapping Problem and Evaluation Metrics
• Contiguous Neighborhood Allocation Mapping
• Experimental Setup
• Results and Analysis
• Conclusion

Introduction
• An efficient algorithm for the run-time application mapping problem
• Three novel contributions
  • First node selection
  • First task selection
  • Mapping the rest of the tasks onto the nearest neighborhood

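Mapping Problem and Evaluation Metrics
• Applications: $A_p = TG(T, E)$, with $t_i \in T$ and $e_{i,j} \in E$
• Communication platform: $AG(\tilde{N}, L)$, with $\tilde{n}_{i,j} = \{(r_{i,j}, pe_{i,j}) \mid \tilde{n}_{i,j} \in \tilde{N},\ 0 \le i < M,\ 0 \le j < N\}$
• Manhattan distance: $MD(\tilde{n}_{i,j}, \tilde{n}_{m,n}) = |i - m| + |j - n|$
• Mapping function: $map: T \to \tilde{N}$, s.t. $map(t_i) = \tilde{n}_{m,n}$; $\forall t_i \in T,\ \exists\, \tilde{n}_{m,n} \in \tilde{N}$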
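The definitions above can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `manhattan_distance`, and `is_valid_mapping` are hypothetical, and a node is assumed to be represented by its (i, j) coordinates in an M x N mesh.

```python
from typing import Dict, Iterable, Tuple

Node = Tuple[int, int]  # assumed representation: (i, j) mesh coordinates


def manhattan_distance(a: Node, b: Node) -> int:
    """MD(n_ij, n_mn) = |i - m| + |j - n|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_valid_mapping(mapping: Dict[str, Node], tasks: Iterable[str],
                     rows: int, cols: int) -> bool:
    """Check that every task is mapped to some node inside the rows x cols mesh."""
    for task in tasks:
        if task not in mapping:
            return False
        i, j = mapping[task]
        if not (0 <= i < rows and 0 <= j < cols):
            return False
    return True
```

The check only mirrors the stated condition that every task has a node; whether two tasks may share a node is not stated on the slide, so it is not enforced here.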

Evaluation Metrics
• Packet latency
• Average Manhattan Distance
• Average Weighted Manhattan Distance
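The transcript does not preserve the formulas for the two distance metrics, so the sketch below rests on assumed definitions: the average Manhattan distance is taken as the mean distance over all edges of a mapped application, and the weighted variant weights each edge's distance by its communication volume. The function names and the `(src, dst, weight)` edge format are hypothetical.

```python
def _md(a, b):
    # Manhattan distance between two (i, j) mesh coordinates
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_manhattan_distance(edges, mapping):
    """Assumed definition: mean MD over all task-graph edges."""
    total = sum(_md(mapping[s], mapping[d]) for s, d, _w in edges)
    return total / len(edges)


def average_weighted_manhattan_distance(edges, mapping):
    """Assumed definition: MD of each edge weighted by its volume,
    normalized by the total volume."""
    weighted = sum(w * _md(mapping[s], mapping[d]) for s, d, w in edges)
    return weighted / sum(w for _s, _d, w in edges)


edges = [("a", "b", 4), ("a", "c", 12)]
mapping = {"a": (0, 0), "b": (0, 1), "c": (2, 0)}
print(average_manhattan_distance(edges, mapping))           # 1.5
print(average_weighted_manhattan_distance(edges, mapping))  # 1.75
```

In the example the heavier edge spans two hops, so the weighted average (1.75) is higher than the unweighted one (1.5).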

Evaluation Metrics (cont.)
• Mapped Region Dispersion
• Internal Congestion Ratio (ICR)
  • The number of an application's edges that use the same channel, relative to its total number of edges

Contiguous Neighborhood Allocation Mapping (CoNA)
• Three steps
  • First node selection
  • Choosing the first task of the application
  • Contiguous neighborhood allocation

CoNA (cont.)
• First node selection
  • The nearest node to the central manager among the nodes with the largest number of available neighbors
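A minimal sketch of this selection rule, under stated assumptions: neighbors are the four mesh-adjacent nodes, the chosen node must itself be free, and ties in distance are broken by coordinate order. None of these details are given on the slide, and the function names are hypothetical.

```python
def free_neighbor_count(node, free):
    """Number of free nodes directly north, south, east, or west of `node`."""
    i, j = node
    return sum((i + di, j + dj) in free
               for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))


def select_first_node(free, manager):
    """Among free nodes with the most free neighbors, pick the one
    nearest (Manhattan distance) to the central manager."""
    if not free:
        return None
    best = max(free_neighbor_count(n, free) for n in free)
    candidates = [n for n in free if free_neighbor_count(n, free) == best]
    return min(candidates,
               key=lambda n: (abs(n[0] - manager[0]) + abs(n[1] - manager[1]), n))


# 3 x 3 mesh, all nodes free, central manager assumed at (0, 0)
free = {(i, j) for i in range(3) for j in range(3)}
print(select_first_node(free, (0, 0)))  # (1, 1)
```

In the fully free 3 x 3 mesh only the center node has four free neighbors, so it is selected regardless of where the manager sits.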

CoNA (cont.)
• Choosing the first task of the application
  • Selects the task with the largest number of edges
  • The most intensive communication
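The first-task rule can be sketched as a degree count over the task graph. The edge list below is an assumed six-task graph chosen for illustration, and breaking ties by the lowest task index is an assumption; `select_first_task` is a hypothetical name.

```python
from collections import Counter


def select_first_task(edges):
    """Return the task with the largest number of incident edges.
    Ties are broken by the lowest task index (an assumption)."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return min(degree, key=lambda t: (-degree[t], t))


# Assumed task graph: t4 talks to t1, t2, t5; t1 to t0; t2 to t3
edges = [(4, 1), (4, 2), (4, 5), (1, 0), (2, 3)]
print(select_first_task(edges))  # 4
```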

CoNA (cont.)
• Contiguous neighborhood allocation
  • The task graph is traversed in breadth-first order; the resulting list of tasks paired with their predecessors is: {(t_1, t_4), (t_2, t_4), (t_5, t_4), (t_0, t_1), (t_3, t_2)}
  • For each task, select the node that fits in the smallest square with the first node
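The breadth-first pairing can be reproduced with a short sketch. The adjacency list is an assumption reconstructed from the pair list on the slide (the original figure is not in the transcript), with t4 as the first task; visiting neighbors in index order is also an assumption.

```python
from collections import deque


def bfs_pairs(adjacency, first_task):
    """Traverse the task graph breadth-first from `first_task` and return
    (task, predecessor) pairs in the order the tasks are discovered."""
    visited = {first_task}
    queue = deque([first_task])
    pairs = []
    while queue:
        parent = queue.popleft()
        for child in sorted(adjacency[parent]):
            if child not in visited:
                visited.add(child)
                pairs.append((child, parent))
                queue.append(child)
    return pairs


# Assumed undirected task graph consistent with the slide's example
adjacency = {0: [1], 1: [0, 4], 2: [3, 4], 3: [2], 4: [1, 2, 5], 5: [4]}
print(bfs_pairs(adjacency, 4))
# [(1, 4), (2, 4), (5, 4), (0, 1), (3, 2)]
```

Each pair tells the mapper which already-placed task a new task should be placed next to, which is what keeps the allocated region contiguous.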

Experimental Setup
• NoC platform
  • Plasma processor
  • Local memory
  • DMA controller
  • Tra-NI interface
• Central manager (CM)
• The maximum number of applications that can be injected into the system per second is denoted as λ_full

Experimental Setup (cont.)
• Simulation
  • To extract packet latency
• FPGA
  • To investigate CoNA time complexity
  • Xilinx ML605

Experimental Setup (cont.)
• Application set
  • Task graphs are randomly generated (set1) using the Task graph generator
    • Number of nodes: 4 – 11
    • Weight of edges: 4 – 16 flits
  • The edge weights of the applications are uniformly multiplied by 16 (set16)
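The relation between set1 and set16 can be illustrated with a small sketch. This is not the task graph generator used in the experiments: `random_task_graph` is a hypothetical stand-in that builds a random tree with the stated size and weight ranges, and `scale_weights` shows the uniform multiplication by 16.

```python
import random


def random_task_graph(rng):
    """Hypothetical generator: 4 to 11 tasks, each later task linked to a
    random earlier one, edge weights of 4 to 16 flits."""
    n_tasks = rng.randint(4, 11)
    edges = []
    for task in range(1, n_tasks):
        parent = rng.randrange(task)
        edges.append((parent, task, rng.randint(4, 16)))
    return n_tasks, edges


def scale_weights(edges, factor=16):
    """Multiply every edge weight by the same factor (set1 -> set16)."""
    return [(src, dst, weight * factor) for src, dst, weight in edges]


rng = random.Random(0)  # seeded only to make the sketch repeatable
n_tasks, set1_edges = random_task_graph(rng)
set16_edges = scale_weights(set1_edges)
```

Scaling every weight by the same factor leaves the graph structure unchanged and only raises the communication volume.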

Results and Analysis
• Packet latency evaluation
• Time complexity evaluation

Conclusion
• An efficient run-time task allocation algorithm is proposed
• It reduces internal and external congestion
• Three novel contributions

Thank you!