Measuring Network Performance of Multi-Core Multi-Cluster (MCMCA) Norhazlina Hamid Supervisor: R J Walters and G B Wills PUBLIC.

Measuring Network Performance of Multi-Core Multi-Cluster (MCMCA)
Norhazlina Hamid
Supervisors: R J Walters and G B Wills
PUBLIC SERVICE DEPARTMENT OF MALAYSIA

Outline
• Introduction and Definition
• Motivation
• Related Work
• Research Objectives
• The Architecture
• Methodology
• Conclusion and Progress Work

Introduction
• The emergence of high-performance computing (HPC), which includes cluster computing, has improved the availability of powerful computers and high-speed network technologies.
• Implementing a multi-core cluster as a platform for a high-performance network supports high availability and enables the network to scale.

Definition
• Multi-Core Multi-Cluster Architecture (MCMCA) is a collection of multiple multi-core clusters interconnected by a network.
• MCMCA is built from:
• Cluster computing: a collection of computers, interconnected by a network, that work together to form a single computer.

Definition
• A multi-core cluster is a cluster in which some or all of the nodes have multi-core processors.
• A multi-cluster architecture is a system of multiple clusters connected via cluster interconnection networks.
[Figure labels: Cluster computing, Multi-core processor, Multi-cluster, MCMCA]

Motivation
• Existing models do not address the potential performance of the interconnection networks within large-scale multi-core clusters.
• High communication latency in the interconnection network can dramatically reduce the efficiency of the cluster system.
• Scaling up by adding more processors to each cluster increases the processing power.

Related Works
• Furhad et al. (2013) proposed the EnMesh topology to address the communication delays that occur with long-distance communication between remote nodes of a NoC.
• Mohsen et al. (2013) proposed a new mapping technique, based on queuing network modelling of a limited-size cluster, to assign parallel processes to the processing cores of a multi-core cluster.

Objectives
• To propose a new architecture for multi-core multi-cluster systems.
• To investigate the performance of the interconnection networks of the multi-core multi-cluster architecture.
• To demonstrate the feasibility of the proposed architecture through computer simulation experiments and measurements.

The Architecture of MCMCA

Communication Network
• There are five communication networks in MCMCA:
1. IntrA-chip Communication network (AC)
2. IntEr-chip Communication network (EC)
3. IntrA-Cluster Network (ACN)
4. IntEr-Cluster Network (ECN)
5. Multi-Cluster Network (MCN)

IntrA-chip Communication network (AC)
Message passing between two processor cores on the same chip.

IntEr-chip Communication network (EC)
Message passing between processors on different chips within the same node.

IntrA-Cluster Network (ACN)
Message passing between processors on different nodes within the same cluster.

Communication Network
Message passing between clusters.
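
The network definitions on the preceding slides can be read as a decision rule on where the source and destination cores sit. The sketch below is illustrative only: the `CoreAddress` type and `network_for` function are hypothetical names, not part of the presentation, and because the slides do not say how ECN and MCN divide inter-cluster traffic, the sketch reports both together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreAddress:
    """Hypothetical location of a core: cluster, node, chip and core index."""
    cluster: int
    node: int
    chip: int
    core: int


def network_for(src: CoreAddress, dst: CoreAddress) -> str:
    """Return the communication network a message from src to dst would use."""
    if src.cluster != dst.cluster:
        # Message passing between clusters; the slides do not separate
        # the ECN and MCN roles, so both are reported (assumption).
        return "ECN/MCN"
    if src.node != dst.node:
        # Different nodes within the same cluster.
        return "ACN"
    if src.chip != dst.chip:
        # Different chips within the same node.
        return "EC"
    # Same chip: intra-chip communication between cores.
    return "AC"


a = CoreAddress(cluster=0, node=0, chip=0, core=0)
print(network_for(a, CoreAddress(0, 0, 0, 1)))  # same chip
print(network_for(a, CoreAddress(0, 0, 1, 0)))  # other chip, same node
print(network_for(a, CoreAddress(0, 1, 0, 0)))  # other node, same cluster
print(network_for(a, CoreAddress(1, 0, 0, 0)))  # other cluster
```

The checks run from the outermost level (cluster) inwards, so the first level at which the two addresses differ decides the network.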

Methodology
• Computer simulation experiments using the OMNeT++ network simulation tool.
• Network latency is the metric used to measure performance.
• Early-stage experiments are based on a single-core multi-cluster architecture.
• The experiments were performed using the configuration and parameters given in a previous paper, in order to create baseline results.

Experiment Results
• The experiments show that the average communication latency increases as the traffic rate increases. This is consistent with the assumption that messages are delayed by having to wait for resources before entering the network.
• The results confirm that the simulation model is a good basis for measuring communication latency in a large-scale cluster, and that it can be extended to the multi-core multi-cluster architecture.
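
The trend described above, latency rising with traffic rate because messages wait for resources, can be illustrated with a textbook M/M/1 queue. This is an assumption made purely for illustration: the slides do not state which queueing model the simulation uses, and the function name and rates below are hypothetical.

```python
def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time a message waits in an M/M/1 queue before service starts.

    Uses the standard result W_q = rho / (mu - lambda), with
    rho = lambda / mu. The M/M/1 model is an illustrative assumption,
    not the model used in the presentation.
    """
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for a stable queue")
    utilisation = arrival_rate / service_rate
    return utilisation / (service_rate - arrival_rate)


# Hypothetical rates: the wait grows sharply as traffic approaches capacity.
for rate in (0.1, 0.5, 0.9):
    print(rate, mm1_mean_wait(rate, service_rate=1.0))
```

Under this assumption the waiting time grows without bound as the arrival rate approaches the service rate, which matches the qualitative shape reported for the experiments.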

Methodology
• Network latency ($\sum NL$) is obtained from:
  - Total network latency of message passing within a cluster ($L_{\text{within}}$)
  - Total network latency of message passing between clusters ($L_{\text{between}}$)
  - Probability that a message leaves its cluster ($P$)
  - Probability that a message stays within its cluster ($1 - P$)
• Analytical formula:
  $$\sum NL = L_{\text{within}} \, (1 - P) + L_{\text{between}} \, P$$

Methodology
• Both network latencies ($L_{\text{within}}$ and $L_{\text{between}}$) are obtained from:
  - Waiting time at the source node ($W$)
  - Delay while traversing the network ($D$)
  - Time for each packet to reach its destination node ($E$)
  - Waiting time at the transfer switch between clusters ($W_{ts}$)
• Analytical formulas:
  $$L_{\text{within}} = W_{\text{within}} + D_{\text{within}} + E_{\text{within}}$$
  $$L_{\text{between}} = W_{\text{between}} + D_{\text{between}} + E_{\text{between}} + 2 W_{ts}$$
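
A minimal sketch of the two analytical formulas above: the within-cluster and between-cluster latencies are each the sum of source waiting time, network delay and delivery time, the between-cluster case adds the transfer-switch wait twice, and the total weights the two by the probability that a message leaves its cluster. The function names and input values are hypothetical; the slides give no numerical parameters.

```python
def within_cluster_latency(w: float, d: float, e: float) -> float:
    """Latency of a message that stays in its cluster: W + D + E."""
    return w + d + e


def between_cluster_latency(w: float, d: float, e: float, w_ts: float) -> float:
    """Latency of a message that crosses clusters: W + D + E + 2 * Wts."""
    return w + d + e + 2 * w_ts


def total_network_latency(l_within: float, l_between: float, p: float) -> float:
    """Weighted latency, where p is the probability a message leaves its cluster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return l_within * (1 - p) + l_between * p


# Hypothetical inputs in arbitrary time units.
l_in = within_cluster_latency(w=1.0, d=2.0, e=3.0)
l_out = between_cluster_latency(w=1.0, d=2.0, e=3.0, w_ts=0.5)
print(total_network_latency(l_in, l_out, p=0.25))
```

With `p = 0` the result reduces to the within-cluster latency and with `p = 1` to the between-cluster latency, which is a quick sanity check on any parameter set.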

Conclusion
• A new architecture for measuring the performance of interconnection networks in MCMCA has been presented.
• Work in progress:
1. Developing an MCMCA simulation model.
2. Conducting simulation experiments with different network technologies and bandwidths.
3. Developing analytical equations for the architecture.

Published Papers
1. Hamid, Norhazlina, Walters, Robert John and Wills, Gary Brian (2014) Performance evaluation of multi-core multi-cluster architecture. In, Emerging Software as a Service and Analytics, Barcelona, ES, Apr. Scitepress, 9pp.
2. Hamid, Norhazlina, Walters, Robert John and Wills, Gary Brian, "An Architecture for Measuring Network Performance in Multi-Core Multi-Cluster Architecture (MCMCA)." In, Euro-Asia Conference on Computational Intelligence and Communication Networks, Antalya, Turkey, Apr 2014.
3. Hamid, Norhazlina, Walters, Robert John and Wills, Gary Brian, "An Architecture for Measuring Network Performance in Multi-Core Multi-Cluster Architecture (MCMCA)," International Journal of Computer Theory and Engineering vol. 7, no. 1, pp , February

Q & A
Thank you.

[Figure: Queuing in the network]

[Figure: Illustration of message passing in simulation model]

Results & Discussion
• Experiments on the single-core multi-cluster simulation model were performed with 8 single-core clusters.
• Two cases were examined, as follows:

CASE 1                          | CASE 2
Message Length (M) = 32 flits   | Message Length (M) = 64 flits
Flit Length (F) = 256 and 512 bytes
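
As a small worked example, the message size in bytes for each configuration is the message length in flits multiplied by the flit length in bytes. The sketch assumes that both flit lengths (256 and 512 bytes) were used in both cases, which is how the slide's table reads but is not stated outright; the function and variable names are hypothetical.

```python
def message_size_bytes(message_flits: int, flit_bytes: int) -> int:
    """Size of one message in bytes: M flits times F bytes per flit."""
    return message_flits * flit_bytes


# Message lengths (M) per case, taken from the slide.
cases = {"CASE 1": 32, "CASE 2": 64}
# Flit lengths (F) in bytes; assumed to apply to both cases.
flit_lengths = (256, 512)

sizes = {
    (case, flit): message_size_bytes(flits, flit)
    for case, flits in cases.items()
    for flit in flit_lengths
}

for (case, flit), size in sizes.items():
    print(f"{case}, F = {flit} bytes: {size} bytes per message")
```

Under this assumption, CASE 1 with 512-byte flits and CASE 2 with 256-byte flits carry the same number of bytes per message, so any latency difference between those two configurations would come from the flit count rather than the payload size.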

Results & Discussion
• The results obtained from the single-core multi-cluster architecture closely match those of the published model.

Related Works
• Wu et al. (2012) presented a new analytical model for predicting communication network performance in multi-cluster systems in the presence of spatio-temporal bursty traffic.
• Wang et al. (2012) extended a multithreaded PDES simulator to support multi-core clusters and investigated the impact of communication latency in a multi-core cluster environment.