Performance Analysis of Daisy-Chained CPUs Based on Modeling
Krzysztof Korcyl, Jagiellonian University, Krakow
Radoslaw Trebacz, Jagiellonian University, Krakow


Second FutureDaq workshop, GSI

The chain: Source (FIFO; Poisson, Uniform, or Fixed inter-packet times) → Node 1 → Node 2 → … → Node N-1 → Node N → Sink
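The three inter-packet-time distributions the source can use might be sketched as below. This is an illustrative sketch only; the function name, the `mode` encoding, and the uniform range `[0, 2·mean]` are assumptions, not taken from the actual model code.

```cpp
#include <random>

// Hypothetical sketch of the source's inter-packet-time generator.
// Poisson arrivals correspond to exponentially distributed gaps; the
// uniform variant here spans [0, 2*mean] so the average rate matches.
double nextGapUs(std::mt19937& rng, char mode, double meanUs) {
    switch (mode) {
        case 'P': {  // Poisson arrivals: exponential gaps
            std::exponential_distribution<double> d(1.0 / meanUs);
            return d(rng);
        }
        case 'U': {  // uniform gaps with the same mean
            std::uniform_real_distribution<double> d(0.0, 2.0 * meanUs);
            return d(rng);
        }
        default:     // fixed: constant gap
            return meanUs;
    }
}
```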

Models of the nodes: [diagram] each node consists of a Selector (GBit In/Out ports with FIFOs, Busy flag, processing count) connected over a 100 Gbit local link to a Compute Engine (fixed processing time, size reduction for processed data).

Operation
Data packets are produced using one of the available inter-packet time distributions (Poisson, Uniform, or Fixed) and stored in the Source FIFO. If the Source output line is free and there is a packet in the FIFO, the packet is sent immediately into the chain. Between nodes, packets are transferred at gigabit-per-second speed (no Ethernet framing, no check for packet loss or transmission errors). Packets arriving at a selector are stored in its input FIFO if there is space; otherwise they are dropped. If the selector's local transfer medium is free and there is a packet in the input FIFO, the packet is tested to determine whether it carries raw or processed data. A packet with raw data is sent to the local computing resource via the local transfer medium if the resource has credit to absorb it (every packet absorbed by the computing resource decrements its credit). If the computing resource has run out of credits, the packet is sent to the output FIFO. Packets with processed data arriving at a selector are sent to the output FIFO if the selector's local transfer medium is free.
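The selector's routing rule above can be sketched as a small decision function. The type and function names (`Packet`, `Route`, `routePacket`) are illustrative, not taken from the model code:

```cpp
// Minimal sketch of the selector's routing rule: processed data is
// forwarded down the chain; raw data goes to the local CPU only while
// the compute engine still has credit.
enum class Route { ToProcessor, ToOutputFifo };

struct Packet {
    bool processed;  // true for processed data, false for raw data
};

// Decide where the selector forwards the next packet from its input FIFO.
// 'credits' counts how many more packets the compute engine can absorb.
Route routePacket(const Packet& p, int credits) {
    if (p.processed) return Route::ToOutputFifo;  // processed data bypasses the CPU
    if (credits > 0) return Route::ToProcessor;   // raw data goes to the local CPU
    return Route::ToOutputFifo;                   // no credit: pass raw data down the chain
}
```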

Operation - cont
A packet with raw data arriving at the computer output FIFO decrements the resource's credit count and is sent off to the resource at 100 Gbit speed. The computing resource starts processing the data when the transfer over the link finishes. After the processing time, the raw data is converted into processed data and its size is reduced. Processed data is returned to the selector at 100 Gbit speed and stored in the selector's computer input FIFO. If the selector's local transfer medium is free, the processed data packet is sent to the selector's output FIFO. Transfer of processed data has higher priority than sending raw data; however, a currently running transfer is not interrupted. When processed data arrives at the selector's output FIFO, the resource's credit count is incremented. If the output line is free, the packet from the selector's output FIFO is sent to the line immediately.
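The credit round trip and size reduction described above can be sketched as follows. The struct and member names are illustrative assumptions; only the one-credit bookkeeping and the halving of the packet size follow the slides:

```cpp
// Illustrative sketch of the credit round trip: a credit is consumed when
// a raw packet leaves for the compute engine and returned when the
// processed result reaches the selector's output FIFO.
struct ComputeEngine {
    int credits = 1;  // computer input/output FIFOs hold one packet each

    // Raw packet leaves the selector for the engine: one credit consumed.
    // Returns the size of the processed result (half the raw size).
    int acceptRaw(int rawSizeBytes) {
        --credits;
        return rawSizeBytes / 2;
    }

    // Processed packet reached the selector's output FIFO: credit returns.
    void onProcessedAtOutputFifo() { ++credits; }
};
```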

Parameters
Raw data size: 1500 bytes; processed data size: raw size × 0.5
Processing time: 10, 12, 24, 36, 48, 60, 72, 84, 96, 108 µs
Selector's FIFO sizes: 10 packets on chain input and output, 1 packet for computer input and output
Delay time: 12, 24, 32 µs
Ethernet transfer speed: 1 ns/bit, so 1500 B = 12 µs
[Table: minimal number of processors vs. processing time (columns: 10 to 108 µs) and delay time (rows: 12, 24, 32 µs)]
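A back-of-envelope lower bound on the chain length can be derived from these parameters (this estimate is an assumption of this note, not a formula from the slides): at 1 ns/bit a 1500 B packet occupies the link for 12 µs, so with a fixed inter-packet time of 12 µs and a per-packet processing time of T µs, at least ⌈T / 12⌉ CPUs are needed to keep up. This ignores relay traffic through the selectors and the chain-position effects discussed in the observations.

```cpp
#include <cmath>

// Hypothetical lower bound on the number of processors needed:
// each CPU is busy for processingUs per packet, and packets arrive
// every interArrivalUs, so ceil(processingUs / interArrivalUs) CPUs
// are required just to match the arrival rate.
int minProcessors(double processingUs, double interArrivalUs) {
    return static_cast<int>(std::ceil(processingUs / interArrivalUs));
}
```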

Minimal chain length to process all

Average CPU usage

Minimal chain length to process all

Non-processed data vs chain length

CPU utilization

Observations
The link between the source and the first selector derandomizes packet arrivals. Lack of proper derandomization requires a longer chain (with poorer CPU utilization) to absorb bursts; a smaller message size with fixed processing time allows bursts to form.

Observations
The CPUs located far from the source receive mostly processed packets instead of raw data; instead of processing, they relay processed data from the input port to the output port. The closer a node is to the sink, the poorer the utilization of its CPU resource.

Possible modifications (for evaluation with modeling)
Add more sources and more sinks along the chain
Re-inject non-processed data into the chain
Use the buffering on the selector nodes more efficiently: keep data on board and delay the decision on forwarding

About the model
Runs with Ptolemy Classic
All nodes (Source, Selector, Processor and Sink) coded within 200 lines of C++
100k events simulated in 3 minutes on a 1.5 GHz Pentium 4