Architectural Software Support for Processing Clusters
Johannes Gutleber, Luciano Orsini
European Organization for Nuclear Research, Div. EP/CMD
The CMS Collaboration, CERN, 1211 Geneva 23, Switzerland

2 The Issue
1988: "The biggest problem with creating distributed computing systems is devising a method of intercomputer communication that is reliable, fast and simple."
 – J.E. Tomayko, NASA CR, p. 228, Mar 1988
2000: "High-speed networks […] can obtain communication speeds close to those of supercomputers, but realizing this potential is a challenging problem."
 – H. Bal, ACM Op Sys Rev, p. 79, Oct 2000

3 The Approach
Do not…
–invest in alternative communication paradigms
–optimise communication libraries
Provide architectural software support:
–lightweight framework for homogeneous communication
–configure with low-level communication libraries
–plug-in application components
–homogeneous subsystem interface design support

4 Architectural Software Support
Architecture support comprises
–a processing model
–subsystem addressing
–configuration and control
–Application Programmer Interface requirements
Everything that is needed to build and operate a distributed application.

5 Motivation
In large-scale data acquisition systems we have to cope with
–long operational lifetimes (10-15 yrs)
–modifications due to generation jumps (networking, processing)
–deployment of one application in various different environments
–bridging of hardware/software performance gaps
From this special case we can extrapolate to general cluster-based systems:
–search engines, document retrieval systems
–plant control systems
–medical imaging networks in hospitals
Available tools don't match the requirements.

6 Architecture Basis: I2O
A specification for a hardware and operating system independent device driver framework.
Targeted at collaboration between…
–host and intelligent devices (through the Messaging Layer)
–intelligent devices (device-to-device intercommunication)
[Diagram labels: UNIX OSM, Windows OSM, Messaging Layer, HDM/FPGA, HDM/IOP, PCI bus]

7 I2O IOP Environment
–Inbound/outbound queue (frame pointers are passed, zero-copy)
–Homogeneous frame format
–Event-driven processing
–Uniform hardware access API
[Diagram: network HDM and framework on the IOP, with inbound and outbound queues, IRQ handling, and handler functions bar() and foo()]
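To make the frame-pointer queue idea concrete, here is a minimal C++ sketch of a queue that hands over pointers to frames instead of copying payloads; the class name and interface are invented for illustration and are not part of the I2O or XDAQ software.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>

// Illustrative only: an inbound or outbound queue that passes frame
// pointers ("Zcopy"), so the frame payload itself is never copied.
class FrameQueue {
public:
    // Producer side: post a pointer to a ready frame.
    void post(uint8_t* frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        frames_.push_back(frame);
    }

    // Consumer side: take ownership of the next frame pointer,
    // or nullptr if the queue is currently empty.
    uint8_t* fetch() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) return nullptr;
        uint8_t* frame = frames_.front();
        frames_.pop_front();
        return frame;
    }

private:
    std::deque<uint8_t*> frames_;
    std::mutex mutex_;
};
```

An IOP in this picture would own one inbound and one outbound instance; event-driven processing then amounts to fetching from the inbound queue and invoking the handler registered for the frame's function code.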

8 I2O Message Frame
Used to implement an active message model.
Standard frame: MessageSize, MessageFlags, VersionOffset, TargetAddress, InitiatorAddress, Function (= FFh for private frames), InitiatorContext, TransactionContext.
Private frame extension: XFunctionCode, OrganizationID, PrivatePayload (= function parameters).
–InitiatorContext is assigned by the application and returned in the reply (cookie).
–InitiatorAddress is assigned by the message layer and used for routing back the reply.
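As a rough illustration of the fields listed above, the standard header and the private extension could be mirrored in C++ as below. Field widths and packing are assumptions made for readability (the real frame bit-packs several of these fields), so this is a logical view rather than a wire-exact layout.

```cpp
#include <cstdint>

// Logical view of the standard I2O message frame header described on this
// slide. Field widths are assumptions for illustration; the real frame
// packs addresses and flags more tightly.
struct StandardFrameHeader {
    uint8_t  versionOffset;      // VersionOffset
    uint8_t  messageFlags;       // MessageFlags
    uint16_t messageSize;        // MessageSize
    uint16_t targetAddress;      // TargetAddress (Tid of the destination DDM/ISM)
    uint16_t initiatorAddress;   // InitiatorAddress (Tid of the sender)
    uint8_t  function;           // Function; FFh selects the private extension
    uint32_t initiatorContext;   // cookie set by the application, echoed in the reply
    uint32_t transactionContext; // additional context carried with the transaction
};

// Private frame extension holding application-defined function parameters.
struct PrivateFrame {
    StandardFrameHeader header;  // header.function == 0xFF
    uint16_t xFunctionCode;      // XFunctionCode (application-specific)
    uint16_t organizationID;     // OrganizationID of the defining organization
    // PrivatePayload (= function parameters) follows here
};
```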

9 I2O Messaging
A message frame contains two addresses:
–initiatorTid = where the message comes from
–destinationTid = to which DDM/ISM it shall go
A message is associated with a handler function:
–predefined functions for I2O messages
–private frame extension for application-specific messages
Message length is limited to 256 KB. The frame should only contain control information:
–message data should go into Scatter-Gather Lists (SGLs)
I2O frame byte order is little endian.
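Because the frame byte order is little endian while the cluster mixes little-endian Intel hosts with big-endian PowerPC and SPARC nodes (see slide 14), header words are best written and read through explicit byte-wise accessors. The helpers below are a small hedged sketch, not part of any I2O or XDAQ API.

```cpp
#include <cstdint>

// Write a 32-bit value into a frame buffer in little-endian order,
// independent of the host CPU's native byte order.
inline void putLE32(uint8_t* buf, uint32_t v) {
    buf[0] = static_cast<uint8_t>(v);
    buf[1] = static_cast<uint8_t>(v >> 8);
    buf[2] = static_cast<uint8_t>(v >> 16);
    buf[3] = static_cast<uint8_t>(v >> 24);
}

// Read a little-endian 32-bit value back from a frame buffer.
inline uint32_t getLE32(const uint8_t* buf) {
    return  static_cast<uint32_t>(buf[0])
         | (static_cast<uint32_t>(buf[1]) << 8)
         | (static_cast<uint32_t>(buf[2]) << 16)
         | (static_cast<uint32_t>(buf[3]) << 24);
}
```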

10 Peer and Peer-to-Peer Operations
–Peer operation uses the queue pair on one PCI segment
–Peer-to-peer commands are used for network communication
[Diagram labels: Executive, Messaging Layer, Peer Transport Agent, Peer Transport DDM, Device Driver Module, non-I2O messages, I2O message frames]

11 I2O Peer Operation for Clusters
Homogeneous communication:
–frameSend for local, remote and host destinations
–single addressing scheme (Tid)
[Diagram: application components on a processing node (IOP/device) and a controller node (host), each with application framework, executive, messaging layer, peer transport agent and peer transport; numbered stages show I2O message frames flowing between them]

12 Applications are I2O classes; in XDAQ they are equivalent to C++ classes.
Each class exposes an interface that is implemented by the application.
[Diagram: TargetAddr = ClassId + Instance; the Dispatcher delivers frames via Listener, DDmAdapter, UtilAdapter and UserAdapter to the Application]
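The following sketch only illustrates the idea that an application is a class whose interface the dispatcher invokes, and that peers are addressed purely by Tid; every name here (Application, onFrame, frameSend) is hypothetical and not the actual XDAQ class hierarchy or API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical framework-side interface: the dispatcher calls onFrame()
// whenever a frame addressed to this class/instance (its Tid) arrives.
class Application {
public:
    virtual ~Application() = default;
    virtual void onFrame(const uint8_t* frame, std::size_t size) = 0;
};

// Hypothetical send primitive: one call for local, remote and host
// destinations, addressed only by Tid.
void frameSend(uint16_t destinationTid, const uint8_t* frame, std::size_t size);

// Example application component: a readout unit forwarding data frames
// to an event builder identified by its Tid.
class ReadoutUnit : public Application {
public:
    explicit ReadoutUnit(uint16_t builderTid) : builderTid_(builderTid) {}

    void onFrame(const uint8_t* frame, std::size_t size) override {
        // ... interpret the private frame extension, assemble a reply ...
        frameSend(builderTid_, frame, size);  // location-transparent send
    }

private:
    uint16_t builderTid_;
};
```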

13 Peer Transport Configurations
Polling Peer Transport Agent:
 + low OS service overhead
 - executive uses the CPU continuously
 - no blocking PTs
Thread per Peer Transport:
 - higher OS service overhead
 + no CPU monopolisation
 + allows integration with other software
[Diagram: in each configuration a PTA multiplexes the TCP, Myrinet, DLPI and FIFO peer transports]
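A condensed sketch of the two configurations, assuming a hypothetical PeerTransport interface (none of these names are the real peer transport API): either one agent polls all transports in a tight loop, or each transport gets its own service thread.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical peer transport: poll() probes for an incoming frame and
// dispatches it; it returns immediately when nothing is pending.
struct PeerTransport {
    virtual ~PeerTransport() = default;
    virtual bool poll() = 0;
};

// Polling PTA: one loop cycles over the TCP, Myrinet, DLPI and FIFO transports.
// Low OS service overhead, but the executive keeps a CPU busy.
void pollingAgent(std::vector<PeerTransport*>& transports, std::atomic<bool>& run) {
    while (run.load())
        for (auto* pt : transports)
            pt->poll();
}

// Thread per peer transport: higher OS service overhead, but no CPU
// monopolisation and easier coexistence with other software on the node.
std::vector<std::thread> startThreadedAgents(std::vector<PeerTransport*>& transports,
                                             std::atomic<bool>& run) {
    std::vector<std::thread> workers;
    for (auto* pt : transports)
        workers.emplace_back([pt, &run] { while (run.load()) pt->poll(); });
    return workers;
}
```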

14 I2O for Cluster Configuration
Executive tasks run on heterogeneous platforms:
–RUIO (IOP480): VxWorks
–PPC (MVME2306): VxWorks, Linux
–Workstation: Intel Linux, SPARC Solaris

15 Boot
Executives on each node in the cluster wait for I2O configuration messages.
Configuration and control can be done through
–native I2O messages
–an XML/HTTP mapping
Parameter set/get is also done through I2O/XML.

16 I2O Configuration Commands
ExecSysTabSet: where (e.g. IOP 34) and how (e.g. TCP, DLPI, Myrinet)
ExecDeviceAssign: who (e.g. RU1 – Tid 10, RU2 – Tid 20)
[Diagram: CMS DAQ context – Detector Frontend, Readout Systems, Builder Networks, Event Manager, Filter Systems, Computing Services, Level 1 Trigger, Run Control]

17 Ready
ExecSwDownload: what (e.g. libRU.so, libEVM.so)
[Diagram: local and remote application instances (App1, App2, App3) present on their nodes]
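To tie slides 15-17 together, here is a hedged sketch of the controller-side configuration sequence; only the command names (ExecSysTabSet, ExecDeviceAssign, ExecSwDownload) and the example values come from the slides, while the helper function and its signature are invented for illustration.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: wrap a configuration command into an I2O frame (or its
// XML/HTTP mapping) and send it to the executive on the given IOP.
void sendConfigCommand(int iop, const std::string& command,
                       const std::vector<std::string>& params) {
    std::cout << "IOP " << iop << ": " << command;
    for (const auto& p : params) std::cout << " " << p;
    std::cout << "\n";
}

int main() {
    const int iop = 34;  // "where" from the slide's example

    // How: make the transports (e.g. TCP, DLPI, Myrinet) known via the system table.
    sendConfigCommand(iop, "ExecSysTabSet", {"TCP", "DLPI", "Myrinet"});

    // Who: assign application identities to Tids.
    sendConfigCommand(iop, "ExecDeviceAssign", {"RU1=Tid10", "RU2=Tid20"});

    // What: download the application component libraries.
    sendConfigCommand(iop, "ExecSwDownload", {"libRU.so", "libEVM.so"});
    return 0;
}
```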

18 Operational
[Diagram: applications App1, App2 and App3 exchange messages via frameSend(...); a DdmSystemChange message is shown]

19 Efficiency Evolution
Roundtrip test, reporting half-roundtrip time.
The difference to the bare-bones use of the Myrinet GM library is calculated.
[Chart: half-roundtrip time in µs from June to November; annotations mark the original efficiency reported in the paper and the introduction of on-demand buffer-pool allocation, measured on 450 MHz hosts with 32-bit/33 MHz PCI]

20 Point To Point Efficiency

21 CMS Data Acquisition System
–Custom readout at 100 kHz, 2 KB per node
–O(500) real-time systems
–O(500) builder units
–O(2000) physics analysis nodes
–Networks: Gigabit Ethernet, Myrinet, Infiniband
–Protocols: I2O, SOAP/XML, Java
Prototype cluster 2000: 32 x 32 PCs, 2.5 Gbps Myrinet 2000, Gigabit Ethernet.

22 Summary
Lightweight middleware: 2.1 µs per remote function invocation (… calls/s on GM)
–Abstraction from hardware
–Ease of adaptability and extensibility is feasible
Architectural support is needed
–to efficiently integrate layers
–to be able to keep pace with technology evolution without a need for change
–to construct homogeneous applications for heterogeneous processing clusters
[Diagram: layer stack – sensor readout and processing on top of XDAQ and Util/DDM, over HTTP, TCP, Ethernet, Myrinet, PCI, OS and device drivers]

23 Information