Architectural Software Support for Processing Clusters Johannes Gutleber, Luciano Orsini European Organization for Nuclear Research Div. EP/CMD, The CMS.


1 Architectural Software Support for Processing Clusters
Johannes Gutleber, Luciano Orsini
European Organization for Nuclear Research, Div. EP/CMD, The CMS Collaboration
CERN, 1211 Geneva 23, Switzerland

2 The Issue
1988: "The biggest problem with creating distributed computing systems is devising a method of intercomputer communication that is reliable, fast and simple." J.E. Tomayko, NASA CR-182505, p. 228, Mar 1988
2000: "High-speed networks […] can obtain communication speeds close to those of supercomputers, but realizing this potential is a challenging problem." H. Bal, ACM Op Sys Rev, p. 79, Oct 2000

3 The Approach
Do not…
–invest in alternative communication paradigms
–optimise communication libraries
Provide architectural software support
–Lightweight framework for homogeneous communication
–Configure with low-level communication libraries
–Plug-in application components
–Homogeneous subsystem interface design support

4 Architectural Software Support
Architecture support comprises
–a processing model
–subsystem addressing
–configuration and control
–Application Programmer Interface requirements
Everything that is needed to build and operate a distributed application

5 Motivation
In large scale data acquisition systems we have to cope with
–Long operational lifetimes (10-15 yrs)
–Modifications due to generation jumps (networking, processing)
–Deployment of one application in various different environments
–Bridging of hardware/software performance gaps
From the special case we can extrapolate to general cluster based systems
–Search engines, document retrieval systems
–Plant control systems
–Medical imaging networks in hospitals
Available tools don't match the requirements

6 Architecture Basis: I2O
A specification for a hardware and operating system independent device driver framework
Targeted at collaboration between...
–Host and intelligent devices
–Intelligent device intercommunication
[Diagram labels: UNIX - OSM, Windows - OSM, Messaging Layer, PCI bus, HDM/FPGA, HDM/IOP]

7 I2O IOP Environment
–Inbound/Outbound queue (pass frame pointers, Zcopy)
–Homogeneous frame format
–Event driven processing
–Uniform hardware access API
[Diagram labels: IRQ, Network, HDM, framework, foo( ), bar( ), Inbound, Outbound]

8 I2O Message Frame
Used to implement an active message model
Frame layout (32-bit rows; bytes 3 2 1 0; bits 31-24, 23-16, 15-8, 7-0)
Standard Frame
–MessageSize, MessageFlags, VersionOffset
–TargetAddress, InitiatorAddress, Function (= FFh)
–InitiatorContext
–TransactionContext
Private Frame Extension
–XFunctionCode, OrganizationID
–PrivatePayload = function parameters
Context field annotations from the diagram
–Assigned by application and returned in reply (cookie)
–Assigned by message layer. Used for routing back reply
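The frame layout on this slide can be sketched as a pack/unpack pair. This is only an illustration under stated assumptions: the exact bit positions inside each 32-bit word, the unit of MessageSize (32-bit words) and the default `version_offset` value are assumptions not confirmed by the slide, and the function names are hypothetical.

```python
import struct

# Assumed layout (not confirmed by the slide), little-endian 32-bit words:
#   word 0: VersionOffset (bits 0-7), MessageFlags (8-15), MessageSize (16-31)
#   word 1: TargetAddress (0-11), InitiatorAddress (12-23), Function (24-31)
#   word 2: InitiatorContext
#   word 3: TransactionContext
#   word 4: XFunctionCode (0-15), OrganizationID (16-31)
HEADER = struct.Struct("<5I")
PRIVATE_FUNCTION = 0xFF  # "Function (= FFh)" marks a private frame


def pack_private_frame(target, initiator, xfunction, org_id, payload,
                       initiator_context=0, transaction_context=0,
                       version_offset=0x01, flags=0):
    """Build a private frame; MessageSize is assumed to count 32-bit words."""
    if len(payload) % 4:
        raise ValueError("payload must be padded to whole 32-bit words")
    size_words = (HEADER.size + len(payload)) // 4
    if size_words > 0xFFFF or target > 0xFFF or initiator > 0xFFF:
        raise ValueError("field does not fit its bit width")
    word0 = version_offset | (flags << 8) | (size_words << 16)
    word1 = target | (initiator << 12) | (PRIVATE_FUNCTION << 24)
    word4 = xfunction | (org_id << 16)
    return HEADER.pack(word0, word1, initiator_context,
                       transaction_context, word4) + payload


def unpack_private_frame(frame):
    """Split a frame produced by pack_private_frame back into its fields."""
    word0, word1, ictx, tctx, word4 = HEADER.unpack_from(frame)
    size_words = word0 >> 16
    return {
        "version_offset": word0 & 0xFF,
        "flags": (word0 >> 8) & 0xFF,
        "message_size_words": size_words,
        "target": word1 & 0xFFF,
        "initiator": (word1 >> 12) & 0xFFF,
        "function": word1 >> 24,
        "initiator_context": ictx,
        "transaction_context": tctx,
        "xfunction": word4 & 0xFFFF,
        "org_id": word4 >> 16,
        "payload": frame[HEADER.size:size_words * 4],
    }
```

Packing with `"<"` keeps the frame little endian regardless of the host CPU, which matters when the same frame travels between PowerPC and Intel nodes.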

9 I2O Messaging
A message frame contains two addresses
–initiatorTid = where the message comes from
–destinationTid = to which DDM/ISM it shall go
Message is associated with a handler function
–Predefined functions for I2O messages
–Private frame extension for application specific messages
Message length limited to 256 KB. Frame should only contain control information
–Message data should go into Scatter-Gather Lists
I2O frame byte order is little endian
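The association between a message and its handler function can be sketched as a lookup table. This is a minimal illustration, not the XDAQ implementation: the `HandlerTable` class, the dictionary message format and the function code `0x10` are hypothetical; only the use of FFh for private frames comes from the slides.

```python
PRIVATE_FUNCTION = 0xFF  # private frames carry Function = FFh


class HandlerTable:
    """Maps function codes to handlers (hypothetical helper)."""

    def __init__(self):
        self._standard = {}  # Function -> handler
        self._private = {}   # (OrganizationID, XFunctionCode) -> handler

    def bind(self, function, handler):
        """Register a handler for a predefined function code."""
        self._standard[function] = handler

    def bind_private(self, org_id, xfunction, handler):
        """Register a handler for an application specific message."""
        self._private[(org_id, xfunction)] = handler

    def dispatch(self, message):
        """Call the handler selected by the message's function fields."""
        function = message["function"]
        if function == PRIVATE_FUNCTION:
            key = (message["org_id"], message["xfunction"])
            handler = self._private.get(key)
        else:
            handler = self._standard.get(function)
        if handler is None:
            raise LookupError("no handler bound for this message")
        return handler(message)
```

A component would bind its handlers once at start-up and then pass every received frame to `dispatch`, so the processing stays event driven.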

10 Peer and Peer-to-Peer Operations
Peer Operation uses the queue pair on one PCI segment
Peer-to-Peer commands for network communication
[Diagram labels: Executive, Messaging Layer, Peer Transport Agent, Peer Transport DDM, Device Driver Module, I2O message frames, Non-I2O messages]

11 I2O Peer Operation for Clusters
I2O terms mapped to cluster terms
–device = Application component
–IOP = Processing node
–host = Controller node
Homogeneous communication
–frameSend for local, remote, host
–single addressing scheme (Tid)
[Diagram labels: Application, Application framework, Executive, Messaging Layer, Peer Transport Agent, Peer Transport, I2O Message Frames]
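The idea of one send call for local and remote targets can be sketched as follows. This is a toy model under stated assumptions: `Executive`, `LoopbackTransport` and `frame_send` are hypothetical names standing in for the executive, a peer transport and the frameSend call on the slide, and frames are plain dictionaries.

```python
from collections import deque


class Executive:
    """Toy executive: one inbound queue and a single Tid address space."""

    def __init__(self, name):
        self.name = name
        self.local = {}         # Tid -> application callable on this node
        self.routes = {}        # Tid -> peer transport reaching the owner node
        self.inbound = deque()  # frames waiting to be dispatched here

    def register(self, tid, app):
        self.local[tid] = app

    def add_route(self, tid, transport):
        self.routes[tid] = transport

    def frame_send(self, frame):
        """Same call for local and remote targets; only the Tid decides."""
        tid = frame["target"]
        if tid in self.local:
            self.inbound.append(frame)
        elif tid in self.routes:
            self.routes[tid].send(frame)
        else:
            raise LookupError("unknown Tid %d" % tid)

    def process(self):
        """Dispatch all queued frames and return how many were handled."""
        handled = 0
        while self.inbound:
            frame = self.inbound.popleft()
            self.local[frame["target"]](frame)
            handled += 1
        return handled


class LoopbackTransport:
    """Stand-in for a TCP or Myrinet peer transport (in-process only)."""

    def __init__(self, remote):
        self.remote = remote

    def send(self, frame):
        self.remote.inbound.append(frame)
```

The sender never needs to know where a Tid lives; moving an application to another node only changes the routing table, not the calling code.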

12 Applications are I2O Classes
In XDAQ they are equivalent to C++ classes
Each class exposes an interface that is implemented by the application
[Diagram labels: TargetAddr, ClassId, Instance, Dispatcher, Listener, DDmAdapter, UtilAdapter, UserAdapter, Application]

13 Peer Transport Configurations
Polling Peer Transport Agent
+ low OS service overhead
- executive uses CPU continuously
- no blocking PTs
Thread per Peer Transport
- higher OS service overhead
+ no CPU monopolisation
+ allows integration with other software
[Diagram labels, once per configuration: PTA, TCP, Myri, DLPI, FIFO]
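The two configurations can be contrasted in a short sketch. Everything here is hypothetical scaffolding: `QueueTransport` is an in-memory stand-in for a real peer transport, and `polling_agent` and `threaded_agent` only illustrate the control flow, not the actual Peer Transport Agent.

```python
import queue
import threading


class QueueTransport:
    """Hypothetical peer transport backed by an in-memory queue."""

    def __init__(self, name):
        self.name = name
        self._frames = queue.Queue()

    def inject(self, frame):
        self._frames.put(frame)

    def poll(self):
        """Non-blocking receive: returns a frame or None."""
        try:
            return self._frames.get_nowait()
        except queue.Empty:
            return None

    def receive(self, timeout):
        """Blocking receive with a timeout: returns a frame or None."""
        try:
            return self._frames.get(timeout=timeout)
        except queue.Empty:
            return None


def polling_agent(transports, deliver, rounds):
    """One loop visits every transport in turn and never blocks."""
    for _ in range(rounds):
        for pt in transports:
            frame = pt.poll()
            if frame is not None:
                deliver(pt.name, frame)


def threaded_agent(transports, deliver, stop, timeout=0.05):
    """One thread per transport; each thread may block in receive()."""
    def worker(pt):
        while not stop.is_set():
            frame = pt.receive(timeout)
            if frame is not None:
                deliver(pt.name, frame)

    threads = [threading.Thread(target=worker, args=(pt,), daemon=True)
               for pt in transports]
    for thread in threads:
        thread.start()
    return threads
```

The polling loop spins even when no frame arrives, which is the continuous CPU use listed above, while the threaded variant sleeps inside `receive` at the cost of operating system scheduling.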

14 I2O for Cluster Configuration
Executive tasks on
–RUIO (IOP480): VxWorks
–PPC (MVME2306): VxWorks, Linux
–Workstation: Intel Linux, Sparc Solaris

15 Boot
Executives on each node in the cluster wait for I2O configuration messages
Configuration and control can be done through
–Native I2O messages
–XML/HTTP mapping
Parameter set/get is also done through I2O/XML

16 I2O Configuration Commands
Commands: ExecSysTabSet, ExecDeviceAssign
–Where (e.g. IOP 34)
–How (e.g. TCP, DLPI, Myrinet)
–Who (e.g. RU1 – Tid 10, RU2 – Tid 20)
[Diagram labels: Detector Frontend, Computing Services, Readout Systems, Filter Systems, Event Manager, Builder Networks, Level 1 Trigger, Run Control]

17 Ready
–What: ExecSwDownload (e.g. libRU.so, libEVM.so)
[Diagram labels: Local App2, Remote App2, Remote App1, Remote App3, Local App3]

18 Operational
[Diagram labels: App1, App2, App3, frameSend (...), DdmSystemChange]

19 Efficiency Evolution
Roundtrip test, reporting half-roundtrip-time
Calculate difference to the bare-bones use of Myrinet GM library
[Plot: µsecs (axis 1 to 10) versus time (June to November). Annotations: original efficiency, paper; 450 MHz, PCI 32/33; on-demand buffer-pool allocation; 450 MHz, PCI 32/33; 750 MHz, PCI 32/33]
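The measurement described on this slide, half the round-trip time and its difference to a bare-bones baseline, can be sketched as below. The function names are hypothetical, and the `ping` callables stand in for a real framework round trip and a raw GM round trip, neither of which is shown in the slides.

```python
import time


def half_roundtrip_from_total(total_seconds, iterations):
    """Convert the total time of N round trips to a half round trip in µs."""
    return total_seconds / iterations / 2 * 1e6


def half_roundtrip_usec(ping, iterations=1000):
    """Time `iterations` calls of ping() and report the half round trip."""
    start = time.perf_counter()
    for _ in range(iterations):
        ping()
    elapsed = time.perf_counter() - start
    return half_roundtrip_from_total(elapsed, iterations)


def framework_overhead_usec(framework_ping, baseline_ping, iterations=1000):
    """Overhead = framework half round trip minus the baseline's."""
    return (half_roundtrip_usec(framework_ping, iterations)
            - half_roundtrip_usec(baseline_ping, iterations))
```

Averaging over many iterations hides the timer resolution, and subtracting the baseline isolates what the framework itself adds on top of the network library.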

20 Point To Point Efficiency

21 CMS Data Acquisition System
–Custom readout
–100 kHz input @ 2 KB per node
–O(500) real-time systems
–O(500) builder units
–O(2000) physics analysis nodes
–Giga E'Net, Myrinet, Infiniband
–Prototype cluster 2000: 32 x 32 PCs, 2.5 Gbps Myrinet 2000, Gigabit Ethernet
[Diagram labels: SOAP, XML, Java, I2O]

22 Summary
Lightweight middleware is feasible
–2.1 µsec per remote function invocation (50 000 calls/s on GM)
–Abstraction from hardware
–Ease of adaptability and extensibility
Need architectural support
–to efficiently integrate layers
–to be able to keep pace with technology evolution w/o a need for change
–to construct homogeneous applications for heterogeneous processing clusters
[Diagram labels: OS and Device Drivers, HTTP, TCP, Ethernet, Myrinet, PCI, XDAQ, Util/DDM, Processing, Sensor readout]

23 Information
http://cern.ch/xdaq
Johannes.Gutleber@cern.ch
Luciano.Orsini@cern.ch

