
1 The RTES Project – BTeV, and Beyond
Michael J. Haney (1), Shikha Ahuja (2), Ted Bapty (2), Harry Cheung (3), Zbigniew Kalbarczyk (4), Akhilesh Khanna (4), Jim Kowalkowski (3), Derek Messie (5), Daniel Mossé (6), Sandeep Neema (2), Steve Nordstrom (2), Jae Oh (5), Paul Sheldon (7), Shweta Shetty (2), Dmitri Volper (5), Long Wang (4), Di Yao (2)
(1) High Energy Physics, University of Illinois, 1110 W. Green Street, Urbana, IL 61801 USA
(2) Institute for Software Integrated Systems, Vanderbilt University, Nashville, TN 37235 USA
(3) Fermi National Accelerator Laboratory, Batavia, IL 60510 USA
(4) Electrical and Computer Science, University of Illinois, Urbana, IL 61801 USA
(5) Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244 USA
(6) Computer Science, University of Pittsburgh, Pittsburgh, PA 15250 USA
(7) Physics and Astronomy Department, Vanderbilt University, Nashville, TN 37235 USA

2 M. Haney; RT 2005; The RTES Project - BTeV, and Beyond
Outline
Real Time Embedded System Project
– BTeV => RTES
Prototypes
– SuperComputing 2003
– Demo System 2004
Beyond BTeV

3 BTeV - High Energy Physics
Input: 500 GB/s (2.5 MHz)
Level 1 processing: 190 µs
– one event every 396 ns
– 528 “8 GHz” G5 CPUs (factor of 50 event reduction)
– high performance interconnects
Level 2/3 processing: 5+135 ms (factor of 10+2 event reduction)
– 1536 “12 GHz” CPUs
– commodity networking
Output: 200 MB/s (4 kHz) = 1-2 Petabytes/year
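
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is not from the talk: it assumes decimal units (1 GB = 10^9 bytes), and it assumes that Level 1 events are dealt out to the 528 CPUs one at a time in round-robin order, so that each CPU has 528 arrival intervals to finish an event. The function names are hypothetical.

```python
# Figures quoted on the slide (decimal units assumed: 1 GB = 10**9 bytes).
INPUT_BYTES_PER_S = 500 * 10**9
INPUT_RATE_HZ = 2_500_000
OUTPUT_BYTES_PER_S = 200 * 10**6
OUTPUT_RATE_HZ = 4_000
L1_CPUS = 528
ARRIVAL_INTERVAL_NS = 396
L1_PROCESSING_US = 190


def bytes_per_event(bytes_per_s, rate_hz):
    """Average event size implied by a bandwidth and an event rate."""
    return bytes_per_s / rate_hz


def l1_budget_us(cpus, interval_ns):
    """Time one CPU may spend on an event, in microseconds.

    Assumption (not stated on the slide): events are distributed
    round-robin, so a CPU receives a new event every cpus * interval_ns.
    """
    return cpus * interval_ns / 1000


def live_seconds_for(petabytes, bytes_per_s):
    """Seconds of data taking needed to write the given volume."""
    return petabytes * 10**15 / bytes_per_s


if __name__ == "__main__":
    print("input event size (bytes): ", bytes_per_event(INPUT_BYTES_PER_S, INPUT_RATE_HZ))
    print("output event size (bytes):", bytes_per_event(OUTPUT_BYTES_PER_S, OUTPUT_RATE_HZ))
    print("overall rate reduction:   ", INPUT_RATE_HZ / OUTPUT_RATE_HZ)
    print("L1 per-event budget (us): ", l1_budget_us(L1_CPUS, ARRIVAL_INTERVAL_NS))
    print("live seconds for 1 PB:    ", live_seconds_for(1, OUTPUT_BYTES_PER_S))
    print("live seconds for 2 PB:    ", live_seconds_for(2, OUTPUT_BYTES_PER_S))
```

Under these assumptions the per-CPU budget at Level 1 comes out slightly above the quoted 190 µs of processing, and 1-2 Petabytes/year corresponds to roughly 5 to 10 million seconds of data taking per year at 200 MB/s.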

4 BTeV’s Need
“Given the very complex nature of this system where thousands of events are simultaneously and asynchronously cooking, issues of data integrity, robustness, and monitoring are critically important and have the capacity to cripple a design if not dealt with at the outset… BTeV [needs to] supply the necessary level of ‘self-awareness’ in the trigger system.”
– [June 2000 Project Review]

5 thus, RTES
The Real Time Embedded System Group
– University of Illinois
– University of Pittsburgh
– Syracuse University
– Vanderbilt University (PI)
– Fermilab
Physicists and Computer Scientists/Electrical Engineers with expertise in
– High performance, real-time system software and hardware
– Reliability and fault tolerance
– System specification, generation, and modeling tools
NSF ITR grant ACI-0121658

6 The RTES Solution
Model Integrated Computing
– Graphical representation of complex system, with modeling (simulation) resources
ARMORs
– To protect Linux processes, and sub processors
VLAs
– To monitor/mitigate at every level: embedded, supervisory Linux, Linux trigger farm, etc.

7 Modeling Environment: GME*
Fault handling
Process dataflow
HW Configuration
* GME is an open-source, meta-configurable, multi-aspect graphical modeling tool

8 ARMOR: Adaptive Reconfigurable Mobile Objects of Reliability
ARMOR processes provide a hierarchy of error detection and recovery.
ARMORs are protected through checkpointing and internal self-checking.
Fault Tolerant Manager (FTM): highest ranking manager in the system
Heartbeat ARMOR: detects and recovers FTM failures
Daemons: detect ARMOR crash and hang failures
Execution ARMOR: oversees application process (e.g. the various Trigger Supervisor/Monitors)
[Diagram labels: Daemon; Fault Tolerant Manager (FTM); Heartbeat ARMOR; Exec ARMOR; App Process; network]
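
The heartbeat idea on this slide (one entity watches another and recovers it when it stops responding) can be illustrated with a small sketch. This is not the ARMOR API: the class `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical, and the "recovery" is just a callback supplied by the caller.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per watched entity and flags overdue ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def beat(self, name):
        """Record a heartbeat from the named entity."""
        self.last_seen[name] = self.clock()

    def overdue(self):
        """Return the names whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )

    def check_and_recover(self, restart):
        """Call restart(name) for each overdue entity and reset its timer."""
        failed = self.overdue()
        for name in failed:
            restart(name)
            self.last_seen[name] = self.clock()
        return failed


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=0.05)
    monitor.beat("ftm")
    time.sleep(0.1)                 # the watched entity goes silent
    monitor.check_and_recover(lambda name: print("restarting", name))
```

In a hierarchy like the one on the slide, the same pattern would be applied at each level, with each watcher itself being watched by something else.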

9 Very Lightweight Agents
Minimal footprint
Platform independence
– Employable everywhere in the system!
Monitors hardware and software
Handles fault detection & communications with higher level entities
[Diagram labels: Level 2/3 Farm Nodes (Linux); L2/L3 Manager Nodes (Linux); Physics Application; VLA; Network API; OS Kernel (Linux); Hardware]
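
As a rough illustration of what "monitors hardware and software" and "communications with higher level entities" could look like in code, here is a minimal sketch of a polling agent. It is not the RTES VLA implementation: `Check`, `LightweightAgent`, the message fields, and the example thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    read: Callable[[], float]   # returns the current sensor or counter value
    limit: float                # values above this are reported as faults


@dataclass
class LightweightAgent:
    node: str
    report: Callable[[Dict], None]          # channel to a higher level entity
    checks: List[Check] = field(default_factory=list)

    def poll(self) -> List[Dict]:
        """Run every check once and report each value that exceeds its limit."""
        faults = []
        for check in self.checks:
            value = check.read()
            if value > check.limit:
                message = {
                    "node": self.node,
                    "check": check.name,
                    "value": value,
                    "limit": check.limit,
                }
                self.report(message)
                faults.append(message)
        return faults


if __name__ == "__main__":
    agent = LightweightAgent("worker-1.1", report=print)
    agent.checks.append(Check("cpu_temp_c", lambda: 91.0, limit=85.0))
    agent.checks.append(Check("queue_depth", lambda: 3.0, limit=100.0))
    agent.poll()
```

Keeping the agent to a list of callables and one reporting callback is one way to keep the footprint small and the code independent of the platform it runs on.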

10 RTES view of the BTeV L1 Trigger

11 SC2003 Prototype
[Diagram labels: Gateway PC - Windows OS; DSP - BIOS; Physics Application; Very Light Monitor Agent; PC - Linux OS; EPICS Graphical Display System; ARMOR Microkernel; Recovery Policy; Msg Parser; Local Manager ARMOR; DSP Interface; EPICS Interface; Daemon; connections labeled DATA, COMMANDS, TCP/IP]

12 EPICS GUI

13 Independent Review
Following SuperComputing 2003, a software review was conducted
– GME needs to coherently address multiple, differing domains
    System modeling, messaging, fault mitigation, Run Control function, GUI, other
– ARMORs need to be easily customized
    Via GME
– Overall packaging and version control are vital

14 Domain-specific languages
GME models, metamodels, and interpreters for
– system description, messaging, state machine (run control, ARMOR), GUI
Each language generates appropriate artifacts
– C++, Python, Matlab M-files, Elvin config files, etc.
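
To make "each language generates appropriate artifacts" concrete, here is a toy sketch of the model-to-code idea: a state machine described as data is translated into Python source, which is then loaded and run. It does not use GME or its interpreters; the model format, the state and event names, and the functions `generate_python` and `load` are all assumptions made for illustration.

```python
# A hypothetical run-control model: (source state, event, target state).
MODEL = {
    "name": "RunControl",
    "initial": "idle",
    "transitions": [
        ("idle", "configure", "configured"),
        ("configured", "start", "running"),
        ("running", "stop", "configured"),
    ],
}


def generate_python(model):
    """Translate the model into the source text of a Python class."""
    lines = [f"class {model['name']}:", "    TABLE = {"]
    for src, event, dst in model["transitions"]:
        lines.append(f"        ({src!r}, {event!r}): {dst!r},")
    lines += [
        "    }",
        "",
        "    def __init__(self):",
        f"        self.state = {model['initial']!r}",
        "",
        "    def fire(self, event):",
        "        # Raises KeyError if the event is not allowed in this state.",
        "        self.state = self.TABLE[(self.state, event)]",
        "        return self.state",
    ]
    return "\n".join(lines) + "\n"


def load(model):
    """Execute the generated source and return the generated class."""
    namespace = {}
    exec(generate_python(model), namespace)
    return namespace[model["name"]]


if __name__ == "__main__":
    print(generate_python(MODEL))
    run_control = load(MODEL)()
    run_control.fire("configure")
    run_control.fire("start")
    print("state:", run_control.state)
```

The same model could in principle feed several generators (for example one emitting C++ and another emitting a configuration file), which is the appeal of keeping the description separate from the artifacts.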

15 Versioning/Build System
[Diagram labels: Run Tree; System Executables; Build Tree; UDM Translators; Canonical XML models; Domain Models; Metamodels; Language Specification; Domain Artifacts; Compiler/Linker; Translator; Source Files; Models; Object; Source; Artifacts; arrows labeled IN and OUT]

16 Demo System 2004 - L2/3 Trigger
[Diagram labels: FTM; Global Mgr; Heartbeat/Source node; HB; Regional Mgr 1; Regional Mgr 2; Worker 1.1; Worker 1.2; Worker 2.1; Exec ARMOR; Filter 1; Filter 2; Event Builder; Event Source; Elvin Router; GUI; Region 1; legend: Elvin msg, ARMOR msg]

17 Demo System 2004
[Diagram labels: Iron; Ganglia; Boulder; Elvin; public; private; laptop (Matlab, Elvin); Global (RC, ARMOR); Regional (RC, ARMOR); Worker (RC, VLA, ARMOR, FilterApp); DataSource (file reader)]

18 Matlab GUI

19 Beyond BTeV - CMS
GME modeling for XDAQ
– System descriptions, state machines, messaging…
– Work in progress
Fault tolerance for HLT
– ARMORs and VLAs
    Being discussed
Balancing CMS needs and RTES goals
Adding value, without requiring changes

20 Beyond BTeV - LQCD
Lattice Gauge Theory Computation
– farm at Fermilab
Single-point sensitivities
– Single process fault can compromise entire farm computation
– Checkpointed; can be restarted, but…
ARMORs and VLAs
– Batch/autonomous protection
    No operator
– Dynamic mix of protection requirements
    Not a (quasi)static L2/3 Trigger
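
The "checkpointed; can be restarted" point can be sketched as a loop that saves its state after every step and resumes from the saved state after a fault. This is a generic illustration, not the LQCD farm's mechanism: the file format, the function `run`, the `fail_at` fault injection, and the toy computation (a running sum) are all assumptions.

```python
import json
import os
import tempfile


def run(steps, path, fail_at=None):
    """Sum 0..steps-1, checkpointing to `path` after every step.

    If a checkpoint exists, the computation resumes from it. `fail_at`
    injects a simulated process fault before the given step.
    """
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as handle:
            state = json.load(handle)

    while state["step"] < steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated process fault")
        state["total"] += state["step"]
        state["step"] += 1
        # Write to a temporary file, then rename, so that a crash during
        # the write cannot leave a half-written checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(state, handle)
        os.replace(tmp_path, path)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "job.json")
        try:
            run(10, checkpoint, fail_at=6)
        except RuntimeError as fault:
            print("fault:", fault)
        print("result after restart:", run(10, checkpoint))
```

The restart here is manual, which is the gap the slide points at: with no operator present, something has to notice the fault and trigger the restart automatically.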

21 Beyond BTeV - Grid, Other
Grid Projects
– Load balancing and network studies
    Nodes-in-farm => farms-in-grid
    Resource driven, deadline driven, other
– Extension of studies done for BTeV/partitioning
Other - Dark Energy Survey (astro camera)
– “Simple” system (few nodes)
– Not hard real-time (can reacquire image)
    But it will be a good case-study for the “cost” of incorporating RTES (GME, ARMORs, VLAs)

22 Conclusions
The RTES project developed two prototypes (L1, and L2/3) for BTeV
– Demonstrated at conferences
RTES is now applying its design-time modeling and runtime middleware to several high performance heterogeneous embedded application environments
