
Slide 1: Software Project for the Hall B 12 GeV Upgrade - Status and Plans
12 GeV Upgrade Software Review, Jefferson Lab, November 25-26, 2013
D. P. Weygand

Slide 2: Overview
- CLAS12 software overview: advances made in (1a)
  - Framework
  - Simulations
  - Tracking
  - Calibration and monitoring
  - Event reconstruction
  - Data processing
- Timelines and milestones (1a/b): milestones met and current timeline
- Usability (1c): of the framework (Data Mining project), simulation (detector studies), and reconstruction (detector performance analysis)
- Software profiling and documentation (1d)
- Steps taken to address recommendations from the previous review (2a)
- Risk mitigation (2d)
- Summary

Slide 3: CLAS12 Offline Software
Components and descriptions:
- GEMC: full Geant4-based detector simulation
- ced: event-level visualization
- ClaRA: SOA-based physics data processing application development framework
- DPE: Physics Data Processing Environment in Java, C++, and Python
- Service containers: multi-threaded support
- Application orchestrators: cloud/batch-farm support
- Online Level-3 data processing: ClaRA-based application deployment and operation on the online farm
Reconstruction services:
- Tracking: charged-particle track reconstruction
- FTOF/CTOF: time-of-flight reconstruction for PID
- EC/PCAL: neutral-particle identification
- LTCC/HTCC: K/pi separation
- Forward Tagger: quasi-real photon tagger
- PID: particle identification
- Event Builder: full-event reconstruction
Other components:
- Calibration and monitoring services: detector calibration, monitoring of data processing, histogramming
- Auxiliary services: geometry service, magnetic-field service
- Calibration and Conditions Database: calibration and conditions constants for online/offline use
- Data analysis: DST (data summary tapes, data format for analysis); Data Mining (distributed data access)

Slide 4: Computing Model and Architecture
- ClaRA (CLAS12 Reconstruction and Analysis framework) is a multi-threaded analysis framework based on a Service Oriented Architecture
- Physics application design and composition is based on services
- Services being developed (1a):
  - Charged-particle tracking (central, forward)
  - EC reconstruction
  - TOF reconstruction
  - PID
  - HTCC
  - PCAL
  - Detector calibration
  - Event Builder
  - Histogram services
  - Database application
- Multilingual support (2b): services can be written in C++, Java, and Python
- Supports both traditional and cloud computing models (2f)
  - Single-process as well as distributed application design modes
  - Centralized batch processing
  - Distributed cloud processing

Slide 5: CLAS12 Event Reconstruction
(Diagram of the reconstruction service chain.) Legend: R/W: reader/writer; CT: central tracking; FT: forward tracking; KF: Kalman filter; EC: electromagnetic calorimeter; PCAL: preshower calorimeter; TOF: forward and central time-of-flight; HTCC/LTCC: threshold Cherenkov counters; EB: event builder; PID: particle ID.

Slide 6: Stress Tests
- Tests using a single data stream show that online analysis can process ~10% of the data stream (2 kHz)
- Tests using the multiple data-stream application scale with the number of processing nodes (20 nodes used)
(1b, 2a, 2f)

Slide 7: Multiple Data-stream Application
(Diagram: the ClaRA master DPE on the executive node hosts administrative services, the application orchestrator (AO), and reader/writer (R/W) services attached to persistent storage; farm nodes 1...N each run the service chains S1...Sn.)

Slide 8: Multiple Data-stream Application: CLAS12 reconstruction on the JLab batch farm

Slide 9: ClaRA Batch Processing Workflow
Legend: DPE: data processing environment; SM: shared memory; Dm: data manager; Tm: tape manager; I/O: I/O services; S: reconstruction services. (2f, 3a/b)

Slide 10: JLab Workflow System
(Diagram: the workflow system accepts data requests and scripted workflows, submits them through the CLARA orchestrator and Auger/PBS, launches DPE jobs, and collects status reports.) (1d)

Slide 11: Advances within the Framework
- Batch-farm processing mode, currently being integrated into the large-scale workflow (2f, 1d)
- Service-based data-flow control application
- ClaRA Application Designer (1c): graphical service composition
- Transient data streaming optimization:
  - EvIO 4.1 is the default event format for CLAS12
  - Data is just a byte buffer, avoiding serialization (see the sketch below)
  - Complete API (Java, C++, Python) to get data from the buffer
  - A set of wrappers to work with the common CLAS12 bank format
- Development of persistent-to-transient and transient-to-persistent data converter services for EvIO (e.g. EvIO to ROOT)
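
To illustrate the byte-buffer approach listed above, here is a minimal, hypothetical sketch of pulling values out of an event buffer in place, without object serialization. The bank layout used here (a tag and word-count header followed by payload words) and the helper names are assumptions for illustration only, not the actual EvIO 4.1 API or the real CLAS12 bank format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal sketch: treating an event as a raw byte buffer, as EvIO does,
// so services can read values in place without deserializing objects.
// The bank layout below (tag, word count, payload) is a simplified
// stand-in, not the real CLAS12 bank format.
public class BankReaderSketch {

    // Scan the buffer for a bank with the given tag and return its payload
    // as integers; return an empty array if the tag is not found.
    static int[] readIntBank(ByteBuffer event, int wantedTag) {
        ByteBuffer buf = event.duplicate().order(ByteOrder.BIG_ENDIAN);
        while (buf.remaining() >= 8) {
            int tag = buf.getInt();     // hypothetical bank header: tag
            int nWords = buf.getInt();  // hypothetical bank header: payload length in words
            if (tag == wantedTag) {
                int[] data = new int[nWords];
                for (int i = 0; i < nWords; i++) data[i] = buf.getInt();
                return data;
            }
            buf.position(buf.position() + 4 * nWords); // skip this bank's payload
        }
        return new int[0];
    }

    public static void main(String[] args) {
        // Build a toy "event": one bank with tag 100 holding three ADC values.
        ByteBuffer event = ByteBuffer.allocate(64).order(ByteOrder.BIG_ENDIAN);
        event.putInt(100).putInt(3).putInt(11).putInt(22).putInt(33).flip();
        int[] adc = readIntBank(event, 100);
        System.out.println("bank 100 payload length = " + adc.length);
    }
}
```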

Slide 12: GEMC 2.0 (Geant4 Monte Carlo)
- Automatic "true info", V(t) signal, digitization
- FADC ready
- New banks, bank I/O, automatic ROOT
- Geometry based on the Geometry Service
- GEMC App: simplified installation
(1c)

Slide 13: Introducing Factories of Factories
(Diagram: factories fed from multiple sources: MYSQL, TEXT, GDML, and CLARA (service library plugin).)

Slide 14: GEMC Voltage Signal
- Each step produces a V signal based on DB parameters
- All signals are summed into a final V(t) shape
- Negligible effect on performance
- Example of a 2.8 GeV/c particle producing a digital signal (FTOF 1a and 1b); a minimal sketch of the summation idea follows
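
The sketch below illustrates the summation described above: each Geant4 step contributes a pulse scaled by its deposited energy, and the pulses are summed into the sampled V(t) shape. The pulse shape (linear rise, exponential decay) and its parameter values are assumptions for illustration; GEMC takes the actual shape parameters from its database.

```java
// Illustrative sketch of summing per-step pulses into a final V(t) shape.
// The pulse shape and parameters here are stand-ins for GEMC's DB-driven
// parameterization; only the summation idea is being demonstrated.
public class VoltageSumSketch {

    // One Geant4 step: arrival time (ns) and deposited energy (arbitrary units).
    record Step(double timeNs, double eDep) {}

    // Hypothetical single-step pulse: zero before the hit time, then a
    // linear rise and exponential decay, scaled by the deposited energy.
    static double stepPulse(Step s, double tNs, double riseNs, double decayNs) {
        double dt = tNs - s.timeNs;
        if (dt < 0) return 0.0;
        double shape = (dt < riseNs) ? dt / riseNs : Math.exp(-(dt - riseNs) / decayNs);
        return s.eDep() * shape;
    }

    public static void main(String[] args) {
        Step[] steps = { new Step(2.0, 1.0), new Step(2.4, 0.5), new Step(3.1, 0.8) };
        double riseNs = 0.8, decayNs = 4.0, sampleNs = 0.5;

        // Sum all step pulses into the final V(t) samples (FADC-like sampling).
        for (double t = 0.0; t <= 20.0; t += sampleNs) {
            double v = 0.0;
            for (Step s : steps) v += stepPulse(s, t, riseNs, decayNs);
            System.out.printf("t = %5.1f ns  V = %.4f%n", t, v);
        }
    }
}
```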

Slide 15: Advances in Analysis Software (1a)
a) Generation-3 tracking (TRAC)
   i. Written as a Java service within ClaRA
   ii. New design, new algorithms, and improved efficiency
   iii. Ongoing code-validation process
   iv. Used to analyze cosmic and test-stand data and to validate detector design changes
b) New systems included in the reconstruction chain
   i. CTOF and FTOF reconstruction written as Java services within ClaRA
   ii. FTCal reconstruction written as a Java service (ongoing standalone development)
c) Geometry Service
   i. TOF and DC get geometry constants from the Geometry Service (other detector systems to be included soon)
   ii. Simulation gets geometry constants from the Geometry Service
d) Event Builder (see the sketch below)
   i. Links the outputs of the services connected in the reconstruction application
   ii. Output bank structure designed for output to ROOT
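
A minimal sketch of the linking idea described for the Event Builder: collecting the banks produced by upstream detector services into one event record. The bank names and the map-based event type are illustrative assumptions, not the actual CLAS12 bank definitions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the Event Builder idea: each upstream service
// contributes a named bank, and the builder links them into one event.
// Bank names and contents are placeholders, not CLAS12 definitions.
public class EventBuilderSketch {

    // A bank is just a named list of numbers in this sketch.
    record Bank(String name, List<Double> values) {}

    // Link the outputs of the reconstruction services into a single event map.
    static Map<String, Bank> buildEvent(List<Bank> serviceOutputs) {
        Map<String, Bank> event = new LinkedHashMap<>();
        for (Bank b : serviceOutputs) event.put(b.name(), b);
        return event;
    }

    public static void main(String[] args) {
        List<Bank> outputs = List.of(
                new Bank("TRACKING", List.of(0.95, 1.20)),  // momenta from tracking
                new Bank("FTOF", List.of(24.3)),            // a time-of-flight value
                new Bank("EC", List.of(0.41)));             // a calorimeter energy
        Map<String, Bank> event = buildEvent(outputs);
        System.out.println("event banks: " + event.keySet());
    }
}
```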

Slide 16: Advances in Monitoring and Calibration (1a)
1. Advances in monitoring
   a) Event displays
      i. Display reconstruction output as events are processed (control-room displays for detector monitoring)
      ii. Statistical histogramming of reconstruction output
   b) Event histogramming services
      i. Occupancy plots
      ii. Detector noise analysis, etc.
   c) Service and application monitoring
      i. Error handling
      ii. Incident monitoring
2. Advances in calibration
   a) Calibration development
      i. TOF and EC systems (using ROOT input)
      ii. DC will use legacy algorithms and ROOT input

Slide 17: Previous Timeline/Milestones

Slide 18: New Timeline goes here…

Slide 19: Software Profiling and Verification: Hot-Spot Analysis

Slide 20: Software Profiling and Verification
(Plots: tracking studies comparing TRAC, Swimmer, and GEMC tracking; magnetic-field studies of Bx, By, Bz.)

Slide 21: Documentation/Workshops
Three software workshops:
- September 2012: topics covered
  - The CLARA service-oriented framework
  - Service design and implementation
  - The EVIO transient data/event format
- October 2012: hands-on walk-through of the steps needed to
  - Run GEMC on the CUE machines to get a digitized version of the generated events
  - Set up and run the reconstruction code on a personal computer/laptop (Java required) or on the CUE machines
  - Visualize and perform simple analysis on the output data

Slide 22: Documentation/Workshops, February 2013

Slide 23: CLAS12 Data Distribution/Workflow Tool
- Tagged file system: needed to sort run files according to run conditions, targets, beam type, and energy, with the ability to add metadata and run properties (such as beam current and total charge) to run collections (see the sketch below)
- CLARA distributed environment: a grid-alternative computing environment for accessing the data from anywhere and for pre-processing data to decrease network traffic; multi-node Data Processing Environments (DPEs) for local-network distributed computing and data synchronization
- Data-mining analysis framework: a distributed experimental-data framework with search and discovery using TagFS; analysis codes for specific experimental data sets, including particle identification, momentum corrections, and fiducial cuts
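
A minimal sketch of the tagging idea: attaching run-condition metadata to file records and selecting files by those tags. The field names (target, beam type, beam energy, beam current) mirror the properties listed above, but the record type and query helper are assumptions for illustration, not the TagFS interface.

```java
import java.util.List;

// Illustrative sketch of a tagged file catalog: run files carry
// run-condition metadata and can be selected by those tags.
// The catalog and its fields are stand-ins, not the actual TagFS.
public class TaggedFilesSketch {

    record RunFile(String path, String target, String beamType,
                   double beamEnergyGeV, double beamCurrentNa) {}

    // Select files matching a target and beam type above a minimum energy.
    static List<RunFile> select(List<RunFile> catalog, String target,
                                String beamType, double minEnergyGeV) {
        return catalog.stream()
                .filter(f -> f.target().equals(target))
                .filter(f -> f.beamType().equals(beamType))
                .filter(f -> f.beamEnergyGeV() >= minEnergyGeV)
                .toList();
    }

    public static void main(String[] args) {
        List<RunFile> catalog = List.of(
                new RunFile("run_001.evio", "LH2", "electron", 10.6, 45.0),
                new RunFile("run_002.evio", "LD2", "electron", 10.6, 50.0),
                new RunFile("run_003.evio", "LH2", "electron", 6.4, 40.0));
        System.out.println(select(catalog, "LH2", "electron", 10.0));
    }
}
```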

Slide 24: Management
- Software workshops before scheduled CLAS collaboration meetings
- Weekly software meetings (video conferencing)
- Mantis (bug reporting)
- The ClaRA framework supports multiple languages to accommodate and encourage user contributions
- The Calibration and Commissioning Committee is a collaboration-wide body that oversees CLAS12 software and computing activities
- Software upgrades and modifications as well as bug fixes are discussed using Mantis and the e-mail list
- Internal JLab reviews (e.g. tracking-algorithm discussions with the Hall D group)
- Milestone changes to address critical issues, e.g.:
  - Data transfer through shared memory
  - Minimize EvIO serialization/deserialization
(2b-f)

Slide 25: Addressing Previous Recommendations (2a)
- Stress tests
  - Linear scaling with the number of cores
  - 50-node test in progress
- Usability (see break-out sessions)
  - The Data Mining project uses ClaRA to ship data and run analyses at universities all over the world
  - Simulation is well advanced and used in proposals
  - Generation-3 tracking has been rebuilt and is starting to be used by detector groups
  - EvIO-to-ROOT converter
  - C++ service development

Slide 26: Addressing Previous Recommendations (2a)
- Recommendation: a series of scaling tests ramping up using the LQCD farm should be planned and undertaken.
  Response: a series of tests was run on the current batch farm (up to 32 hyper-threaded cores) to confirm ClaRA scaling and system robustness. Currently ramping up to … cores; a full stress test is planned for ….
- Recommendation: seriously consider using ROOT as the file format in order to make use of the steady advances in its I/O capabilities.
  Response: considered. A ROOT data converter is being developed, particularly for calibration services. That is, persistent data remains EvIO, but ROOT is an available file format.
- Recommendation: the costs and sustainability of supporting two languages, relative to the advantages, should be regularly assessed as the community of users grows, code-development practices become clearer, the framework matures further, etc.
  Response: the service language was chosen based on requirements. In fact a third language, Python, was added, specifically for the PWA analysis service (the SciPy fitter is faster). Multilingual support has increased the availability of programmers, e.g. for ROOT-based calibration services. The Geometry Service needed to be written in C++ for GEMC compatibility.

Slide 27: Risks and Mitigation (2d)
- Communication latency
  - Resolved by introducing inter-node deployment of services with shared memory and in-memory data caching (see the sketch below)
- Broad author and user pools
  - Proper management and administration; strict service canonization rules
- Workloads of different clients may introduce "pileups" on a single service
  - Service and cloud governance (e.g. service locking)
- Network security
  - Client authentication and message encryption
- Limited manpower
  - Interfaces (C++/Java) provide access to CLAS legacy code
  - A ROOT data interface broadens the programmer base for calibration code
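
A minimal sketch of the in-memory caching idea mentioned for the latency risk: services running in the same process share event buffers through a concurrent map instead of re-sending them. The cache class and its API are assumptions for illustration, not ClaRA's actual shared-memory mechanism.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch of in-process data caching: services deployed in the
// same DPE share event buffers through a concurrent map keyed by event id,
// avoiding repeated network transfers. This is a stand-in, not ClaRA's
// shared-memory implementation.
public class SharedEventCacheSketch {

    private final ConcurrentMap<Long, ByteBuffer> cache = new ConcurrentHashMap<>();

    // A reader service stores the raw event once.
    void put(long eventId, ByteBuffer event) {
        cache.put(eventId, event.asReadOnlyBuffer());
    }

    // Downstream services fetch the same buffer without copying it over the network.
    ByteBuffer get(long eventId) {
        return cache.get(eventId);
    }

    // The writer service removes the event when the chain is done with it.
    void release(long eventId) {
        cache.remove(eventId);
    }

    public static void main(String[] args) {
        SharedEventCacheSketch shm = new SharedEventCacheSketch();
        shm.put(42L, ByteBuffer.wrap(new byte[]{1, 2, 3}));
        System.out.println("event 42 size = " + shm.get(42L).remaining());
        shm.release(42L);
    }
}
```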

Slide 28: Summary
1a) Is Hall B making appropriate progress in developing simulation, calibration, and analysis software? Yes.
- Simulation is in an advanced state, since it was needed to validate detector design and performance. All detector subsystems are modeled, background simulation is realistic, geometry is aligned with reconstruction through a common service interface, and the package is easy to use.
- Calibration is at the advanced design stage, which is appropriate since it is the last element needed in the overall software chain. Hall B is in an advantageous situation in that the detector subsystems are well understood by the subsystem managers, being very similar or in some cases identical to systems used in the previous CLAS detector.
- Analysis software has been designed from the bottom up, and the event reconstruction software has been written and tested for the major subsystems: time of flight, calorimetry, and charged-particle tracking. Close cooperation among the core group of developers has produced a well-designed framework with a similar "look and feel" between the different systems, which should ease the tasks of debugging, maintenance, and improvement over the years.
- Higher-level analysis (event selection and binning, fiducial cuts, kinematic fitting, invariant-mass reconstruction, etc.) has only just begun, but the core group is providing convenient tools for collaborative effort, as demonstrated by some of the outside groups.

Slide 29: Summary (cont.)
- Meeting previous milestones? Yes. In a few cases we have re-prioritized effort (for example, placing more emphasis on basics such as the event format, object-model definition, and production of the core geometry and database services, while delaying the detailed instantiation of the individual detector calibrations, which will be the last step in fine-tuned event reconstruction).
- Are the milestones adequate and clearly defined? Yes.
- Is Hall B effectively utilizing collaboration manpower? The majority of work in the past year (framework development and writing of core services) has been done largely by the core group. However, some of that core group are located remotely, demonstrating that this is not a hindrance to close collaboration. In addition, the software team has made a significant effort to engage the collaboration by holding a number of hands-on workshops and by encouraging subsystem software groups to build their calibration GUIs on a ROOT-based framework. This should provide a sizeable group of people to work on the details of calibration over the next two years.
- Collaboration: CCDB shared between Halls B and D; EVIO developed by the DAQ group and used as both persistent and transient data; farm workflow developed by Scientific Computing in collaboration with Hall B; GEMC (B and D); event display (B and D); RootSpy (B and D); tracking and reconstruction algorithms (B and D).


Slide 31: Summary
- ClaRA has advanced considerably in the past year through actual deployment
- GEMC integrated with the geometry database
- Several interactive workshops held on both ClaRA service development and ClaRA deployment to introduce the environment to the collaboration
- ClaRA deployments and reconstruction chains implemented on a variety of collaboration farms
- Steady development of requisite services, in particular Generation-3 tracking with a Kalman filter
- Initial work on calibration and monitoring services
- Initial work on histogramming/statistical services
- Initial work on service profiling and verification

Slide 32: User SOA Application Designer


Slide 34: ClaRA Components
(Diagram: a platform/cloud control node, DPEs on computing nodes with containers (C) and services (S), and an orchestrator.)
- DPE: each node acts as a DPE; all services are deployed and executed by threads inside the DPE process; global memory is used to share data between services
- Orchestrator: designs and controls ClaRA applications; coordinates service execution and data flow; usually runs outside of the DPE; deploys services to DPEs; links services together, so that the output of a service is sent as the input of its linked service (see the sketch below)
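
A minimal sketch of the linking behavior described for the orchestrator: the output of each service is passed as the input of the next one in the composed chain. Representing a service as a plain function on a string payload is an assumption for illustration; ClaRA's real orchestration links networked services that exchange transient data envelopes.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of orchestrator-style linking: services are chained so
// that each service's output becomes the input of the next. Here a "service"
// is just a function on a String payload; real ClaRA services exchange
// transient data envelopes across DPEs.
public class OrchestratorSketch {

    static String runChain(List<UnaryOperator<String>> services, String input) {
        String data = input;
        for (UnaryOperator<String> service : services) {
            data = service.apply(data);   // output of one service feeds the next
        }
        return data;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> chain = List.of(
                data -> data + " -> tracking",
                data -> data + " -> tof",
                data -> data + " -> eventBuilder");
        System.out.println(runChain(chain, "rawEvent"));
    }
}
```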

Slide 35: Service Container and Service Engine
Service container
- Groups and manages services in a DPE
- Can be used as a namespace to separate services: the same service engine can be deployed in different containers in the same DPE
- Handles service execution and its output
- Presents a user engine as an SOA service (SaaS implementation): engine interface plus message processing
Service engine
- The fundamental unit of a ClaRA-based application
- Receives input data in an envelope and generates output data; the data envelope is the same for all services
- Implements the ClaRA standard interface: a configure method, an execute method, and several description/identification methods (see the sketch below)
- Must be thread-safe: the same service engine can be executed in parallel multiple times
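
A minimal sketch of the engine contract described above: configure, execute, and identification methods on a stateless, and therefore thread-safe, class. The Envelope and Engine types below are simplified stand-ins for illustration, not the actual ClaRA Java API.

```java
// Illustrative sketch of the service-engine contract described on the slide:
// a configure method, an execute method, and description methods, written so
// the engine keeps no mutable state and is safe to run in parallel.
public class EngineSketch {

    // A trivial data envelope: the same wrapper type for every service.
    record Envelope(String mimeType, Object data) {}

    interface Engine {
        void configure(Envelope config);   // one-time setup from configuration data
        Envelope execute(Envelope input);  // process one event, return new output
        String name();                     // identification/description methods
        String description();
    }

    // Example engine: counts the characters of a text payload. It keeps no
    // mutable state, so concurrent execute() calls are safe.
    static class CharCountEngine implements Engine {
        public void configure(Envelope config) { /* nothing to configure */ }
        public Envelope execute(Envelope input) {
            int n = String.valueOf(input.data()).length();
            return new Envelope("text/plain", "length=" + n);
        }
        public String name() { return "CharCountEngine"; }
        public String description() { return "Counts characters in the input payload"; }
    }

    public static void main(String[] args) {
        Engine engine = new CharCountEngine();
        engine.configure(new Envelope("text/plain", ""));
        System.out.println(engine.execute(new Envelope("text/plain", "hello")).data());
    }
}
```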

Slide 36: Service Communication
(Diagram: services on each computing node communicate through a service bus and transient data storage; Java and C++ DPEs on different computing nodes are connected through their service buses.)

Slide 37: Transient Data Envelope

Slide 38: Single Data-stream Application
(Diagram: the ClaRA master DPE on the executive node hosts administrative services, the application orchestrator (AO), and a reader/writer (R/W) pair attached to persistent storage; farm nodes 1...N run the service chains S1...Sn.)

Slide 39: Multiple Data-stream Application
(Diagram: same layout as slide 7.)

Slide 40: Application Graphical Designer

Slide 41: Computing Model
(Diagram: the CLAS12 detector electronics feed the trigger, slow controls, and ET online transient data storage; the online farm runs event-builder, monitoring, event-visualization, and calibration services under an online application orchestrator with cloud control, service registration, and service control; calibration and conditions databases, geometry/calibration and run-conditions services, and permanent data storage link the online system to the offline JLab farm and offline university clouds, where Geant4/GEMC simulation, event-builder, analysis, calibration, and DST/histogram-visualization services run under a physics data-processing application orchestrator and a cloud scheduler.)

Slide 42: Single Event Reconstruction
- Read EvIO events from the input file
- Events pass from service to service in the chain (R -> S1 -> S2 -> ... -> SN -> W); services add more banks to the event
- Write events to the output file
A minimal sketch of this loop follows.
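
The sketch below illustrates that reader-to-writer flow. Events are plain strings and the "files" are in-memory lists, purely for illustration; the real chain streams EvIO byte buffers through ClaRA services.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of single-event reconstruction: a reader supplies
// events, each service in the chain appends its banks, and a writer stores
// the result. Events are plain strings here; the real chain passes EvIO
// byte buffers between ClaRA services.
public class SingleEventChainSketch {

    public static void main(String[] args) {
        // Mock input file: three raw events.
        List<String> inputFile = List.of("event1[raw]", "event2[raw]", "event3[raw]");
        List<String> outputFile = new ArrayList<>();

        // The service chain S1..SN: each service adds a bank to the event.
        List<UnaryOperator<String>> chain = List.of(
                e -> e + "+DC",     // tracking banks
                e -> e + "+FTOF",   // time-of-flight banks
                e -> e + "+EB");    // event-builder banks

        for (String event : inputFile) {            // R: read each event
            for (UnaryOperator<String> s : chain) { // S1..SN: process in order
                event = s.apply(event);
            }
            outputFile.add(event);                  // W: write the enriched event
        }
        outputFile.forEach(System.out::println);
    }
}
```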

Slide 43: Multi-Core Reconstruction
(Diagram: one reader and one writer feed several parallel S1...SN chains inside a single DPE, coordinated by an orchestrator.)

Slide 44: Multi-Core Reconstruction

Slide 45: Multi-Core Reconstruction

Slide 46: Multi-Core Reconstruction

Slide 47: Multi-Node Reconstruction
(Diagram: a reader and writer on an I/O DPE (DPEio) feed multiple DPEs (DPE1...DPEn), each running parallel S1...SN chains, with orchestrators (DO, MO) coordinating the flow.)

Slide 48: Multi-Node Reconstruction

Slide 49: Batch Deployment

Slide 50: Single Data-stream Application: CLAS12 reconstruction on the JLab batch farm

Slide 51: Multiple Data-stream Application: CLAS12 reconstruction on the JLab batch farm

Slide 52: Previous Timeline/Milestones (1b)

Slide 53: Single Data-stream Application

