
1 Thomas Jefferson National Accelerator Facility Page 1 Clas12 Reconstruction and Analysis Framework V. Gyurjyan

2 Thomas Jefferson National Accelerator Facility Page 2 CLAS Computing Challenges and Complications
Growing software complexity
Compilation and portability problems
Maintenance and debugging difficulties
Scaling issues
Author volatility
Difficulty enforcing standards
Drop-out authors
OS uniformity problems
Difficulty meeting the requirements of collaborating Universities
Language uniformity problems
Contributions in a defined HL language: Tcl, C, C++, Fortran
Steep learning curve to operate the software
Technology uniformity requirements
User/contributor qualification
Coherent documentation for ever-growing software applications
Software fragmentation
Decreased user base
Decreased contributions
Decreased productivity

3 Thomas Jefferson National Accelerator Facility Page 3 CLAS12 Computing Requirements
Challenges of the physics data processing environment: long lifetime, evolving technologies
Complexity through simplicity: build complex applications using small and simple components
Enhance utilization, accessibility, contribution and collaboration
Reusability of components
On-demand data processing
Location-independent resource pooling
Software agility
Multi-threading: effective utilization of multicore processor systems
Production processing
Dynamic elasticity: ability to expand computing power with minimal capital expenditure
Utilization of IT resources of collaborating Universities
Take advantage of available commercial computing resources

4 Thomas Jefferson National Accelerator Facility Page 4 How to Increase Productivity and Physics Output?
Increase the number of contributors
Increase the number of users
Decrease development time
How?
Software application design utilizing reusable application building blocks (LEGO model)
Keep LEGO modules as simple as reasonably possible: limit the number of code lines
Relax programming language requirements
Separation between the SC programmer and the physicist, i.e. separation between software programming and algorithm description
The physicist focuses on a physics algorithm
The programmer programmatically implements the described physics algorithm
Rapid prototyping, deployment and testing cycles

5 Thomas Jefferson National Accelerator Facility Page 5 Ideas Adopted From GAUDI
Clear separation between data and algorithms
Services encapsulate algorithms
Services communicate data
Three basic types of data: event, detector, statistics
Clear separation between persistent and transient data
No code or algorithmic solution was borrowed.

6 Thomas Jefferson National Accelerator Facility Page 6 Review: Software Architectures
Application Characteristics | Traditional OOA | SOA
Modularity | Through OO | Through processes: SaaS
Composition | Monolithic | Monolithic/Distributed
Coupling between modules | Strong | Loose
Language uniformity | Required | Not required
OS uniformity | Required | Not required
Fault tolerance | Less (shared address space and program stack) | More (separate processes)
Testing and maintenance | Difficult | Relatively simple
Updates and extensions | Difficult | Simple (can be done in steps)

7 Thomas Jefferson National Accelerator Facility Page 7 Review: Large Scale Deployments
Architecture | Batch and Grid | Cloud
Traditional OOA | Yes | Difficult
SOA | Yes | Simple/native

8 Thomas Jefferson National Accelerator Facility Page 8 Meeting Requirements
Functionality | Traditional Computing | Cloud Computing
Reusability | Yes | Yes
Legacy/foreign software component integration | Hard | Yes
On-demand data processing | Hard | Yes
Location-independent resource pooling | Hard | Yes
Dynamic elasticity | Hard | Yes
Software agility | Hard | Yes
Utilization of off-site computing resources | Hard | Yes

9 Thomas Jefferson National Accelerator Facility Page 9 Emerging Trends
(Google Trends chart: Cloud Computing, Grid Computing, Advanced Computing; scale is based on the average worldwide traffic of cloud computing in all years.)
Helix Nebula project at CERN (02 Mar 2012, V3.co.uk digital publishing company): CERN has announced it is developing a pan-European cloud computing platform, with two other leading scientific organizations, which will take advantage of the increased computing power cloud systems offer to help scientists better analyze data. Interview with Frédéric Hemmer, head of CERN's IT department: "CERN's computing capacity needs to keep up with the enormous amount of data coming from the Large Hadron Collider (LHC) and we see Helix Nebula, the Science Cloud, as a great way of working with industry to meet this challenge."
Magellan project at the US Department of Energy, which will spend $32 million on a project that will deploy a large cloud computing test bed with thousands of Intel Nehalem CPU cores and explore commercial cloud offerings from Amazon, Microsoft and Google for scientific purposes.

10 Thomas Jefferson National Accelerator Facility Page 10 SOA Defined
SOA is a way of designing, developing, deploying, and managing software systems characterized by coarse-grained services that represent reusable functionality. Service consumers compose applications or systems using the functionality provided by these services through standard interfaces.
An architecture (NOT a technology) based on well-defined, reusable components: services.
Services are loosely coupled.
Services with a functional context that is agnostic to the composed application logic.
Agnostic services participate in multiple algorithmic compositions.
Application design is based on available services.

11 Thomas Jefferson National Accelerator Facility Page 11 Computing Model and Architecture Choice
SOA is the key architecture choice for ClaRA
Major data processing components are services
Multilingual support
– Services can be written in C++, Java and Python
Physics application design/composition is based on services
Supports both traditional and cloud computing models
Single-process as well as distributed application design modes
Centralized batch processing
Distributed cloud processing
Multi-threaded

12 Thomas Jefferson National Accelerator Facility Page 12 Attributes of ClaRA Services
Communicate data through shared memory and/or a pub-sub messaging system
Well-defined, easy-to-use, data-centric interface
Self-contained, with no dependencies on other services
Loose coupling: coupled only through the communicated data
Always available, but idle until a request arrives
Location transparent: services are defined by unique names and are discovered through discovery services
Existing services can be combined into composite services or applications
Services can be combined in a single process (run-time environment), communicating data through shared memory (traditional computing model)

13 Thomas Jefferson National Accelerator Facility Page 13 More on the ClaRA Environment
User-centric interface
ClaRA services are accessed with simple and pervasive methods
The ClaRA interface tries not to force users to change their programming habits, e.g. programming language, compiler and OS
Clients to ClaRA applications or clouds that must be installed outside of the ClaRA environment are thin and lightweight
On-demand service provisioning
ClaRA resources and services are available to users on demand
Users can customize, configure and control services at every request

14 Thomas Jefferson National Accelerator Facility Page 14 Clas12 Physics Data Processing
(Data-flow diagram) Database; Geometry, Calibration and MFM Services; Raw Data; Reconstruction Services; DST and Tuple Generation Services; DST/Tuples; Simulation and Sim Data; Physics Analysis Services; Monitors and Displays

15 Thomas Jefferson National Accelerator Facility Page 15 Programming Paradigm
(Diagram: algorithm and data)

16 Thomas Jefferson National Accelerator Facility Page 16 ClaRA Design Architecture
PDP Service Bus (pub-sub and/or shared memory)
Service layer: SaaS, IaaS, DaaS; Registration Service; Service Inventory
Orchestration layer: rule invocation, identification, filtration, subscription
Control layer: data-flow control, load balancing, error recovery, managing errors and exceptions, security, validation and auditing, administration

17 Thomas Jefferson National Accelerator Facility Page 17 ClaRA Containers
Cloud Control Node: cloud controller with service bus (pub-sub server), registration, discovery, administration and governing
Computing node: service bus (pub-sub server), a Java container (main container) and a C++ container
Container SaaS functions: administration, service deployment, service removal, service recovery, monitoring (CPU monitoring on Unix only)
A container is a process; there is a single Java container and a single C++ container per node

18 Thomas Jefferson National Accelerator Facility Page 18 ClaRA Cloud Formation
(Diagram) Computing Node 1 … Computing Node N, each running a service bus with a Java container and a C++ container hosting Service 1 … Service N; nodes share transient data storage and are linked through their service buses

19 Thomas Jefferson National Accelerator Facility Page 19 ClaRA Cloud Interactions
(Diagram) Cloud 1 … Cloud N, each with a Cloud Control Node and computing nodes Node 1 … Node N, interacting with one another

20 Thomas Jefferson National Accelerator Facility Page 20 ClaRA SaaS Implementation
User software (shared object)
Engine interface
Message processing
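The engine interface is only named on this slide; as a rough illustration (the names below are invented for this sketch and are not ClaRA's actual API), a service engine can be pictured as user code behind a single data-in/data-out method, with all message processing handled by the container around that call:

```java
// Hypothetical sketch of the "engine interface" idea (illustrative names only,
// NOT ClaRA's real API): user software implements one execute() method.
public interface ServiceEngine {
    Object execute(Object input);   // one request in, one result out
    String getDescription();        // exposed through the service contract
}

// A user's algorithm wrapped as an engine; the container owns the message
// processing (receiving, unpacking, forwarding) around this call.
class SquareEngine implements ServiceEngine {
    @Override
    public Object execute(Object input) {
        double x = (Double) input;
        return x * x;
    }

    @Override
    public String getDescription() {
        return "Squares a number";
    }
}
```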

21 Thomas Jefferson National Accelerator Facility Page 21 Service Coupling
A functional service context is independent of the outside logic.
Contract-to-functional coupling between services: a service has a name; a service receives data and sends data.
Consumer-to-contract coupling between consumers and services: the user/consumer sends data to the named service, and sending the data triggers execution of the receiving service.
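A minimal sketch of consumer-to-contract coupling (the bus class and method names are invented for illustration, not taken from ClaRA): the consumer knows only a service name and the data it sends, and delivery of the data is what triggers the receiving service's execution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy stand-in for a pub-sub bus: deploying binds an engine to a name,
// and sending data to that name triggers the engine's execution.
public class NamedServiceBus {
    private final Map<String, Function<Object, Object>> services = new HashMap<>();

    public void deploy(String name, Function<Object, Object> engine) {
        services.put(name, engine);
    }

    public Object send(String serviceName, Object data) {
        return services.get(serviceName).apply(data);   // data arrival = execution
    }

    public static void main(String[] args) {
        NamedServiceBus bus = new NamedServiceBus();
        bus.deploy("ServiceA", x -> "processed(" + x + ")");
        System.out.println(bus.send("ServiceA", "event-42"));   // processed(event-42)
    }
}
```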

22 Thomas Jefferson National Accelerator Facility Page 22 Interface: Transient Data Envelope
Data Object
Meta Data | Type
MimeType | String/Integer
Description | String
Version | Integer
Unit | String
Govern | Type | Access
Status | String | R/W
Control | Property List | R only
Customer | String | R only
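The envelope can be pictured as a payload plus the metadata fields listed in the table above. The Java class below is only a sketch of that layout (field names follow the slide; the class itself is not ClaRA code):

```java
import java.util.Properties;

// Illustrative transient data envelope: payload plus descriptive and governing
// metadata, mirroring the table above (not ClaRA's actual class).
public class DataEnvelope {
    private final Object dataObject;        // the transient data payload

    // Descriptive metadata
    private String mimeType;
    private String description;
    private int version;
    private String unit;

    // Governing metadata
    private String status;                  // R/W: also used for error propagation
    private final Properties control;       // read-only property list
    private final String customer;          // read-only: who requested the processing

    public DataEnvelope(Object dataObject, Properties control, String customer) {
        this.dataObject = dataObject;
        this.control = control;
        this.customer = customer;
    }

    public Object getData()              { return dataObject; }
    public String getStatus()            { return status; }
    public void setStatus(String s)      { status = s; }
    public Properties getControl()       { return control; }
    public String getCustomer()          { return customer; }
    public String getMimeType()          { return mimeType; }
    public void setMimeType(String m)    { mimeType = m; }
    public void setDescription(String d) { description = d; }
    public void setVersion(int v)        { version = v; }
    public void setUnit(String u)        { unit = u; }
}
```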

23 Thomas Jefferson National Accelerator Facility Page 23 Service abstraction
Technology information (hidden)
Programmatic logic information (hidden)
Functional information (exposed through service contract meta data)
Quality of service information (available from the platform registration services)

24 Thomas Jefferson National Accelerator Facility Page 24 Service Types
Entity services: generic, reusable
Utility services: legacy systems
Composite services
– Primitive compositions: self-governing service compositions, task services
– Complex compositions: controlled aggregate services

25 Thomas Jefferson National Accelerator Facility Page 25 Service Composition
Primitive composition: represents message exchanges across two or more services; requires a composition initiator.
Complex composition: orchestrated task services seen as a single service (orchestration layer).
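As a simplified illustration of a complex composition (all names are hypothetical), an orchestrator can chain otherwise independent task services so that the whole chain looks like a single service to its consumer:

```java
import java.util.List;
import java.util.function.Function;

// Sketch of the orchestration idea: compose a list of services into one
// service, where the output of each step becomes the input of the next.
public class Orchestrator {
    public static Function<Object, Object> compose(List<Function<Object, Object>> chain) {
        return input -> {
            Object data = input;
            for (Function<Object, Object> service : chain) {
                data = service.apply(data);
            }
            return data;
        };
    }

    public static void main(String[] args) {
        Function<Object, Object> clusterFinder = d -> d + " -> clusters";
        Function<Object, Object> trackFinder   = d -> d + " -> tracks";
        Function<Object, Object> composite = compose(List.of(clusterFinder, trackFinder));
        System.out.println(composite.apply("raw event"));  // raw event -> clusters -> tracks
    }
}
```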

26 Thomas Jefferson National Accelerator Facility Page 26 Service Discovery
(Diagram) The service provider registers with the service registry; the service consumer discovers the service through the registry; provider and consumer then exchange messages directly.
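A minimal register/discover sketch (the API and the location string are invented for illustration; the service name DCClusterFinder is taken from the reconstruction example later in the deck):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy registry: providers register a unique service name with a location,
// consumers look the location up before exchanging messages.
public class ServiceRegistry {
    private final Map<String, String> registry = new ConcurrentHashMap<>();

    public void register(String serviceName, String location) {
        registry.put(serviceName, location);
    }

    public String discover(String serviceName) {
        return registry.get(serviceName);   // null if the service is not deployed
    }

    public static void main(String[] args) {
        ServiceRegistry reg = new ServiceRegistry();
        reg.register("DCClusterFinder", "node1:java-container");  // invented location format
        System.out.println(reg.discover("DCClusterFinder"));
    }
}
```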

27 Thomas Jefferson National Accelerator Facility Page 27 Multi-Threading
Only one process is active on the ClaRA cloud node: a single Java container (JVM) on a node, a single C++ container on a node.
Service containers run in their own threads.
A service container executes its contained service engines in separate threads.
A service engine must be thread safe if it is intended to run in multi-threaded mode.
Advantages: small memory footprint, lower probability of missing the L1/L2 caches, less RAM access, better performance.
In contrast, multi-processing adds bookkeeping complexity: users must keep track of input/output data distribution and data reconsolidation.
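A rough sketch of this threading model (simplified, not ClaRA code): one container per node keeps a thread pool sized to the available cores and runs its engines in those threads, which is why an engine must be thread safe when several events are in flight at once.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Illustrative container: one process, many worker threads executing engines.
public class Container {
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Submit one event to an engine; the engine must be thread safe because
    // many events may be processed by the same engine concurrently.
    public void process(Function<Object, Object> engine, Object event) {
        pool.submit(() -> System.out.println(engine.apply(event)));
    }

    public static void main(String[] args) {
        Container container = new Container();
        Function<Object, Object> engine = e -> "reconstructed " + e;  // stateless, hence thread safe
        for (Object event : List.of("event-1", "event-2", "event-3")) {
            container.process(engine, event);
        }
        container.pool.shutdown();
    }
}
```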

28 Thomas Jefferson National Accelerator Facility Page 28
(Diagram) An orchestrator (O) and an event service feed chains of services (S) deployed on Node 1, Node 2, … Node N.

29 Thomas Jefferson National Accelerator Facility Page 29 Service Deployment Graphical Interfaces

30 Thomas Jefferson National Accelerator Facility Page 30
(Deployment diagram)
Online Cloud: Clas12 detector electronics, trigger and slow controls; ET online transient data storage; permanent data storage; online EB services; online monitoring services; event visualization services; online calibration services; calibration and conditions databases; geometry calibration services; run conditions services; cloud control, service registration and service control; online application orchestrator.
Offline JLAB Cloud: physics data processing application orchestrator; Geant4 GEMC simulation; EB services; DST histogram visualization services; analysis services; calibration services; run conditions services; geometry calibration services; permanent data storage; calibration and conditions databases; cloud control, service registration and service control.
Offline University Cloud 1 … Offline University Cloud n: cloud control, service registration and service control; permanent data storage; calibration and conditions databases.
A cloud scheduler links the offline clouds.

31 Thomas Jefferson National Accelerator Facility Page 31
(Event reconstruction application diagram)
Forward Tracking: DCClusterFinder, DCRoadsInClusterFinder, DCRegionSegmentFinder, DCTrackCandidateFinder, FMTHitsToTrackSegment, ForwardKalmanFilter
Central Tracking: BSMTHitFilter, BSMTMergeHits, PointHitFinder, HoughTransform, HelixTrackEstimator, TrackFitter
Calorimeter: ECStripsCreator, ECHitsFinder
FTOF: ForwardTOFReconstruction
PID: ParticleIdentification (using FTOF, calorimeter and tracking)
Services run in a JVM and a C++ VM on Node 1 … Node n, exchange data through shared-memory transient data storage and pub-sub messaging, write to permanent storage, and are steered by the Event Reconstruction Application Orchestrator with supporting database services.

32 Thomas Jefferson National Accelerator Facility Page 32

33 Thomas Jefferson National Accelerator Facility Page 33

34 Thomas Jefferson National Accelerator Facility Page 34 Tracking in a Single Cloud Node

35 Thomas Jefferson National Accelerator Facility Page 35 Tracking in a 17 Node RU Cloud

36 Thomas Jefferson National Accelerator Facility Page 36 Challenges
Communication latency: careful analysis of the criticality of each service; introduce built-in tolerance for variations in network service response times
Data integration and distribution
Increased author and user pools: management and administration; strict service canonization rules
Workloads of different clients may introduce "pileups" on a single service
Service and cloud governance
Network security: client authentication and message encryption

37 Thomas Jefferson National Accelerator Facility Page 37 Summary and Conclusion
A multi-threaded analysis framework based on SOA
PDP applications are built from specialized services: small, independent, easy to test, update and maintain
Building and running a PDP application does not require CS skills
A number of applications have been developed using the framework:
Charged particle tracking (central, forward)
EC reconstruction
FTOF reconstruction
PID
HTCC
PCAL
Detector calibration
Event building
Histogram services
Database applications
– Geometry, calibration constants and run conditions services
ClaRA supports both traditional and cloud computing models, and if need be we are ready for cloud deployment.

38 Thomas Jefferson National Accelerator Facility Page 38 Service Bus Performance measurements
Control messages only: <0.03 ms/message

39 Thomas Jefferson National Accelerator Facility Page 39 Service Bus Performance measurements

40 Thomas Jefferson National Accelerator Facility Page 40 Event Reconstruction Profiling
Component | ms/event, thread, track | info
Track | 11.2 (0.03 ms/MPI * 5 = 0.15 ms) | measured
Vertex | 5 | estimate
EC | 3.1 (0.03 ms/MPI * 2 = 0.06 ms) | measured
PCAL | 3 | estimate
HTCC, LTCC, TOF | 10 | estimate
Total | 32.3 | estimate
Notes: the trajectory is extrapolated from hit to hit, along with the uncertainty matrix of the track parameters; at each hit the uncertainty is decreased and the track parameters are modified; the magnetic field is extrapolated with a step size of 2 cm in z and r, every 0.25 degree; 3 Kalman filter iterations are performed (end-start, start-end, end-vertex).
MPI | Performance
Node | 0.03 ms
Network, 10K data | 0.1 ms
Network, >100K data | 100 MB/s (1 Gigabit Ethernet)
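As a sanity check on these numbers (the thread count below is an assumption for illustration, not from the slide), the per-component times sum to 32.3 ms/event per thread, which translates directly into a per-node rate estimate:

```java
// Back-of-the-envelope check of the profiling table above; the per-component
// times come from the slide, the 16-thread node is an assumed example.
public class ReconstructionBudget {
    public static void main(String[] args) {
        double track = 11.2, vertex = 5, ec = 3.1, pcal = 3, other = 10; // ms/event/thread
        double totalMs = track + vertex + ec + pcal + other;             // 32.3 ms
        int threads = 16;                                                // assumption, not from the slide
        double eventsPerSecond = threads * 1000.0 / totalMs;             // about 495 events/s per node
        System.out.printf("total = %.1f ms/event, ~%.0f events/s with %d threads%n",
                totalMs, eventsPerSecond, threads);
    }
}
```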

41 Thomas Jefferson National Accelerator Facility Page 41 Grid vs Cloud
Cloud computing overlaps with many existing technologies, such as Grid, utility, and distributed computing in general. The problems are mostly the same: there is a common need to manage large computing resources to perform large-scale data processing.

42 Thomas Jefferson National Accelerator Facility Page 42 Grid vs Cloud
(Figure: service-oriented view, by Ian Foster et al.)

43 Thomas Jefferson National Accelerator Facility Page 43 How different is Cloud?
It is elastically scalable.
It can be encapsulated as an abstract entity that delivers different levels of services to users outside the cloud.
Services can be dynamically configured and delivered on demand.
Major differences compared to Grid:
Resources in the Cloud are shared by all users at the same time, in contrast to dedicated resources governed by a queuing system (batch).
MPI (message passing interface) is used to achieve intra-service communication.
Services are loosely coupled, transaction oriented, and always interactive, as opposed to being batch-scheduled as they currently are in Grids.

44 Thomas Jefferson National Accelerator Facility Page 44 Collaboration-wide Utilization and Contribution
Application users, service programmers, framework developers
Users manual
Graphical interface client bundles
Application design graphical tool
Service discovery and monitoring tool
Wiki how-to
Workshops and conferences
ClaRA developers manual
SVN
JavaDoc, Doxygen
Wiki forums
Bug-tracking tools
ClaRA developers blog
Service deployment and management tools
Release notes
Discussion forums
Maintenance and request tracking

45 Thomas Jefferson National Accelerator Facility Page 45 Transitioning from Development to Operations
Many test deployments on local and remote desktops and laptops.
ClaRA operational platforms at RU, ODU and JLAB.
Deployment of the ClaRA platform on the batch queue system at JLAB is in progress with the help of the IT department.

46 Thomas Jefferson National Accelerator Facility Page 46 ClaRA Deployment on the JLAB Batch Farm
(Deployment diagram)
Farm Node 1, Farm Node 2, …: a service bus with Java and C++ service containers on each node.
ClaRA Cloud Control Node (Clas12 private resource): cloud control, service registration and service control; service bus; Java and C++ service containers; event services; PDP application orchestrator; run conditions services; geometry calibration services; permanent data storage; calibration and conditions databases.

47 Thomas Jefferson National Accelerator Facility Page 47 Possible Questions
Grid vs Cloud?
Resources in the Cloud can be dynamically allocated and released (elasticity), as opposed to the Grid, where fixed, dedicated resources are governed by a batch queuing system.
Services are shared by all users at the same time, are transaction oriented, and are always interactive, as opposed to being batch-scheduled as they currently are in Grids.
In the Cloud, services can be dynamically configured and delivered on demand.
How does ClaRA provide a cloud computing environment?
A software application can be presented as a service (Cloud SaaS support).
IaaS (infrastructure as a service), i.e. registration, discovery, management and administration services.
Multithreading vs multi-processing?
Small memory footprint, lower probability of missing multilevel core/chip caches, less RAM access, better performance; multi-processing adds bookkeeping complexity.

48 Thomas Jefferson National Accelerator Facility Page 48 Possible Questions
Service communication latency?
Messages < 1K: 0.03 ms within the same node, 0.043 ms over the 1 Gbit network.
Messages of 10K: 0.05 ms within the same node, 0.1 ms over the 1 Gbit network.
Messages over 100K: 100 MByte/s over the 1 Gbit network.
What do you mean by loose coupling?
Well-defined, data-centric interface; self-contained, with no dependencies on other services; coupled only through the communicated data (transient data envelope).
Error propagation and fault tolerance?
The status field of the transient data envelope.
What communication protocol is used between services?
Data are communicated through shared memory and/or a pub-sub messaging system.

49 Thomas Jefferson National Accelerator Facility Page 49 Possible Questions
What do you mean by location-transparent resource allocation?
A service is defined by a unique name and is discovered through the discovery services of the ClaRA platform, without any knowledge of where the service is physically deployed and running.
What do you mean by on-demand data processing?
ClaRA resources and services have temporal continuity and are available to users on demand. Users can customize, configure and control services at every request.
How easy would it be to deploy ClaRA on a commercial cloud?
The ClaRA platform is written in pure Java, making the IaaS (infrastructure as a service) layer easily deployable and portable. The C++ container is self-contained and has been ported to many flavors of Unix and macOS.

