Thomas Jefferson National Accelerator Facility Page 1 CLAS12 Software D.P. Weygand Thomas Jefferson National Accelerator Facility

Thomas Jefferson National Accelerator Facility Page 2 Projects
o ClaRA
o Simulation: GEMC
o CCDB
o Geometry Service
o Event Display
o Tracking: SOT Gen III
o Event Reconstruction
o Post-Reconstruction Data Access: Data-Mining
o Slow Controls
o Documentation: Doxygen, Javadoc
o Testing/Authentication
o Detector Subsystems (Reconstruction and Calibration): EC, PCAL, FTOF, CTOF, LTCC, HTCC
o OnLine
o Code Management: SVN, bug reporting
o Support: visualization services, support packages (e.g. CLHEP, Root)

Thomas Jefferson National Accelerator Facility Page 3 Service Oriented Architecture
Overview: Services are unassociated, loosely coupled units of functionality that have no calls to each other embedded in them. Each service implements one action. Rather than embedding calls to each other in their source code, services use defined protocols that describe how they pass and parse messages.

SOA aims to allow users to string together fairly large chunks of functionality to form ad hoc applications built almost entirely from existing software services. The larger the chunks, the fewer the interface points required to implement any given set of functionality; however, very large chunks of functionality may not prove sufficiently granular for easy reuse. Each interface brings with it some amount of processing overhead, so there is a performance consideration in choosing the granularity of services.

The great promise of SOA is that the marginal cost of creating the nth application is low, since all of the software it requires already exists to satisfy the requirements of other applications. Ideally, one requires only orchestration to produce a new application.
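To make the message-passing idea concrete, here is a minimal sketch in plain Python (illustrative only, not the ClaRA API): two single-action services that never call each other directly, exchanging an agreed-upon JSON message envelope that an orchestrator wires together.

    import json

    def tracking_service(message: str) -> str:
        """One action: build tracks from the hits carried in the message."""
        payload = json.loads(message)
        tracks = [{"id": i} for i, _ in enumerate(payload["hits"])]
        return json.dumps({"type": "tracks", "data": tracks})

    def writer_service(message: str) -> str:
        """One action: persist whatever data the message carries."""
        payload = json.loads(message)
        return json.dumps({"type": "ack", "count": len(payload["data"])})

    # An orchestrator, not the services themselves, wires the chain together.
    event = json.dumps({"type": "hits", "hits": [[1.0, 2.0], [3.0, 4.0]]})
    print(writer_service(tracking_service(event)))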

Thomas Jefferson National Accelerator Facility Page 4 SOA/Complexity
SOA is principally based on object-oriented design. Each service is built as a discrete piece of code. This makes it possible to reuse the code in different ways throughout the application by changing only the way an individual service interoperates with the other services that make up the application, rather than making code changes to the service itself. SOA design principles are used during software development and integration.

Software complexity is a term that encompasses numerous properties of a piece of software, all of which affect internal interactions. There is a distinction between the terms complex and complicated. Complicated implies being difficult to understand, but with time and effort, ultimately knowable. Complex, on the other hand, describes the interactions between a number of entities. As the number of entities increases, the number of interactions between them grows rapidly (quadratically even for pairwise interactions alone), and it eventually becomes impossible to know and understand all of them. Similarly, higher levels of complexity in software increase the risk of unintentionally interfering with interactions, and so increase the chance of introducing defects when making changes. In extreme cases, this can make modifying the software virtually impossible.
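A quick illustration of that growth: even counting only pairwise interactions, n entities admit n(n-1)/2 of them.

    # Pairwise interactions among n entities grow as n(n-1)/2,
    # so the interaction count rapidly outpaces the entity count.
    for n in (10, 100, 1000):
        print(n, "entities ->", n * (n - 1) // 2, "possible pairwise interactions")
    # 10 -> 45, 100 -> 4950, 1000 -> 499500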

Thomas Jefferson National Accelerator Facility Page 5 ClaRA and Cloud Computing
o Addresses the major components of physics data processing as services.
o Services, and the information bound to those services, can be further abstracted into process layers and composite applications for developing various analysis solutions.
o Agility: the ability to change the physics data processing process on top of existing services.
o The ability to monitor points of information and points of service, in real time, to determine the well-being of the entire physics data processing application.
SOA is the key architecture choice for ClaRA: highly concurrent cloud computing.

Thomas Jefferson National Accelerator Facility Page 6 ClaRA Stress Test V. Gyurjyan S. Mancilla JLAB Scientific Computing Group

Thomas Jefferson National Accelerator Facility Page 7 ClaRA Components
[Diagram: the platform (cloud controller) coordinates DPEs, one per compute node; each DPE hosts containers ("C") holding services ("S"), and an orchestrator steers the deployed services.]
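The sketch below is a toy Python model of this hierarchy, to make the roles concrete; all class and method names are hypothetical stand-ins, not the real ClaRA classes.

    class Service:
        """One unit of functionality; implements a single action."""
        def __init__(self, name, action):
            self.name, self.action = name, action

    class Container:
        """Groups services deployed in one DPE."""
        def __init__(self):
            self.services = {}
        def deploy(self, service):
            self.services[service.name] = service

    class DPE:
        """Data Processing Environment: one per compute node."""
        def __init__(self, host):
            self.host, self.container = host, Container()

    class Orchestrator:
        """Strings deployed services into an ad hoc application."""
        def run(self, dpe, chain, data):
            for name in chain:
                data = dpe.container.services[name].action(data)
            return data

    # Example: compose two services into a tiny application.
    dpe = DPE("farm-node-01")
    dpe.container.deploy(Service("decode", lambda d: d + ["decoded"]))
    dpe.container.deploy(Service("track", lambda d: d + ["tracked"]))
    print(Orchestrator().run(dpe, ["decode", "track"], ["raw"]))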

Thomas Jefferson National Accelerator Facility Page 8 Batch Deployment

Thomas Jefferson National Accelerator Facility Page 9
[Plot: event reconstruction rate (kHz) vs. number of threads on a 16-core hyper-threaded node (no I/O), at 150 ms/event/thread.]

Thomas Jefferson National Accelerator Facility Page 10 Batch job submission
<![CDATA[
setenv CLARA_SERVICES /group/clas12/ClaraServices;
$CLARA_SERVICES/bin/clara-dpe -host claradm-ib
]]>

Thomas Jefferson National Accelerator Facility Page 11 R R W W Administrative Services Administrative Services ClaRA Master DPE Persistent Storage Persistent Storage AO Executive Node Farm Node N S1 S2 Sn S1 S2 Sn Single Data-stream Application orchestrator

Thomas Jefferson National Accelerator Facility Page 12 R R W W Administrative Services Administrative Services ClaRA Master DPE AO Executive Node Farm Node N S1 S2 Sn S1 S2 Sn Multiple Data-stream Application Persistent Storage Persistent Storage DS Persistent Storage Persistent Storage

Thomas Jefferson National Accelerator Facility Page 13 Batch queue
o Common queue
o Exclusive queue: CentOS, ...-core, 12 processing nodes

Thomas Jefferson National Accelerator Facility Page 14 Single Data-stream Application: CLAS12 Reconstruction on the JLab batch farm

Thomas Jefferson National Accelerator Facility Page 15 Computing Capacity Growth
Today: 1K cores in the farm (3 racks, 4-16 cores per node, 2 GB/core); 9K LQCD cores (24 racks, 8-16 cores per node, 2-3 GB/core); 180 nodes with 720 GPUs + Xeon Phi as LQCD compute accelerators.
2016: 20K cores in the farm (10 racks, ... cores per node, 2 GB/core). Accelerated nodes for Partial Wave Analysis? Even 1st pass?
Total footprint, power, and cooling will grow only slightly. Capacity for detector simulation will be deployed in 2014 and 2015, with additional capacity for analysis in 2015 and 2016.
Today Experimental Physics has < 5% of the compute capacity of LQCD. In 2016 it will be closer to 50% in dollar terms and number of racks (still small in terms of flops).

Thomas Jefferson National Accelerator Facility Page 16 Compute Paradigm Changes
Today, most codes and jobs are serial. Each job uses one core, and we try to run enough jobs to keep all cores busy without overusing memory or I/O bandwidth. Current weakness: if we have 16 cores per box and run 24 jobs to keep them all busy, there are 24 input and 24 output file I/O streams running for this one box alone => lots of "head thrashing" in the disk system.
Future: most data analysis will be event parallel ("trivially parallel"). Each thread will process one event. Each box will process one job (DPE), handling events in parallel with 1 input and 1 output => much less head thrashing, higher I/O rates. A sketch of this model follows.
Possibility: the farm will include GPU or Xeon Phi accelerated nodes! As software becomes ready, we will deploy it!
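A minimal sketch of the event-parallel model using Python's standard multiprocessing module; the reconstruct() stand-in and the output file name are made up for illustration.

    # One job per box: a single input stream feeds a pool of workers (one per
    # core), each processing one event at a time, and a single output stream
    # collects the results.

    from multiprocessing import Pool

    def reconstruct(event):
        """Stand-in for full event reconstruction."""
        return sum(event)

    def read_events():
        """Stand-in for the single input stream."""
        for i in range(10000):
            yield [i, i + 1, i + 2]

    if __name__ == "__main__":
        with Pool() as pool, open("out.txt", "w") as out:
            for result in pool.imap(reconstruct, read_events(), chunksize=64):
                out.write(f"{result}\n")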

Thomas Jefferson National Accelerator Facility Page 17 Tested on calibration and simulated data V. Ziegler & M. Mestayer

Thomas Jefferson National Accelerator Facility Page 18 Code to reconstruct the signals from the Forward Time-of-Flight system (FTOF)*
o Written as a software service so that it can be easily integrated into the ClaRA framework.
o The FTOF code converts the TDC and ADC signals into times and energies and corrects for effects like time walk.
o The position of the hit along the paddle is determined from the difference between the TDC signals; the time of the hit is reconstructed from the average TDC signal, correcting for the propagation time of light along the paddle.
o The energy deposited is extracted from the ADC signals and corrected for light attenuation along the paddle.
o Modifications to this procedure are applied when one or more of the ADC or TDC signals are missing.
o The FTOF code is up and running and will be used in the upcoming 'stress test' of the full CLAS12 event reconstruction package.
Fig. 1: Histogram of the number N_adj of adjacent paddles in a cluster, normalized to the total number of events. Clusters are formed in a single panel by grouping adjacent hits together. The red, open circles are N_adj for panel 1b. The black, filled squares are for panel 1a, which is behind panel 1b relative to the target. Most events consist of a single hit, but a significant number have additional paddles in each cluster.
* Modeled after the code used in the CLAS6 FTOF reconstruction and tested using Monte Carlo data from gemc, the physics-based CLAS12 simulation.
Jerry Gilfoyle & Alex Colvill
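For a rough numeric sketch of this procedure (the effective light speed, paddle length, and attenuation length below are placeholder values, not CLAS12 calibration constants):

    import math

    V_EFF = 16.0    # effective light speed in the paddle, cm/ns (assumed)
    LENGTH = 200.0  # paddle length, cm (assumed)
    ATTEN = 150.0   # light attenuation length, cm (assumed)

    def ftof_hit(t_left, t_right, adc_left, adc_right):
        # Hit position along the paddle from the TDC time difference.
        x = 0.5 * V_EFF * (t_left - t_right)
        # Hit time from the average TDC, corrected for light propagation
        # from the hit to the paddle ends.
        t = 0.5 * (t_left + t_right) - LENGTH / (2.0 * V_EFF)
        # Deposited energy from the geometric mean of the two ADCs, which
        # cancels the exp(±x/ATTEN) attenuation factors of the two ends.
        e = math.sqrt(adc_left * adc_right) * math.exp(LENGTH / (2.0 * ATTEN))
        return x, t, e

    print(ftof_hit(t_left=25.0, t_right=23.0, adc_left=400.0, adc_right=420.0))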

Thomas Jefferson National Accelerator Facility Page 19 Intel Xeon Phi MIC Processor
The Intel Xeon Phi KNC processor is essentially a 60-core SMP chip where each core has a dedicated 512-bit wide SSE (Streaming SIMD Extensions) vector unit. All the cores are connected via a 512-bit bidirectional ring interconnect. Currently, the Phi coprocessor is packaged as a separate PCIe device, external to the host processor. Each Phi contains 8 GB of RAM that provides all the memory and file-system storage used by every user process, the Linux operating system, and ancillary daemon processes. The Phi can mount an external host file system, which should be used for all file-based activity to conserve device memory for user applications.

Thomas Jefferson National Accelerator Facility Page 20 Virtual Machine

Thomas Jefferson National Accelerator Facility Page 21 Virtual Machine

Thomas Jefferson National Accelerator Facility Page 22 CLAS12 Constants Database (CCDB) Johann Goetz & Yelena Prok

Thomas Jefferson National Accelerator Facility Page 23 CLAS12 Constants Database (CCDB)

Thomas Jefferson National Accelerator Facility Page 24 Proposed Programming Standards
o Programming standards create a unified way of working across the collaboration.
o Standards and documentation increase the expected life of software by creating a unified design that aids future maintenance.
o This is a proposed working standard, open for suggestions and modification. It can be found on the Hall-B wiki.

Thomas Jefferson National Accelerator Facility Page 25 Profiling and Static Analysis
Profiling is a form of dynamic program analysis:
o individual call stacks
o time analysis
o memory analysis
Static analysis is a pre-run analysis that pinpoints areas of potential error:
o possible bugs
o dead code
o duplicate code
o suboptimal code
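For the Python services discussed later in this talk, the dynamic-profiling step could look like the following, using the standard-library cProfile and pstats modules (the profiled function is only an illustration):

    import cProfile
    import pstats

    def hot_loop():
        """Stand-in for a service method worth profiling."""
        return sum(i * i for i in range(1_000_000))

    cProfile.run("hot_loop()", "stats.out")  # record call stacks and timing
    stats = pstats.Stats("stats.out")
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time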

Thomas Jefferson National Accelerator Facility Page 26 Testing
Unit testing:
o extends the life of code
o catches errors introduced by modification
o decreases the amount of debugging
o decreases the amount of suboptimal code
Testing is a required part of the proposed coding standards and decreases the time spent working with incorrect code. A minimal example follows.
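A minimal example in the standard-library unittest style; the time-walk helper under test is a made-up stand-in, not actual CLAS12 code.

    import unittest

    def time_walk_correction(t, adc, a=1.0):
        """Toy correction: larger pulses fire the discriminator earlier,
        so they need a smaller subtractive correction."""
        return t - a / (adc ** 0.5)

    class TestTimeWalk(unittest.TestCase):
        def test_correction_is_subtractive(self):
            self.assertLess(time_walk_correction(10.0, 400.0), 10.0)

        def test_large_pulse_needs_smaller_correction(self):
            self.assertGreater(time_walk_correction(10.0, 1600.0),
                               time_walk_correction(10.0, 100.0))

    if __name__ == "__main__":
        unittest.main()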

Thomas Jefferson National Accelerator Facility Page 27 PYTHON/ClaRA
Python has a simple syntax that allows very quick prototyping of experimental analysis services, along with a large number of useful built-in functions and data types. There is also a huge number of open-source, highly optimized, and well-documented computational analysis modules available to be imported into any analysis service. A small sample of the supported areas (see the sketch after this list):
o full statistical and function-optimization toolkits
o eigenvalues and eigenvectors of large sparse matrices
o integration functions, with support for integrating ordinary differential equations
o Fourier transforms
o interpolation
o signal processing
o linear algebra
o special functions
o a full statistics suite
o highly efficient file I/O
o spatial algorithms and data structures
o clustering algorithms for theory, target detection, and other areas
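To give a flavor of the calls involved, here is a short illustration assuming SciPy/NumPy as the imported toolkits (the fitted function and matrix are arbitrary):

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import eigsh

    # Function optimization: fit a Gaussian to noisy data.
    def gauss(x, a, mu, s):
        return a * np.exp(-0.5 * ((x - mu) / s) ** 2)

    x = np.linspace(-5, 5, 200)
    y = gauss(x, 2.0, 0.5, 1.2) + np.random.normal(0, 0.05, x.size)
    params, _ = curve_fit(gauss, x, y, p0=[1.0, 0.0, 1.0])

    # A few eigenvalues of a large sparse symmetric matrix.
    m = sparse_random(2000, 2000, density=1e-3)
    vals = eigsh((m + m.T) * 0.5, k=4, return_eigenvectors=False)

    print(params, vals)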

Thomas Jefferson National Accelerator Facility Page 28 PYTHON/ClaRA
The existing cMsg protocol will be wrapped in an importable Python module that provides the methods needed to receive and send a cMsg container between the Python service and ClaRA. If the received cMsg container holds EVIO data, a separate EVIO support module can be imported to provide the functions needed to read and append data to the EVIO event stored in the cMsg container.
[Diagram: a Python analysis service and an EVIO support module, written in Python, import a Python wrapper around the existing cMsg library written in C.]
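One plausible shape for such a wrapper, sketched with ctypes; the shared-library name and the function signatures below are hypothetical placeholders, not the real cMsg C API.

    import ctypes

    # Load a hypothetical C wrapper library (name assumed for illustration).
    lib = ctypes.CDLL("libcmsg_wrapper.so")

    # Declare the assumed signatures so ctypes marshals arguments correctly.
    lib.cmsg_connect.argtypes = [ctypes.c_char_p]
    lib.cmsg_connect.restype = ctypes.c_void_p
    lib.cmsg_send.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_int]
    lib.cmsg_send.restype = ctypes.c_int

    def send(url: str, payload: bytes) -> int:
        """Connect and forward one message buffer toward the ClaRA platform."""
        handle = lib.cmsg_connect(url.encode())
        return lib.cmsg_send(handle, payload, len(payload))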

Thomas Jefferson National Accelerator Facility Page 29 Summary
o Scaled SOA implemented successfully via ClaRA.
o The software infrastructure is being integrated into the JLab batch farm system.
o More services need to be written.
o More orchestrators/applications are needed.
o Progress on major systems: CCDB, Geometry Service, Reconstruction (Tracking, TOF & EC).
o Testing/Standards/QA: actively being developed.

Thomas Jefferson National Accelerator Facility Page 30 Summary cont.
o User interfaces/ease of use: GUI to CCDB.
o Data handling: data access/mining; EVIO data access via dictionary.
o Virtual Box.
o Programming libraries: ROOT, ScaVis, SciPy/NumPy, ...
o Examples, examples, examples!
