Reconstruction and Analysis on Demand: A Success Story
Christopher D. Jones, Cornell University, USA


C. Jones, CHEP03

Overview
Describe “Standard” processing model
Describe “On Demand” processing model
  – Similar to GriPhyN’s “Virtual Data Model”
What we’ve learned
User reaction
Conclusion

Standard Processing System
Designed for reconstruction
  – All objects are supposed to be created for each event
Each processing step is broken into its own module
  – E.g., track finding and track fitting are separate
The modules are run in a user-specified sequence
Each module adds its data to the ‘event’ when the module is executed
Each module can halt the processing of an event
[Diagram: Input Module, Track Finder, Track Fitter, Output Module, run in sequence]
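The standard model described above can be sketched in a few lines of C++. This is only an illustration: the `Module` interface, the `Event` map, the toy "algorithms" and the `runSequence` helper are all hypothetical names, not taken from the CLEO code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical 'event': a bag of named values that modules fill in.
using Event = std::map<std::string, double>;

// Hypothetical module interface: process() returns false to halt the event.
struct Module {
    virtual ~Module() = default;
    virtual bool process(Event& event) = 0;
};

struct TrackFinder : Module {
    bool process(Event& event) override {
        event["foundTracks"] = event.at("rawHits") / 10.0;  // toy algorithm
        return event["foundTracks"] > 0.0;                   // halt on events with no tracks
    }
};

struct TrackFitter : Module {
    bool process(Event& event) override {
        // Throws std::out_of_range unless TrackFinder ran first:
        // the user is responsible for ordering the sequence correctly.
        event["fitTracks"] = event.at("foundTracks");
        return true;
    }
};

// Runs the modules in the user-specified order and stops at the first one
// that halts the event. Returns how many modules were executed.
inline std::size_t runSequence(const std::vector<std::unique_ptr<Module>>& sequence,
                               Event& event) {
    std::size_t executed = 0;
    for (const auto& module : sequence) {
        assert(module != nullptr);
        ++executed;
        if (!module->process(event)) break;
    }
    return executed;
}
```

Placing `TrackFitter` ahead of `TrackFinder` in the vector makes the job fail at run time, which is the ordering burden the next slide criticises.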

Critique of Standard Design
Good
  – Simple mental model
      – Users can feel confident they know how the program works
  – Easy to debug
      – Simple to determine which module had a problem
Bad
  – User must know inter-module dependencies in order to place the modules in the correct sequence
      – Users often run jobs with many modules they do not need in order to avoid missing a module they might need
  – Optimization of module sequence must be done by hand
  – Reading back from storage is inefficient
      – Must create all objects from storage even if job does not use them

On-demand System
Designed for analysis batch processing
  – Not all objects need to be created each event
Processing is broken into different types of modules
  – Providers
      – Source: reads data from a persistent store
      – Producer: creates data on demand
  – Requestors
      – Sink: writes data to a persistent store
      – Processor: analyzes and filters ‘events’
Data providers register what data they can provide
Processing sequence is set by the order of data requests
Only Processors can halt the processing of an ‘event’
[Diagram: Source, Processor A, Processor B, Sink]

Data Model
A Record holds all data that are related by lifetime
  – E.g., the Event Record holds Raw Data, Tracks, Calorimeter Showers, etc.
A Stream is a time-ordered sequence of Records
A Frame is a collection of Records that describe the state of the detector at an instant in time
All data are accessed via the exact same interface and mechanism
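One way to picture the Record/Stream/Frame relationship is the sketch below. It assumes, purely for illustration, that a Record stays valid from its start time until the next Record in the same Stream begins; the slides do not state this rule, and the names `startTime`, `payload`, `recordAt` and `frameAt` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical Record: all data sharing one lifetime, stamped with its start time.
struct Record {
    std::uint64_t startTime;
    std::string payload;  // stand-in for the real data objects
};

// A Stream is a time-ordered sequence of Records of one kind.
using Stream = std::vector<Record>;

// A Frame holds, per stream name, the Record describing the detector at one instant.
using Frame = std::map<std::string, Record>;

// Assumption: a Record is valid until the next Record in its Stream starts.
// Returns nullptr if the Stream has no Record at or before `time`.
inline const Record* recordAt(const Stream& stream, std::uint64_t time) {
    const Record* current = nullptr;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        assert(i == 0 || stream[i - 1].startTime <= stream[i].startTime);  // time-ordered
        if (stream[i].startTime > time) break;
        current = &stream[i];
    }
    return current;
}

// Builds the Frame for `time` by picking the valid Record from every Stream.
inline Frame frameAt(const std::map<std::string, Stream>& streams, std::uint64_t time) {
    Frame frame;
    for (const auto& entry : streams) {
        if (const Record* r = recordAt(entry.second, time)) {
            frame.emplace(entry.first, *r);
        }
    }
    return frame;
}
```

With a calibration Stream and an event Stream, `frameAt` pairs each event with whichever calibration Record was current at that time, which is why calibration and event data can share one access mechanism.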

Data Flow: Frame as Data Bus
[Diagram: the Frame sits between the data providers and the data requestors]
Data Providers: data returned when requested
  – Sources: data from storage (Event Database, Calibration Database)
  – Producers: data from algorithm (TrackFinder, TrackFitter)
Data Requestors: sequentially run requestors for each new Record from a source
  – Processors: analyze and filter data (SelectBtoKPi, EventDisplay)
  – Sinks: store data (Event List)

Callback Mechanism
Provider registers a Proxy for each data type it can create
Proxies are placed in the Record and indexed with a key
  – Type: the object type returned by the Proxy
  – Usage: an optional string describing use of object
  – Production: an optional run-time settable string
Users access data via a type-safe templated function call
    List pions;
    extract( iFrame.record(kEvent), pions);
  (based on ideas from BaBar’s Ifd package)
extract call builds the key and asks Record for Proxy
Proxy runs algorithm to deliver data
  – Proxy caches data in case of another request
  – If a problem occurs, an exception is thrown
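The callback mechanism can be approximated with a small self-contained sketch. It is not the CLEO implementation: the `Key` tuple, the `Proxy` and `Record` classes and the signature of `extract` below are assumptions chosen to mirror the slide (key made of type, usage and production; cached result; exception on failure).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <tuple>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Key = (type, usage, production), the three fields named on the slide.
using Key = std::tuple<std::type_index, std::string, std::string>;

struct ProxyBase {
    virtual ~ProxyBase() = default;
};

// Hypothetical Proxy: runs its algorithm on the first request, then serves the cache.
template <typename T>
class Proxy : public ProxyBase {
public:
    explicit Proxy(std::function<T()> algorithm) : algorithm_(std::move(algorithm)) {
        assert(algorithm_ != nullptr);
    }
    const T& get() {
        if (!cache_) cache_ = std::make_unique<T>(algorithm_());
        return *cache_;
    }
private:
    std::function<T()> algorithm_;
    std::unique_ptr<T> cache_;
};

class Record {
public:
    // Called by a provider to register what it can deliver.
    template <typename T>
    void add(std::function<T()> algorithm, std::string usage = "", std::string production = "") {
        proxies_[Key{typeid(T), std::move(usage), std::move(production)}] =
            std::make_unique<Proxy<T>>(std::move(algorithm));
    }
    // Throws if no provider registered a Proxy under this key.
    template <typename T>
    Proxy<T>& find(const std::string& usage, const std::string& production) const {
        auto it = proxies_.find(Key{typeid(T), usage, production});
        if (it == proxies_.end()) throw std::runtime_error("no Proxy registered for requested data");
        // Safe: the key contains typeid(T), so the stored Proxy has exactly this type.
        return static_cast<Proxy<T>&>(*it->second);
    }
private:
    std::map<Key, std::unique_ptr<ProxyBase>> proxies_;
};

// The static type of `out` selects the Proxy, so the caller never casts.
template <typename T>
void extract(const Record& record, T& out,
             const std::string& usage = "", const std::string& production = "") {
    out = record.find<T>(usage, production).get();
}
```

A provider would call something like `record.add<std::string>(algorithm, "pions")`, and a requestor would then write `std::string pions; extract(record, pions, "pions");`. The algorithm runs only on the first `extract`, and a request with no matching Proxy surfaces as an exception rather than a silent null.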

Callback Example: Algorithm
Processor
  – SelectBtoKPi
Producer
  – Track Fitter: FitPionsProxy, FitKaonsProxy, …
  – Track Finder: TracksProxy
  – HitCalibrator: CalibratedHitsProxy
Source
  – Calibration DB: PedestalProxy, AlignmentProxy, …
  – Raw Data File: RawDataProxy

Callback Example: Storage
Processor
  – SelectBtoKPi
Source
  – Event Database: FitPionsProxy, FitKaonsProxy, RawDataProxy, …
In both examples, the same SelectBtoKPi shared object can be used

Critique of On-demand System
Good
  – Can be used for all data access needs
      – Online software trigger, online data quality monitoring, online event display, calibration, reconstruction, MC generation, offline event display, analysis
  – Self-organizes calling chain
      – Users can add Producers in any order
  – Optimizes access from storage
      – Sources only need to say when a new Record (e.g., event) is available
      – Data for a Record is retrieved/decoded on demand
Bad
  – Can be harder to debug since no explicit call order
      – Use of exceptions key to simplifying debugging
  – Performance testing is more challenging

What We Have Learned
First release of the system was September 1998
Callback mechanism can be made fast
  – Proxy lookup takes less than 1 part in of CPU time on simple job that processed 2,000 events/s on moderate computer
Cyclical dependencies are easy to find and fix
  – Only happened once and was found immediately on first test
Do not need to modify data once it is created
  – Preliminary versions of data are given their own key
Automatically optimizes performance of reconstruction
  – Trivially added filter to remove junk events by using FoundTracks
Optimize analysis by storing many small objects
  – Only need to retrieve and decode data needed for current job
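The slides do not say how cyclical dependencies were detected. One common approach, shown here only as a hypothetical sketch, is to mark a datum as "in progress" while its algorithm runs and to throw if the same datum is requested again before it finishes. The `OnDemandStore` class and its `int` payload are invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical on-demand store: each named datum is made by an algorithm that
// may itself request other data from the same store.
class OnDemandStore {
public:
    using Algorithm = std::function<int(OnDemandStore&)>;

    void provide(const std::string& name, Algorithm algorithm) {
        assert(algorithm != nullptr);
        algorithms_[name] = std::move(algorithm);
    }

    int get(const std::string& name) {
        auto hit = cache_.find(name);
        if (hit != cache_.end()) return hit->second;  // already made: serve the cache

        auto algo = algorithms_.find(name);
        if (algo == algorithms_.end()) throw std::runtime_error("no provider for " + name);

        // If the name is already being made, the request chain has looped back on itself.
        if (!inProgress_.insert(name).second) {
            throw std::logic_error("cyclical dependency while making " + name);
        }
        try {
            int value = algo->second(*this);
            inProgress_.erase(name);
            cache_[name] = value;
            return value;
        } catch (...) {
            inProgress_.erase(name);  // leave the store usable after a failure
            throw;
        }
    }

private:
    std::map<std::string, Algorithm> algorithms_;
    std::map<std::string, int> cache_;
    std::set<std::string> inProgress_;
};
```

Under this scheme a cycle fails on the very first request that touches it, which would be consistent with the remark that the one cycle encountered was found immediately on first test.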

User Reactions
In general, user response has been very positive
  – Previously CLEO used a ‘standard system’ written in FORTRAN
Reconstruction coders like the system
  – We have code skeleton generators for Proxy/Producer/Processor
      – Only need to add their specific code
  – Easy for them to test their code
Analysis coders can still program the ‘old way’
  – All analysis code in the ‘event’ routine
Some analysis coders are pushing bounds
  – Place selectors (e.g., cuts for tracks) in Producers
      – Users share selectors via dynamically loaded Producers
  – Processor only used to fill Histograms/Ntuples
  – If selections are stored, only the Processor is rerun when reprocessing data

Conclusion
It is possible to build an ‘on demand’ system that is
  – efficient
  – debuggable
  – capable of dealing with all data (not just data in an event)
  – easy to write components for
  – good for reconstruction
  – acceptable to users
Some reasons for success
  – Skeleton code generators
      – User only has to write new code, not infrastructure ‘glue’
  – Users do not need to register what data they may request
      – Data reads occur more frequently than writes
  – Simple rule for when algorithms run
      – If you add a Producer, it takes precedence over a Source
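The precedence rule in the last bullet can be expressed as a small registry. The sketch below is an assumption about how such a rule might be coded, not the actual CLEO logic; `ProviderRegistry`, `ProviderKind` and the provider names are hypothetical, and the behaviour for two providers of the same kind (the later one wins) is a guess the slides do not cover.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class ProviderKind { Source, Producer };

struct Registration {
    ProviderKind kind;
    std::string provider;  // hypothetical provider name, for illustration only
};

class ProviderRegistry {
public:
    // Registers `provider` as able to deliver `dataName`.
    // A Producer replaces a Source; a Source never replaces a Producer,
    // whatever the order of registration.
    void add(const std::string& dataName, ProviderKind kind, const std::string& provider) {
        assert(!provider.empty());
        auto it = chosen_.find(dataName);
        if (it != chosen_.end() && it->second.kind == ProviderKind::Producer &&
            kind == ProviderKind::Source) {
            return;  // keep the Producer
        }
        chosen_[dataName] = Registration{kind, provider};
    }

    // Throws std::out_of_range if nothing can deliver `dataName`.
    const std::string& providerFor(const std::string& dataName) const {
        return chosen_.at(dataName).provider;
    }

private:
    std::map<std::string, Registration> chosen_;
};
```

With this rule, a job that reads fitted pions from the event database switches to refitting them simply by loading the track-fitting Producer, with no change to the requesting Processor.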