
1 Reconstruction and Analysis on Demand: A Success Story
Christopher D. Jones, Cornell University, USA

2 Overview
Describe the “Standard” processing model
Describe the “On Demand” processing model
– Similar to GriPhyN’s “Virtual Data Model”
What we’ve learned
User reaction
Conclusion

3 Standard Processing System
Designed for reconstruction
– All objects are supposed to be created for each event
Each processing step is broken into its own module
– E.g., track finding and track fitting are separate
The modules are run in a user-specified sequence
Each module adds its data to the ‘event’ when the module is executed
Each module can halt the processing of an event
(Diagram: Input Module → Track Finder → Track Fitter → Output Module)
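As a rough illustration of this sequential model (not CLEO's actual framework code; the Module, Event, and Path names below are hypothetical), each module is invoked in the configured order, adds its products to the event, and may stop the event early:

```cpp
// Hypothetical sketch of the "standard" sequential model: modules run in a
// user-specified order and add their products to the event as they execute.
#include <memory>
#include <vector>

struct Event { /* holds raw data, tracks, ... added by modules */ };

enum class ActionResult { kContinue, kStopEvent };   // a module may halt the event

class Module {
public:
  virtual ~Module() = default;
  virtual ActionResult process(Event& ev) = 0;        // adds its data to the event
};

class TrackFinder : public Module {                   // e.g. one reconstruction step
public:
  ActionResult process(Event& /*ev*/) override {
    // find tracks and add them to the event
    return ActionResult::kContinue;
  }
};

class Path {
  std::vector<std::unique_ptr<Module>> modules_;      // the user-specified sequence
public:
  void add(std::unique_ptr<Module> m) { modules_.push_back(std::move(m)); }
  void run(Event& ev) {
    for (auto& m : modules_) {
      if (m->process(ev) == ActionResult::kStopEvent) break;  // halt this event
    }
  }
};
```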

4 Critique of Standard Design
Good
– Simple mental model: users can feel confident they know how the program works
– Easy to debug: simple to determine which module had a problem
Bad
– Users must know inter-module dependencies in order to place the modules in the correct sequence
  Users often run jobs with many modules they do not need in order to avoid missing a module they might need
– Optimization of the module sequence must be done by hand
– Reading back from storage is inefficient: all objects must be created from storage even if the job does not use them

5 On-demand System
Designed for analysis batch processing
– Not all objects need to be created for each event
Processing is broken into different types of modules
– Providers
  Source: reads data from a persistent store
  Producer: creates data on demand
– Requestors
  Sink: writes data to a persistent store
  Processor: analyzes and filters ‘events’
Data providers register what data they can provide
The processing sequence is set by the order of data requests
Only Processors can halt the processing of an ‘event’
(Diagram: Source, Processor A, Processor B, Sink)
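A minimal sketch of the two module families, assuming hypothetical class names (Provider, Requestor, Frame, DataKey) rather than the real CLEO interfaces: providers advertise what they can deliver, while requestors pull data and steer the job.

```cpp
// Hedged sketch of the two module families; all names are illustrative.
#include <string>
#include <vector>

struct Record;                        // data with a common lifetime (see the Data Model slide)
struct Frame;                         // collection of Records describing the detector state
using DataKey = std::string;          // stands in for the (Type, Usage, Production) key

// --- Providers: can deliver data when asked ---------------------------------
class Provider {
public:
  virtual ~Provider() = default;
  virtual std::vector<DataKey> provides() const = 0;  // registered up front
};
class Source : public Provider {      // reads Records from a persistent store
public:
  virtual bool nextRecord(Frame& frame) = 0;          // false when input is exhausted
};
class Producer : public Provider {    // creates data on demand from other data
};

// --- Requestors: ask for data and may steer the job -------------------------
class Requestor {
public:
  virtual ~Requestor() = default;
};
class Processor : public Requestor {  // analyzes/filters; only it may halt an event
public:
  virtual bool process(const Frame& frame) = 0;       // false = stop this event
};
class Sink : public Requestor {       // writes requested data to a persistent store
public:
  virtual void write(const Frame& frame) = 0;
};
```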

6 Data Model
A Record holds all data that are related by lifetime
– e.g., the Event Record holds Raw Data, Tracks, Calorimeter Showers, etc.
A Stream is a time-ordered sequence of Records
A Frame is a collection of Records that describes the state of the detector at an instant in time
All data are accessed via the exact same interface and mechanism
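The Record/Stream/Frame vocabulary can be pictured roughly as below. This is an illustrative sketch only; the StreamType values and the Frame::record call are assumptions, not the experiment's actual interface.

```cpp
// Illustrative-only data model: a Frame holds one Record per Stream,
// and every Record is reached through the same call, whatever its lifetime.
#include <map>
#include <memory>
#include <stdexcept>

enum class StreamType { kEvent, kBeginRun, kCalibration };  // example streams

class Record { /* holds all data sharing one lifetime, e.g. one event */ };

class Frame {
  std::map<StreamType, std::shared_ptr<Record>> records_;
public:
  void put(StreamType s, std::shared_ptr<Record> r) { records_[s] = std::move(r); }
  const Record& record(StreamType s) const {                // uniform access for all data
    auto it = records_.find(s);
    if (it == records_.end()) throw std::runtime_error("no such Record in Frame");
    return *it->second;
  }
};
```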

7 Data Flow: Frame as Data Bus
(Diagram: the Frame acts as a data bus; providers: Event Database, Calibration Database, TrackFinder, TrackFitter; requestors: SelectBtoKPi, EventDisplay, Event List)
Sources: data from storage
Producers: data from an algorithm
Processors: analyze and filter data
Sinks: store data
Data Providers: data are returned when requested
Data Requestors: the requestors are run sequentially for each new Record from a Source
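The flow on this slide can be read as a simple driver loop, sketched here with stand-in types (Frame, Source, Requestor); the real framework's scheduler is not shown in the slides. A Source announces each new Record, the Frame acts as the data bus, and the requestors are run in order, pulling data through the providers only if they actually ask for it.

```cpp
// Hypothetical driver loop for the "Frame as data bus" picture.
// Frame, Source, and Requestor are stand-ins for the real framework classes.
#include <memory>
#include <vector>

struct Frame {};                                     // data bus holding the Records
struct Source    { virtual bool nextRecord(Frame&) = 0; virtual ~Source() = default; };
struct Requestor { virtual bool run(Frame&) = 0;        virtual ~Requestor() = default; };

void eventLoop(Source& source, std::vector<std::unique_ptr<Requestor>>& requestors) {
  Frame frame;
  while (source.nextRecord(frame)) {                 // e.g. a new event is available
    for (auto& r : requestors) {                     // run the requestors sequentially
      if (!r->run(frame)) break;                     // a Processor may stop this Record
    }
  }
}
```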

8 Callback Mechanism
A Provider registers a Proxy for each data type it can create
Proxies are placed in the Record and indexed with a key
– Type: the object type returned by the Proxy
– Usage: an optional string describing the use of the object
– Production: an optional run-time settable string
Users access data via a type-safe templated function call
  List<…> pions; extract( iFrame.record(kEvent), pions );
  (based on ideas from BaBar’s Ifd package)
The extract call builds the key and asks the Record for the Proxy
The Proxy runs its algorithm to deliver the data
– The Proxy caches the data in case of another request
– If a problem occurs, an exception is thrown
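A minimal sketch of what such a type-keyed, caching proxy lookup might look like. This is not the CLEO or Ifd code: the names (Record, Proxy, extract, List, Pion) are illustrative, and the key is reduced to the Type alone rather than the full (Type, Usage, Production) triple described above.

```cpp
// Illustrative-only caching proxy lookup, keyed on the requested type.
#include <map>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <vector>

template <class T> using List = std::vector<T>;       // stand-in container
struct Pion {};                                        // hypothetical data type

class ProxyBase {
public:
  virtual ~ProxyBase() = default;
};

template <class T>
class Proxy : public ProxyBase {
public:
  const List<T>& get() {
    if (!cache_) cache_ = make();                      // run the algorithm on demand
    return *cache_;
  }
protected:
  virtual std::shared_ptr<const List<T>> make() = 0;   // may throw if it fails
private:
  std::shared_ptr<const List<T>> cache_;               // filled on the first request
};

class Record {
public:
  template <class T>
  void registerProxy(std::shared_ptr<Proxy<T>> p) { proxies_[typeid(T)] = std::move(p); }
  template <class T>
  Proxy<T>& proxyFor() const {
    auto it = proxies_.find(typeid(T));
    if (it == proxies_.end()) throw std::runtime_error("no Proxy for requested type");
    return static_cast<Proxy<T>&>(*it->second);
  }
private:
  std::map<std::type_index, std::shared_ptr<ProxyBase>> proxies_;  // key reduced to Type
};

// Type-safe templated access in the spirit of the slide's extract() call.
template <class T>
void extract(const Record& record, List<T>& out) {
  out = record.proxyFor<T>().get();                    // cached after the first call
}

class PionProxy : public Proxy<Pion> {                 // what a Producer would register
  std::shared_ptr<const List<Pion>> make() override {
    return std::make_shared<List<Pion>>(List<Pion>{Pion{}, Pion{}});
  }
};

int main() {
  Record eventRecord;
  eventRecord.registerProxy<Pion>(std::make_shared<PionProxy>());
  List<Pion> pions;
  extract(eventRecord, pions);                         // mirrors the call on the slide
  return pions.size() == 2 ? 0 : 1;
}
```

In this reduced form the first extract call runs the proxy's algorithm and caches the result; a second request for the same type is served from the cache, and a request for an unregistered type throws an exception.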

9 Callback Example: Algorithm
(Diagram of the on-demand call chain when the data are produced by algorithms)
Processor: SelectBtoKPi
Producers:
– Track Fitter: FitPionsProxy, FitKaonsProxy, …
– Track Finder: TracksProxy
– HitCalibrator: CalibratedHitsProxy
Sources:
– Calibration DB: PedestalProxy, AlignmentProxy, …
– Raw Data File: RawDataProxy
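To make the self-organizing chain concrete, here is a deliberately simplified, hypothetical sketch in which each proxy is a cached factory function that may itself request other data from the same Record. Asking for the fitted pions then pulls the tracks, which pull the calibrated hits, which pull the raw data, with no sequence configured by the user. None of these labels or classes are the real CLEO ones.

```cpp
// Illustrative-only: proxies as cached factories that may request other data,
// so the call chain organizes itself from the data dependencies.
#include <any>
#include <functional>
#include <iostream>
#include <map>
#include <string>

class Record {
  std::map<std::string, std::function<std::any(Record&)>> makers_;
  std::map<std::string, std::any> cache_;
public:
  void registerProxy(const std::string& label, std::function<std::any(Record&)> make) {
    makers_[label] = std::move(make);
  }
  const std::any& get(const std::string& label) {
    auto cached = cache_.find(label);
    if (cached == cache_.end()) {                      // run the algorithm on demand
      std::cout << "producing " << label << "\n";
      cached = cache_.emplace(label, makers_.at(label)(*this)).first;
    }
    return cached->second;                             // later requests use the cache
  }
};

int main() {
  Record event;
  event.registerProxy("RawData",        [](Record&)   { return std::any{std::string{"raw hits"}}; });
  event.registerProxy("CalibratedHits", [](Record& r) { r.get("RawData");        return std::any{1}; });
  event.registerProxy("Tracks",         [](Record& r) { r.get("CalibratedHits"); return std::any{2}; });
  event.registerProxy("FitPions",       [](Record& r) { r.get("Tracks");         return std::any{3}; });

  event.get("FitPions");   // pulls Tracks -> CalibratedHits -> RawData automatically
  event.get("FitPions");   // the second request is served from the cache
}
```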

10 Callback Example: Storage
(Diagram of the call chain when the same data are read back from storage)
Processor: SelectBtoKPi
Source:
– Event Database: FitPionsProxy, FitKaonsProxy, RawDataProxy, …
In both examples, the same SelectBtoKPi shared object can be used

11 Critique of On-demand System
Good
– Can be used for all data access needs: online software trigger, online data quality monitoring, online event display, calibration, reconstruction, MC generation, offline event display, analysis
– Self-organizes the calling chain: users can add Producers in any order
– Optimizes access from storage: Sources only need to say when a new Record (e.g., an event) is available; data for a Record are retrieved/decoded on demand
Bad
– Can be harder to debug since there is no explicit call order: the use of exceptions is key to simplifying debugging
– Performance testing is more challenging

12 What We Have Learned
The first release of the system was September 1998
The callback mechanism can be made fast
– Proxy lookup takes less than 1 part in 10^7 of the CPU time in a simple job that processed 2,000 events/s on a moderate computer
Cyclical dependencies are easy to find and fix
– This happened only once and was found immediately on the first test
Data do not need to be modified once they are created
– Preliminary versions of data are given their own key
The system automatically optimizes the performance of reconstruction
– A filter to remove junk events was trivially added by using FoundTracks
Analysis is optimized by storing many small objects
– Only the data needed for the current job have to be retrieved and decoded
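The "own key" idea can be pictured as storing the preliminary and refined objects under different usage tags so neither overwrites the other; the snippet below is only a toy illustration, with a plain map standing in for the Record and made-up labels.

```cpp
// Toy illustration: preliminary and final versions of the same type live
// under different usage tags, so nothing is modified in place.
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Track = std::string;                                  // stand-in data type

int main() {
  // key = (type, usage); the type is fixed to Track here for brevity
  std::map<std::string, std::vector<Track>> record;
  record["Track/Preliminary"] = {"rough fit"};              // first-pass objects keep their key
  record["Track/Default"]     = {"full fit"};               // refined objects get a new key

  std::cout << record.at("Track/Preliminary").front() << "\n";
  std::cout << record.at("Track/Default").front() << "\n";
}
```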

13 User Reactions
In general, user response has been very positive
– Previously CLEO used a ‘standard system’ written in FORTRAN
Reconstruction coders like the system
– We have code-skeleton generators for Proxy/Producer/Processor, so they only need to add their specific code
– It is easy for them to test their code
Analysis coders can still program the ‘old way’
– All analysis code in the ‘event’ routine
Some analysis coders are pushing the bounds
– They place selectors (e.g., cuts for tracks) in Producers, and users share selectors via dynamically loaded Producers
– The Processor is only used to fill histograms/ntuples
– If the selections are stored, only the Processor needs to be rerun when reprocessing the data

14 Conclusion
It is possible to build an ‘on demand’ system that is
– efficient
– debuggable
– capable of dealing with all data (not just data in an event)
– easy to write components for
– good for reconstruction
– acceptable to users
Some reasons for success
– Skeleton code generators: users only have to write new code, not infrastructure ‘glue’
– Users do not need to register what data they may request: data reads occur more frequently than writes
– A simple rule for when algorithms run: if you add a Producer, it takes precedence over a Source

