ACAT 2002. Lassi A. Tuura, Northeastern University. CMS Data Analysis: Current Status and Future Strategy. On behalf of CMS.


1 CMS Data Analysis: Current Status and Future Strategy
On behalf of the CMS Collaboration
Lassi A. Tuura, Northeastern University, Boston
ACAT 2002, June 2002, http://iguana.cern.ch

2 Overview
- The Context: CMS Analysis Today
- Data Analysis Environment Architecture
  - Overview
  - COBRA
  - IGUANA
  - GRID/Production
- Tomorrow and Beyond
  - Leveraging current frameworks in the Grid-enriched analysis environment
  - Clarens client-server prototype
  - Other prototype activities

3 Context
Challenges:
- Complexity
- Geographic Dispersion
- Direct Access To Data
- Migration from Reconstruction to Trigger
Environments:
- Real-Time Event Filter, Online Monitoring
- Pre-emptive Simulation, Reconstruction, Analysis
- Interactive Statistical Analysis

4 Current CMS Production
[Diagram of the production chain. Box labels: Pythia; HEPEVT Ntuples; CMSIM (GEANT3); Zebra files with HITS; OSCAR/COBRA (GEANT4); ORCA/COBRA ooHit Formatter; ORCA/COBRA Digitization (merge signal and pile-up); Objectivity Database; ORCA User Analysis; Ntuples or Root files; IGUANA Interactive Analysis]

5 Complexity of Production 2002
Number of Regional Centers                     11
Number of Computing Centers                    21
Number of CPUs                                 ~1000
Largest Local Center                           176 CPUs
Number of Production Passes for each Dataset   6-8
  (including analysis group processing done by production)
Number of Files                                ~11,000
Data Size                                      17TB
  (not including fz files from Simulation)
File Transfer by GDMP and by perl               7TB toward T1
  Scripts over scp/bbcp                         4TB toward T2

6 Interactive Analysis
[Screenshot of an interactive session, with callouts:]
- Lizard Qt plotter
- ANAPHE histogram extended with pointers to CMS events
- Emacs used to edit a CMS C++ plugin to create and fill histograms
- OpenInventor-based display of selected event
- Python shell with Lizard & CMS modules
Most of the analysis is done using NTUPLEs in PAW, some in ROOT

7 Behind the Scenes: Frameworks
[Diagram of the framework overview. Labels: Consistent User Interface; Federation wizards; Detector/Event Display; Data Browser; Analysis job wizards; Generic analysis Tools; ORCA; FAMOS; OSCAR; COBRA; Objy tools; CMS tools; GRID; Distributed Data Store & Computing Infrastructure; Coherent basic tools and mechanisms]

8 Frameworks Dissected
[Layered diagram. Labels: Physics modules; Specific Frameworks (Event Filter, Reconstruction Algorithms, Physics Analysis, Data Monitoring); Calibration Objects; Configuration Objects; Event Objects; (Grid-aware) Data-Products; Grid-Uploadable; Generic Application Framework; Basic Services; Adapters and Extensions; ODBMS; GEANT 3 / 4; CLHEP; PAW Replacement; C++ Standard Library + Extension Toolkits]

9 Framework Design Basis
- Several frameworks provide the environment together
  - Open: No central framework with all functionality
    - Frameworks are designed to be extensible
    - … and to collaborate with other software
  - Coherent: User sees “final” smooth interface
    - Achieved by integrating the frameworks together
    - … but the user does not do this work him/herself!
  - Design applied at both framework and object design level
- Successfully applied in many parts of CMS software
  - Applications, persistency; sub-frameworks; visualisation; …
  - No loss of usability, functionality or performance
  - Has made it easy to integrate directly with many existing tools
- This is nothing novel: it is part of the standard risk-mitigation strategy of any modern industrial solution

10 Frameworks: COBRA
[Diagram: same framework overview as slide 7, introducing the COBRA section]

11 COBRA: Main Components
- Push- and pull-mode execution, and any mixture
  - Reconstruction-on-demand is a key concept in COBRA
  - Detector-centric reconstruction: push data from event
  - Reconstruction-unit-centric reconstruction: pull/create data as needed
- Event data and related structures
  - Basic support for commonly needed objects (hits, digis, containers, …)
- Application environments
  - Basic application frameworks, various semi-specialised applications
  - Lots of error-handling and recovery code (automatic recovery after crash, …)
- Meta data: a key component
  - Data chunking, system and user collections, data streams, file management, job concepts, configuration and setup records, redirected navigation after reprocessing, …
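
The pull-mode, reconstruction-on-demand idea on this slide can be illustrated with a small sketch. This is not COBRA code (COBRA is a C++ framework); the `Event` class, the producer functions and the product names below are hypothetical, chosen only to show a product being built the first time it is requested and reused afterwards.

```python
class Event:
    """Toy event: raw hits plus a cache of products built on demand."""

    def __init__(self, raw_hits, producers):
        self.raw_hits = raw_hits
        self._producers = producers  # product name -> function(event) -> product
        self._cache = {}
        self.calls = []              # order in which producers actually ran

    def get(self, name):
        # Pull mode: run the producer only on the first request, then reuse.
        if name not in self._cache:
            self.calls.append(name)
            self._cache[name] = self._producers[name](self)
        return self._cache[name]


def make_clusters(event):
    # Toy "reconstruction": group consecutive hits into pairs.
    hits = event.raw_hits
    return [hits[i:i + 2] for i in range(0, len(hits), 2)]


def make_tracks(event):
    # Depends on clusters, which are pulled (and created) as needed.
    return [sum(cluster) for cluster in event.get("clusters")]


# Hypothetical producer table; a real framework would discover these as plug-ins.
PRODUCERS = {"clusters": make_clusters, "tracks": make_tracks}

event = Event([1, 2, 3, 4], PRODUCERS)
tracks = event.get("tracks")   # triggers "tracks", which pulls "clusters"
tracks_again = event.get("tracks")  # served from the cache, nothing re-runs
```

A push-mode pass over the same structure would simply call `event.get(name)` for every registered producer up front; mixing the two modes then falls out of the shared cache.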

12 COBRA: Main Strengths
- Algorithms in plug-ins
  - “Publish-yourself-plug-ins”: self-describing data producers
- Strong meta-data facilities
  - Reconstruction-on-demand matches data product concept very well
    - Grid virtual data products concept really just an extension
  - Convenient mapping of data products to chunks: files, containers, …
  - Scatter / gather: decompose jobs, gather data
    - One logical job can be chopped into many physical processes; we still know it is logically the same job no matter which process it is running in
- Adapts automatically to many environments without special configuration: interactive, batch, farm, stand-alone, trigger, …
  - Through appropriate use of enabling techniques (transactions, locking, refs)
  - No data post-processing required
  - Well-matched to production tools (IMPALA)
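
The scatter / gather point can be made concrete with a minimal sketch. It assumes nothing about COBRA's real meta-data model: `SubJob`, `scatter`, `run`, `gather` and the job name `digi-run-42` are all hypothetical. The only idea it shows is that every physical piece carries the identity of the logical job, so the results can be recombined safely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    logical_job: str  # identity shared by every physical piece
    index: int
    events: range     # event numbers this piece is responsible for


def scatter(logical_job, n_events, n_procs):
    """Split one logical job into at most n_procs contiguous pieces.

    Assumes n_events and n_procs are both positive.
    """
    chunk = -(-n_events // n_procs)  # ceiling division
    return [
        SubJob(logical_job, i, range(start, min(start + chunk, n_events)))
        for i, start in enumerate(range(0, n_events, chunk))
    ]


def run(sub):
    # Stand-in for real processing: just report how many events were seen.
    return {"logical_job": sub.logical_job, "n_processed": len(sub.events)}


def gather(results):
    """Merge per-process results, refusing to mix different logical jobs."""
    jobs = {r["logical_job"] for r in results}
    if len(jobs) != 1:
        raise ValueError("results belong to more than one logical job")
    return {
        "logical_job": jobs.pop(),
        "n_processed": sum(r["n_processed"] for r in results),
    }


subjobs = scatter("digi-run-42", n_events=10, n_procs=3)
summary = gather([run(sub) for sub in subjobs])
```

Here the pieces run sequentially in one process; in a farm each `SubJob` would go to a separate worker, and `gather` would work unchanged because the logical job name travels with each result.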

13 Objectivity
[Component diagram. Labels: Storage Manager; Schema Manager; Transaction Manager; C++ Binding; File I/O; Lock Server; Page Server; Catalog Manager; DDL Source Processing; Meta Data; Object Access; MSS, Grid & Farm Interface]

14 Objectivity
[Component diagram: same labels as slide 13, plus Refs & Navigation; Queries; Cache Management]

15 Objectivity
[Component diagram: same labels as slide 13, plus Object Naming; Configurations (Data Sets); Collections; Run Resume & Crash Recovery]

16 Objectivity
[Component diagram: same labels as slide 13, plus File Size Control; Farm Management; System Management]

17 Frameworks: IGUANA
[Diagram: same framework overview as slide 7, introducing the IGUANA section]

18 User Interface and Visualisation
- IGUANA: a generic toolkit for user interfaces and visualisation
  - Builds on existing high-quality libraries (Qt, OpenInventor, Anaphe, …)
  - Used to implement specific visualisation applications in other projects
- Main technical focus: provide a platform that makes it easy to integrate GUIs as a coherent whole, to provide application services and to visualise any application object
  - Many categories / layers: GUI gadgets & support, application environment, data visualisers, data representation methods, control panels, …
  - Designed to integrate with and into other applications
  - Virtually everything is in plug-ins (can still be statically linked)
[Diagram of the plug-in machinery. Labels: Component Database; Plug-In Cache; Plug-In; Object Factory; Attached; Unattached]
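
The "everything is in plug-ins" design with an object factory can be sketched as a name-to-factory registry. This is an illustration only, not IGUANA's actual C++ plug-in manager: `PluginRegistry`, the `register` decorator and the two browser classes are hypothetical names.

```python
class PluginRegistry:
    """Maps plug-in names to factories and creates objects on request."""

    def __init__(self):
        self._factories = {}  # plug-in name -> callable returning an object

    def register(self, name):
        # Decorator: a component publishes itself under a name.
        def decorator(factory):
            self._factories[name] = factory
            return factory
        return decorator

    def names(self):
        return sorted(self._factories)

    def create(self, name, *args, **kwargs):
        try:
            factory = self._factories[name]
        except KeyError:
            raise LookupError(f"no plug-in registered as {name!r}") from None
        return factory(*args, **kwargs)


browsers = PluginRegistry()


@browsers.register("twig")
class TwigBrowser:
    def describe(self):
        return "tree view of application objects"


@browsers.register("3d")
class Browser3D:
    def describe(self):
        return "3D scene view of application objects"


# The application asks for a browser by name and never imports the class itself.
view = browsers.create("twig")
```

The application code depends only on the registry, so new browsers can be added without touching it; a real plug-in system would additionally load the module that contains the factory on first use.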

19 Illustration: 3D Visualisation
[Diagram of the window structure. Labels: QMainWindow; QMDIShell; Browser Site; 3D Browser; Twig Browser]

20 IGUANA GUI Integration
[Screenshot with callouts: Integration; Action; Visualise Results, Modify Objects, Further Interaction]

21 Tomorrow and Beyond
- Leverage the current frameworks on the grid
  - Many native COBRA concepts match well with grid
    - (Virtual) data products ~ reconstruction-on-demand
    - Recording and matching configuration and setup information
    - Production interfaces: catalogs, redirection, MSS hooks
    - Scatter/gather job decomposition, production environment
  - COBRA-based applications can be encapsulated for distributed analysis
  - IGUANA already separates application objects, model and viewer
    - Many possibilities for introducing distributed links
  - IGUANA+COBRA provides a platform for a coherent, well-integrated interface no matter where the code runs and data comes and goes
    - Both have loads of knobs and hooks for integration
- Aiming at adapting the existing software where possible
  - Adapt and work within CMS software (COBRA, ORCA, …) and existing analysis tools (ROOT, Lizard, …); don’t replace them

22 Prototypes: Clarens Web Portals
[Diagram: Client <-> RPC over http/https <-> Web Server <-> Clarens Service]
- Grid-enabling the working environment for physicists' data analysis
- Communication with clients via the commodity XML-RPC protocol
  -> Implementation independence
- Server implemented in C++: access to the CMS OO analysis toolkit
- Server provides a remote API to Grid tools
  - The Virtual Data Toolkit: Object collection access
  - Data movement between tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
  - Security services provided by the Grid (GSI)
  - No Globus needed on client side, only certificate
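
The XML-RPC point is what gives the implementation independence: requests and responses are plain XML, so any client language can talk to the C++ server. The sketch below shows one round trip using Python's standard `xmlrpc` modules. It is not the Clarens API: the method names `catalog.list` and `catalog.count` and the collection names are hypothetical, and the request is dispatched in-process (through the dispatcher's internal `_marshaled_dispatch` helper) instead of over http/https, so that the example runs without a network.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

# Hypothetical catalog standing in for object collections held by the server.
COLLECTIONS = {"demo/higgs": 1200, "demo/minbias": 50000}


def list_collections():
    return sorted(COLLECTIONS)


def event_count(name):
    return COLLECTIONS[name]


# Server side: expose plain functions under remote method names.
dispatcher = SimpleXMLRPCDispatcher(allow_none=True)
dispatcher.register_function(list_collections, "catalog.list")
dispatcher.register_function(event_count, "catalog.count")


def call(method, *params):
    """Encode a call as XML, dispatch it, and decode the XML response."""
    request_xml = xmlrpc.client.dumps(params, methodname=method)
    response_xml = dispatcher._marshaled_dispatch(request_xml.encode("utf-8"))
    (result,), _ = xmlrpc.client.loads(response_xml)
    return result


names = call("catalog.list")
count = call("catalog.count", "demo/higgs")
request_preview = xmlrpc.client.dumps(("demo/higgs",), methodname="catalog.count")
```

Printing `request_preview` shows the XML document a client would send; over a real connection the same call would go through `xmlrpc.client.ServerProxy` pointed at the server's URL.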

23 Prototypes: Clarens Web Portals…
[Architecture diagram. Labels: Production system and data repositories (Tier 0/1/2); ORCA analysis farm(s) (or distributed `farm’ using grid queues); RDBMS based data warehouse(s); PIAF/Proof/.. type analysis farm(s) (Tier 1/2); TAG and AOD extraction/conversion/transport services; Data extraction Web service(s); Query Web service(s); Local analysis tool: Lizard/ROOT/…; Tool plugin module; Web browser; Local disk; User (Tier 3/4/5). Flows: Production data flow; TAGs/AODs data flow; Physics Query flow]

24 Other Prototypes
- Tag database optimisation
  - Fast sample selection is crucial
  - Various models already tried
  - Experimenting with RDBMS
- MOP: distributed job submission system
  - Allows submission of CMS production jobs from a central location, run on remote locations, and return results
    - Job Specification: IMPALA
    - Replication: GDMP
    - Globus GRAM
    - Job Scheduling: Condor-G and local systems
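
The tag database bullets describe selecting event samples quickly from a compact per-event summary held in an RDBMS. A minimal sketch of that idea follows, using an in-memory SQLite database. The table layout, the column names (`n_muons`, `missing_et`) and the values are invented for illustration and do not reflect the schemas CMS tried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One small row of summary quantities ("tags") per event; made-up columns.
conn.execute(
    "CREATE TABLE tags ("
    " event_id INTEGER PRIMARY KEY,"
    " n_muons INTEGER,"
    " missing_et REAL)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [(1, 0, 12.5), (2, 2, 48.0), (3, 1, 75.2), (4, 2, 5.1)],
)
# An index on a frequently cut-on quantity keeps selection fast as the table grows.
conn.execute("CREATE INDEX idx_tags_muons ON tags (n_muons)")


def select_events(min_muons, min_missing_et):
    """Return the IDs of events passing both cuts, in ascending order."""
    rows = conn.execute(
        "SELECT event_id FROM tags"
        " WHERE n_muons >= ? AND missing_et >= ?"
        " ORDER BY event_id",
        (min_muons, min_missing_et),
    )
    return [row[0] for row in rows]


selected = select_events(min_muons=1, min_missing_et=40.0)
```

Only the selected event IDs would then be used to fetch the full event data, which is the expensive step the tag lookup is meant to avoid repeating for every event.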

