ACAT 2002 — Lassi A. Tuura, Northeastern University
CMS Data Analysis: Current Status and Future Strategy
On behalf of CMS.

Presentation transcript:

Slide 1: CMS Data Analysis — Current Status and Future Strategy
On behalf of the CMS Collaboration
Lassi A. Tuura, Northeastern University, Boston

Slide 2 (June 2002): Overview
- The Context — CMS Analysis Today
- Data Analysis Environment Architecture
  - Overview
  - COBRA
  - IGUANA
  - GRID/Production
- Tomorrow and Beyond
  - Leveraging current frameworks in the Grid-enriched analysis environment
  - Clarens client-server prototype
  - Other prototype activities

Slide 3: Context
Challenges:
- Complexity
- Geographic dispersion
- Direct access to data
- Migration from reconstruction to trigger
Environments:
- Real-time event filter, online monitoring
- Pre-emptive simulation, reconstruction, analysis
- Interactive statistical analysis

Slide 4: Current CMS Production
[Production chain diagram: Pythia generates HEPEVT ntuples; CMSIM (GEANT3) produces Zebra files with HITS; the ORCA/COBRA ooHit formatter loads them into the Objectivity database; ORCA/COBRA digitization merges signal and pile-up into the Objectivity database; ORCA user analysis writes ntuples or ROOT files; OSCAR/COBRA (GEANT4) is the new simulation path; IGUANA provides interactive analysis.]

Slide 5: Complexity of Production
- Number of regional centers: 11
- Number of computing centers: 21
- Number of CPUs: ~1000
- Largest local center: 176 CPUs
- Production passes per dataset: 6-8 (including analysis-group processing done by production)
- Number of files: ~11,000
- Data size: 17 TB (not including fz files from simulation)
- File transfer by GDMP and by perl scripts over scp/bbcp; … TB toward T1, 4 TB toward T2

Slide 6: Interactive Analysis
[Screenshot annotations:]
- Lizard Qt plotter
- ANAPHE histogram extended with pointers to CMS events
- Emacs used to edit a CMS C++ plug-in to create and fill histograms
- OpenInventor-based display of the selected event
- Python shell with Lizard & CMS modules
Most analysis is still done using ntuples in PAW, some in ROOT.

Slide 7: Behind the Scenes: Frameworks
[Architecture diagram: a consistent user interface (federation wizards, detector/event display, data browser, analysis job wizards, generic analysis tools) sits on specific frameworks (ORCA, FAMOS, OSCAR, Objectivity tools, GRID), which share COBRA's coherent basic tools and mechanisms on top of the distributed data store & computing infrastructure.]

Slide 8: Frameworks Dissected
[Layer diagram:]
- Specific frameworks: event filter, reconstruction algorithms, physics analysis, data monitoring
- Generic application framework: physics modules, basic services, adapters and extensions; calibration, configuration and event objects as (Grid-aware, Grid-uploadable) data products
- Foundation: ODBMS, GEANT 3/4, CLHEP, PAW replacement, C++ standard library + extension toolkits

Slide 9: Framework Design Basis
- Several frameworks together provide the environment
  - Open: no central framework with all the functionality
    - Frameworks are designed to be extensible
    - … and to collaborate with other software
  - Coherent: the user sees a “final”, smooth interface
    - Achieved by integrating the frameworks together
    - … but the user does not do this work him/herself!
  - Design applied at both framework and object design level
- Successfully applied in many parts of CMS software
  - Applications, persistency, sub-frameworks, visualisation, …
  - No loss of usability, functionality or performance
  - Has made it easy to integrate directly with many existing tools
- This is nothing novel — it is part of the standard risk-mitigation strategy of any modern industrial solution
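The "open but coherent" idea above — independently built components that announce themselves to a common registry instead of a central framework knowing everything — can be sketched in a few lines of Python. All names here (PluginRegistry, register, create) are illustrative, not the real COBRA/IGUANA API.

```python
class PluginRegistry:
    """Central database where independently built modules announce themselves."""

    def __init__(self):
        self._plugins = {}

    def register(self, category, name, factory):
        # A plug-in "publishes itself": the framework needs no central list.
        self._plugins.setdefault(category, {})[name] = factory

    def create(self, category, name, *args):
        # The framework instantiates components purely by name.
        return self._plugins[category][name](*args)


registry = PluginRegistry()

# Two separately developed components extend the same framework.
registry.register("visualiser", "3d", lambda: "OpenInventor 3D view")
registry.register("visualiser", "twig", lambda: "Twig browser")

print(registry.create("visualiser", "3d"))
```

The point of the sketch is that neither "component" knows about the other, yet both appear through one coherent interface, which is the risk-mitigation property the slide describes.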

Slide 10: Frameworks: COBRA
[Same architecture diagram as slide 7, highlighting the COBRA layer.]

Slide 11: COBRA: Main Components
- Push- and pull-mode execution — and any mixture
  - Reconstruction-on-demand is a key concept in COBRA
  - Detector-centric reconstruction: push data from the event
  - Reconstruction-unit-centric reconstruction: pull/create data as needed
- Event data and related structures
  - Basic support for commonly needed objects (hits, digis, containers, …)
- Application environments
  - Basic application frameworks and various semi-specialised applications
  - Lots of error-handling and recovery code (automatic recovery after a crash, …)
- Meta data: a key component
  - Data chunking, system and user collections, data streams, file management, job concepts, configuration and setup records, redirected navigation after reprocessing, …
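The pull-mode "reconstruction on demand" idea can be sketched as lazy evaluation with caching: a data product is computed only when something asks for it, and producers can transparently pull the products they depend on. The class and method names below are invented for illustration; the real COBRA implementation is C++.

```python
class Event:
    """Toy event with on-demand data products (a sketch, not the COBRA API)."""

    def __init__(self, raw_hits):
        self.raw_hits = raw_hits
        self._cache = {}      # products already reconstructed
        self._producers = {}  # how to make each product

    def add_producer(self, product, func):
        self._producers[product] = func

    def get(self, product):
        # Pull model: compute (and cache) only on first request.
        if product not in self._cache:
            self._cache[product] = self._producers[product](self)
        return self._cache[product]


ev = Event(raw_hits=[1.0, 2.5, 4.0])
ev.add_producer("clusters", lambda e: [h for h in e.raw_hits if h > 2.0])
ev.add_producer("tracks", lambda e: len(e.get("clusters")))  # pulls clusters

print(ev.get("tracks"))  # asking for tracks triggers clustering transparently
```

Asking for "tracks" transparently triggers the "clusters" producer first, which is the dependency-driven behaviour the slide calls reconstruction-unit-centric reconstruction.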

Slide 12: COBRA: Main Strengths
- Algorithms in plug-ins
  - “Publish-yourself” plug-ins: self-describing data producers
- Strong meta-data facilities
  - Reconstruction-on-demand matches the data-product concept very well
    - The Grid virtual data products concept is really just an extension
  - Convenient mapping of data products to chunks: files, containers, …
  - Scatter/gather: decompose jobs, gather data
    - One logical job can be chopped into many physical processes; we still know it is logically the same job no matter which process it is running in
- Adapts automatically to many environments without special configuration: interactive, batch, farm, stand-alone, trigger, …
  - Through appropriate use of enabling techniques (transactions, locking, refs)
  - No data post-processing required
  - Well matched to the production tools (IMPALA)
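The scatter/gather point above — one logical job chopped into many physical processes that remain traceable to the same logical job — can be sketched as follows. The function names and the job-id scheme are hypothetical, chosen only to make the idea concrete.

```python
def scatter(job_id, events, chunk_size):
    """Decompose one logical job into physical chunks that keep its id."""
    return [
        {"job_id": job_id, "chunk": i // chunk_size,
         "events": events[i:i + chunk_size]}
        for i in range(0, len(events), chunk_size)
    ]


def run_chunk(chunk):
    # Each physical process works on its own slice independently.
    return {"job_id": chunk["job_id"], "n_processed": len(chunk["events"])}


def gather(results):
    # Every piece carries the same logical job id, so merging is unambiguous.
    assert len({r["job_id"] for r in results}) == 1
    return sum(r["n_processed"] for r in results)


chunks = scatter("prod-2002-ttbar", list(range(10)), chunk_size=4)
total = gather([run_chunk(c) for c in chunks])
print(total)  # all events accounted for across the physical processes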

June, 2002 Lassi A. Tuura, Northeastern University 13 Storage Manager Storage Manager Schema Manager Schema Manager Transaction Manager Transaction Manager C++ Binding File I/O Lock Server Lock Server Page Server Page Server Catalog Manager DDL Source Processing DDL Source Processing Meta Data Meta Data Object Access Object Access MSS, Grid & Farm Interface MSS, Grid & Farm Interface Objectivity

June, 2002 Lassi A. Tuura, Northeastern University 14 Refs & Navigation Refs & Navigation Queries Cache Management Cache Management Storage Manager Storage Manager Schema Manager Schema Manager Transaction Manager Transaction Manager C++ Binding File I/O Lock Server Lock Server Page Server Page Server Catalog Manager DDL Source Processing DDL Source Processing Meta Data Meta Data Object Access Object Access MSS, Grid & Farm Interface MSS, Grid & Farm Interface Objectivity

June, 2002 Lassi A. Tuura, Northeastern University 15 Object Naming Object Naming Configurations (Data Sets) Configurations (Data Sets) Collections Run Resume & Crash Recovery Run Resume & Crash Recovery Storage Manager Storage Manager Schema Manager Schema Manager Transaction Manager Transaction Manager C++ Binding File I/O Lock Server Lock Server Page Server Page Server Catalog Manager DDL Source Processing DDL Source Processing Meta Data Meta Data Object Access Object Access MSS, Grid & Farm Interface MSS, Grid & Farm Interface Objectivity

June, 2002 Lassi A. Tuura, Northeastern University 16 File Size Control File Size Control Farm Management Farm Management System Management System Management Storage Manager Storage Manager Schema Manager Schema Manager Transaction Manager Transaction Manager C++ Binding File I/O Lock Server Lock Server Page Server Page Server Catalog Manager DDL Source Processing DDL Source Processing Meta Data Meta Data Object Access Object Access MSS, Grid & Farm Interface MSS, Grid & Farm Interface Objectivity

Slide 17: Frameworks: IGUANA
[Same architecture diagram as slide 7, highlighting the IGUANA layer.]

Slide 18: User Interface and Visualisation
- IGUANA: a generic toolkit for user interfaces and visualisation
  - Builds on existing high-quality libraries (Qt, OpenInventor, Anaphe, …)
  - Used to implement specific visualisation applications in other projects
- Main technical focus: provide a platform that makes it easy to integrate GUIs as a coherent whole, to provide application services and to visualise any application object
  - Many categories/layers: GUI gadgets & support, application environment, data visualisers, data representation methods, control panels, …
  - Designed to integrate with and into other applications
  - Virtually everything is in plug-ins (but can still be statically linked)
[Diagram: component database with plug-in caches and object factories, in attached and unattached states.]

Slide 19: Illustration: 3D Visualisation
[Screenshot: QMainWindow and QMDIShell browser sites hosting a 3D browser and a twig browser.]

Slide 20: IGUANA GUI Integration
[Screenshot sequence: an integration action leads to visualised results, modified objects and further interaction.]

Slide 21: Tomorrow and Beyond
- Leverage the current frameworks on the grid
  - Many native COBRA concepts match the grid well:
    - (Virtual) data products ~ reconstruction-on-demand
    - Recording and matching configuration and setup information
    - Production interfaces: catalogs, redirection, MSS hooks
    - Scatter/gather job decomposition, production environment
  - COBRA-based applications can be encapsulated for distributed analysis
  - IGUANA already separates application objects, model and viewer
    - Many possibilities for introducing distributed links
  - IGUANA + COBRA provides a platform for a coherent, well-integrated interface no matter where the code runs and the data comes and goes
    - Both have loads of knobs and hooks for integration
- Aiming at adapting the existing software where possible
  - Adapt and work within CMS software (COBRA, ORCA, …) and existing analysis tools (ROOT, Lizard, …) — don’t replace them

Slide 22: Prototypes: Clarens Web Portals
[Diagram: client → RPC over http/https → web server → Clarens service.]
- Grid-enabling the working environment for physicists' data analysis
- Communication with clients via the commodity XML-RPC protocol → implementation independence
- Server implemented in C++: access to the CMS OO analysis toolkit
- Server provides a remote API to Grid tools:
  - The Virtual Data Toolkit: object collection access
  - Data movement between tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
  - Security services provided by the Grid (GSI)
  - No Globus needed on the client side, only a certificate
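The client-server exchange the slide describes can be sketched with the XML-RPC support in the Python standard library. The service method `list_collection` and its reply are invented for illustration; the real Clarens server exposed Grid and ORCA/COBRA APIs over the same protocol, which is exactly what gives the implementation independence mentioned above.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in for the Clarens service: any method registered here becomes
# callable remotely over plain HTTP (https/GSI in the real deployment).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda name: ["evt-001", "evt-002"], "list_collection")
port = server.server_address[1]

t = threading.Thread(target=server.handle_request)  # serve exactly one request
t.start()

# Any XML-RPC-capable client (C++, Python, a browser plug-in, ...) can call it.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.list_collection("higgs-sample")
t.join()
server.server_close()

print(result)
```

Because XML-RPC is language-neutral, the client needs no Grid middleware at all, matching the slide's point that only a certificate is needed on the client side.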

Slide 23: Prototypes: Clarens Web Portals (continued)
[Diagram of the envisaged distributed analysis system, spanning Tier 0/1/2 production through Tier 3/4/5 users: the production system and data repositories feed TAG and AOD extraction/conversion/transport services; ORCA analysis farms (or a distributed “farm” using grid queues), RDBMS-based data warehouses and PIAF/Proof-type analysis farms sit behind data-extraction and query web services; the user's local analysis tool (Lizard/ROOT/…) or web browser connects through a tool plug-in module, with TAG/AOD data flowing down and physics queries flowing up.]

Slide 24: Other Prototypes
- Tag database optimisation
  - Fast sample selection is crucial
  - Various models already tried
  - Experimenting with an RDBMS
- MOP: distributed job submission system
  - Allows CMS production jobs to be submitted from a central location, run at remote locations, and return their results
    - Job specification: IMPALA
    - Replication: GDMP
    - Job execution: Globus GRAM
    - Job scheduling: Condor-G and local systems