Are We Ready for the LHC Data? Roger Jones Lancaster University PPD Xmas Bash RAL, 19/12/06.

Are We Ready? No! Thank you… but seriously….

Getting There - Computing System Commissioning (06/07)
 The high-level goals of the Computing System Commissioning (CSC) operation running Autumn 06 - Spring 07
 A running-in of continuous operation, not a stand-alone challenge
 The main aim of CSC is to test the software and computing infrastructure that we will need at the beginning of 2007:
  - Calibration and alignment procedures and conditions DB
  - Full trigger chain
  - Event reconstruction and data distribution
  - Distributed access to the data for analysis
 60M events have already been produced; a new production of 10M events will be done from now until the end of the year
 At the end of 2006 we will have a working and operational system, ready to take data with cosmic rays at increasing rates

Facilities at CERN
 Tier-0:
  - Prompt first-pass processing on the express/calibration and physics streams with old calibrations - calibration, monitoring
  - Calibration tasks on prompt data
  - Hours later, process the full physics data streams with reasonable calibrations
 CERN Analysis Facility:
  - Access to ESD and RAW/calibration data on demand
  - Essential for early calibration
  - Detector optimisation/algorithmic development

What is Done
 Large-scale system tests in the last few weeks have shown that the Event Filter → Tier 0 transfer tests are under control
 Various ‘Tier 0’ exercises have demonstrated that the first-pass processing can be done if the calibrations are in place
 Missing elements:
  - The rapid calibration process
  - Adequate monitoring and validation

Tier-0 Internal Transfers (Oct)
 Data flows and operations can be maintained for about a week

The Calibration Data Challenge (CDC)
 Fully simulate ~20M events (mainly SM processes: Z → ll, QCD di-jets, etc.) with a “realistic” detector
 “Realistic” means:
  1) As installed in the pit: already-installed detector components positioned in the software according to survey measurements
  2) Mis-calibrated (e.g. calo cells, R-t relations) and mis-aligned (e.g. SCT modules, muon chambers); also including chamber/module deformations, wire sagging, HV imperfections, etc.
 Use the above samples and the calibration/alignment algorithms to calibrate and align the detector and recover the nominal (“TDR”) performance
 Also useful to understand the trigger performance in more realistic conditions
 Includes an exercise of the (distributed) infrastructure: 3D conditions DB, bookkeeping, etc.
 Scheduled for Spring 2007; needs ATLAS Release 13 (February 2007)

RWL Jones 19 December 2006 RAL PPD 8  Obtain final alignment and calibration constants  Compare performance of realistic “as-installed” detector after calibration and alignment to nominal (TDR) performance  Understand many systematic effects (material, B-field), test trigger robustness, etc  Learn how to do analyses w/o a-priori information (exact geometry, etc.) Geometry of realistic “as-installed” detector G4-simulation of ~ 20M events (SM processes e.g. Z  ll) Reconstruction pass N (Release 13, Oct. 06) Analysis Calib/align constants pass N Condition DataBase Calib/align constants from pass N-1 Pass 1 uses nominal calibration, alignment, material Large part of it in Release /12.0.x (being validated now) Schematic of the Calibration Data Challenge

Tier 1 Activities
 Tier-1:
  - Receive RAW and first-pass processed data from CERN
  - Redistribute to the Tier 2s
  - Reprocess 1-2 months after arrival with better calibrations
  - Reprocess all resident RAW at year end with improved calibrations and software

T0-T1-T2 Scaling Test (October 2006)
 A monitoring system has been put in place allowing sites to see their rates (disk/tape areas), data assignments and errors in the last hours, per file, dataset, …
 FTS channels are in place between T0 and the T1s, and are now progressing between T1s and T2s
  - By ‘pressure’ of regional contacts
 The start of the exercise was marked by the deployment of a new DQ2 version (LCG and OSG sites)
  - Hopefully this is the last major new release for the near future
  - Many improvements to the handling of FTS requests
 Tier-2s participate on a “voluntary basis”
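To illustrate the kind of per-site, per-dataset view such monitoring gives (the record layout, site names and dataset names below are hypothetical, not the actual DQ2/FTS monitoring schema):

```python
# Hypothetical transfer-log records; the real monitoring reads FTS/DQ2 state.
from collections import defaultdict

transfers = [
    {"site": "RAL",   "dataset": "csc11.AOD.A", "bytes": 2.0e9, "seconds": 40, "error": None},
    {"site": "RAL",   "dataset": "csc11.AOD.A", "bytes": 0.0,   "seconds": 0,  "error": "SRM timeout"},
    {"site": "LANCS", "dataset": "csc11.AOD.B", "bytes": 1.5e9, "seconds": 60, "error": None},
]

rates = defaultdict(float)            # MB/s per (site, dataset)
errors = defaultdict(int)             # error counts per (site, error string)
for t in transfers:
    if t["error"]:
        errors[(t["site"], t["error"])] += 1
    elif t["seconds"]:
        rates[(t["site"], t["dataset"])] += t["bytes"] / t["seconds"] / 1e6

for (site, ds), mbps in sorted(rates.items()):
    print(f"{site:6s} {ds:14s} {mbps:6.1f} MB/s")
for (site, err), n in sorted(errors.items()):
    print(f"{site:6s} {n:3d} x {err}")
```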

RWL Jones 19 December 2006 RAL PPD 11 Data transfer (October 2006)

Tier 1 Reprocessing
 There is a feeling that this is far down the critical path:
  - Not needed until 2008
  - Can clone much of the Tier 0 system
 I do not agree:
  - Complex data movement between T1s and out to T2s
  - Staging of RAW data is different site-by-site
  - Recall for reprocessing will be different
  - Must handle time blocks
  - Best handled by each site pulling work off a central task queue when ready (sketched below)
  - Implies local effort
 On a positive note, the 3D conditions database tests have worked well, with RAL one of three Tier 1s fully validated
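A minimal sketch of the pull model argued for here (the queue layout, task fields and site names are invented for illustration): reprocessing tasks for whole time blocks sit in a central queue, and each Tier 1 takes work only when its tape staging and local resources are ready.

```python
# Sketch of a central reprocessing task queue that Tier 1s pull from when ready
# (illustrative only; field names and block naming are hypothetical).
import queue

task_queue = queue.Queue()
for block in ["run001_LB000-099", "run001_LB100-199", "run002_LB000-099"]:
    task_queue.put({"time_block": block, "input": "RAW", "output": "ESD+AOD"})

def site_pull_loop(site, staged_ok):
    """Each site pulls a task only once it has staged the RAW for that block from tape."""
    while not task_queue.empty():
        task = task_queue.get()
        if staged_ok(task["time_block"]):
            print(f"{site}: reprocessing {task['time_block']} with new calibrations")
        else:
            task_queue.put(task)      # not ready yet: leave it for later or another site
            break

site_pull_loop("RAL", staged_ok=lambda block: block.startswith("run001"))
```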

Analysis Computing Model
 The analysis model is broken into two parts
 Tier 1: scheduled central production of augmented AOD, tuples and TAG collections from ESD
  - Derived files are moved to the other T1s and to T2s
  - The framework for this is still to be put in place
 Tier 2: on-demand user analysis of augmented AOD streams, tuples, new selections etc., plus individual user simulation and CPU-bound tasks matching the official MC production
  - Modest job traffic between T2s
  - Tier 2 files are not private, but may be for small sub-groups in physics/detector groups
  - Limited individual space; copy to Tier 3s

Optimised Access
 RAW, ESD and AOD will be streamed to optimise access
 Selection of and direct access to individual events is via a TAG database (a toy example follows this slide)
  - A TAG is a keyed list of variables per event
  - The overhead of file opens is acceptable in many scenarios
  - Works very well with pre-streamed data
 Two roles:
  - Direct access to an event in a file via a pointer
  - Data collection definition function
 Two formats, file and database:
  - We now believe large queries require the full database
  - A multi-TB relational database, which restricts it to Tier 1s and large Tier 2s/CAF
  - The file-based TAG allows direct access to events in files (pointers)
  - Ordinary Tier 2s hold the file-based primary TAG corresponding to locally held datasets
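To make the "keyed list of variables per event" concrete, here is a toy TAG query using sqlite; the column names, values and pointer scheme are illustrative only, and the real ATLAS TAG schema and POOL references are considerably richer.

```python
# Toy event-level TAG: one row of selection variables per event, plus a pointer
# (file GUID + entry number) back to the full event. Not the ATLAS schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tag (
    run INTEGER, event INTEGER, n_muon INTEGER, met_gev REAL,
    aod_guid TEXT, aod_entry INTEGER)""")
con.executemany("INSERT INTO tag VALUES (?,?,?,?,?,?)", [
    (5200, 1, 2, 35.2, "F1A2-0001", 0),
    (5200, 2, 0, 12.1, "F1A2-0001", 1),
    (5200, 3, 1, 80.4, "F1A2-0001", 2),
])

# Role 1: data-collection definition - select the events a physics group cares about.
# Role 2: direct access - the (aod_guid, aod_entry) pointer lets a job open only the
# files and entries it needs, so the file-open overhead stays acceptable.
for run, evt, guid, entry in con.execute(
        "SELECT run, event, aod_guid, aod_entry FROM tag "
        "WHERE n_muon >= 1 AND met_gev > 30"):
    print(f"run {run} event {evt} -> file {guid} entry {entry}")
```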

Tier 2/3 Activities
 ~30 Tier 2 centres distributed worldwide:
  - Monte Carlo simulation → ESD, AOD → Tier 1 centres
  - On-demand user physics analysis of shared datasets (limited success so far)
  - Limited access to ESD and RAW data sets
 Tier 3 centres distributed worldwide:
  - Physics analysis
  - Data private and local - summary datasets

On-demand Analysis
 Restricted to Tier 2s and the CAF:
  - Some Tier 2s can be specialised for some groups
  - ALL Tier 2s are for ATLAS-wide usage
 Most ATLAS Tier 2 data should be ‘placed’, with a lifetime of ~months:
  - The job must go to the data (a brokering sketch follows this slide)
  - This means Tier 2 bandwidth is vastly lower than if you pull data to the job
  - Back-navigation requires AOD and ESD to be co-located
 Role- and group-based quotas are essential:
  - Quotas are to be determined per group, not per user
 Data selection:
  - Over small samples with the Tier-2 file-based TAG and the AMI dataset selector
  - TAG queries over larger samples go by batch job to the database TAG at Tier-1s/large Tier 2s
 What data?
  - Group-derived formats
  - Subsets of ESD and RAW
  - Pre-selected, or selected via a “Big Train” run by the working group
  - No back-navigation between sites, so formats should be co-located
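A minimal sketch of "the job must go to the data" (the replica catalogue, dataset names and site names are invented): the broker sends the job to a Tier 2 that already hosts the placed dataset instead of copying the dataset to wherever the job happens to land.

```python
# Illustrative broker: send jobs to sites holding a replica of the requested dataset.
replica_catalogue = {                      # hypothetical placed-data catalogue
    "csc11.AOD.topmix":  {"LANCS", "MANC", "RAL-T2"},
    "csc11.AOD.minbias": {"GLASGOW", "RAL-T2"},
}
site_free_slots = {"LANCS": 120, "MANC": 10, "RAL-T2": 300, "GLASGOW": 40}

def broker(dataset):
    sites = replica_catalogue.get(dataset, set())
    if not sites:
        raise LookupError(f"no placed replica of {dataset}; request a subscription")
    # pick the replica site with the most free CPU rather than moving the data
    return max(sites, key=lambda s: site_free_slots[s])

print(broker("csc11.AOD.topmix"))   # -> RAL-T2
```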

Tier 2 Preparation
 Simulation has been the basis of all tests so far:
  - We can produce over 10M fully simulated and reconstructed events per month
  - This needs to double in 07Q1: partly new kit, but also higher efficiency
  - Tier 2s and Tier 3s all contribute
  - Data management needs work (see below)
 We have limited analysis activity:
  - Most users pull data to a selected local resource over the WAN (bad)
  - Better users make a copy to the CLOSE CE over the LAN (better)
  - Some have tried streamed access using rfio (illustrated after this slide)
    - Problems with the ROOT version for dCache
    - Does not work for DPM - incompatible ROOT plug-in; a fix has been promised ‘in a month’ for approximately nine months
 Data management:
  - This is not yet fit for purpose; more development effort is needed
  - Needs work for bulk transfers: partly using the middleware better, partly missing functionality
  - The operations effort is being disentangled
  - Every site needs tuning and retuning
  - Site SRM stability has been an issue
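The "streamed access" alternative to copying files locally amounts to opening the file through the storage protocol directly from ROOT. A sketch is below; the file path is hypothetical, and whether it works at all depends on the protocol plug-in shipped with the ROOT build (rfio for Castor/DPM, dcap for dCache), which is exactly the incompatibility noted above.

```python
# Sketch of streamed access from a Tier 2 worker node via PyROOT, instead of
# copying the whole file over the WAN first. Assumes a ROOT build with the
# relevant protocol plug-in; the path below is invented.
import ROOT

path = "rfio:///castor/cern.ch/grid/atlas/some/dataset/AOD.pool.root"  # hypothetical
f = ROOT.TFile.Open(path)          # TFile::Open dispatches on the protocol prefix
if not f or f.IsZombie():
    raise IOError("streamed open failed - would have to fall back to a local copy")
tree = f.Get("CollectionTree")     # typical ATLAS POOL/AOD tree name
print(f"streamed {tree.GetEntries()} events without a local copy")
```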

User Tools
 GANGA/pAthena are starting to be used for distributed analysis:
  - Users always improve things
  - Priority queues are being requested
  - Hard to track frequent software releases
 DDM user tools exist but are rudimentary:
  - Vital to manage requests
  - Also important for traffic to T3s
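For context, submitting a distributed-analysis job through GANGA looked roughly like the sketch below. This is recalled from the GANGA Athena tutorials of the period, so treat the attribute names as approximate rather than as the definitive API; the job-options file and dataset name are invented.

```python
# Inside a ganga session (attribute names approximate to the GANGA 4 era;
# dataset and job-options names are purely illustrative).
j = Job()
j.application = Athena()                                   # run an Athena analysis
j.application.option_file = 'MyAnalysis_jobOptions.py'     # hypothetical user job options
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'csc11.005300.SomeSample.recon.AOD'  # hypothetical DQ2 dataset
j.backend = LCG()                                          # broker the job out to the EGEE/LCG grid
j.submit()                                                 # pAthena plays the analogous role on OSG/PanDA
```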

RWL Jones 19 December 2006 RAL PPD 19  Generate O(10 7 ) evts: few days of data taking, ~1 pb -1 at L=10 31 cm -2 s -1  Filter events at MC generator level to get physics spectrum expected at HLT output  Pass events through G4 simulation (realistic “as installed” detector geometry)  Mix events from various physics channels to reproduce HLT physics output  Run LVL1 simulation (flag mode)  Produce byte streams  emulate the raw data  Send raw data to Point 1, pass through HLT nodes (flag mode) and SFO, write out events by streams, closing files at boundary of luminosity blocks.  Send events from Point 1 to Tier0  Perform calibration & alignment at Tier0 (also outside ?)  Run reconstruction at Tier0 (and maybe Tier1s ?)  produce ESD, AOD, TAGs  Distribute ESD, AOD, TAGs to Tier1s and Tier2s  Perform distributed analysis (possibly at Tier2s) using TAGs  MCTruth propagated down to ESD only (no truth in AOD or TAGs) A complete exercise of the full chain from trigger to (distributed) analysis, to be performed in July 2007 “The Dress rehearsal”

Whole New Operations Model
 Requires an overall Grid Operations Coordinator:
  - Kors Bos from January (tbc)
 Requires data placement and resource coordination:
  - TBD
 Requires a Data Movement Operations Coordinator and team:
  - Alexei Klimentov
 Requires full production shifts and tasks:
  - Prepared, and included in the formal requirements
  - Start from July?

Local Comments
 Tier 1 disk:
  - The shortage of Tier 1 disk has been crippling both operations and the Computing System Commissioning/SC4 exercises
  - The first Tier 1-Tier 2 transfer tests using the DDM tools failed because the required T1 disk was not available
  - Current tests are ongoing: 6/7 Tier 2s are receiving data from the RAL T1, and 3/7 are shipping it back
  - Lots of pain to clear available disk
  - Daily operations have stopped because of missing disk space
  - Tier 2 capacity does not help for operations; we need to separate production and user disk space/allocations
  - The large CSC operations are about to start
  - Even the production will be limited until the year end

Alessandra Forti Gridpp12 22 Tier 1 & other UK Grid operationsTier 1 & other UK Grid operations We now have weekly operations meetings for the Tier 1 RAL also attend ATLAS weekly production meeting Much improved communication with T1 Invitation to DTEAM meetings harder to attend - need bigger ATLAS production group Also ATLAS services and operationsAlso ATLAS services and operations UK operations shifts are being started This includes shifts for the Run Time testing etc Also data movement effort We need to provide ~5FTE 2 for data movement 1 for ATLAS production 1 for user support 1 for RTT Operations issues Operations issues

Evolution
 There is a growing Tier 2 capacity shortfall with time
 We need to be careful to avoid wasted event copies

Closing Comment
 We desperately need to exercise the analysis model:
  - With real distributed analysis
  - With streamed (‘impure’) datasets
  - With the data realistically placed (several copies available, not being pulled locally)