ATLAS MC Use Cases
Zhongliang Ren
12 June 2006, WLCG Tier2 Workshop at CERN

Table of Contents
- ATLAS Computing TDR and the roles of Tier2s
- Current MC productions
- Basic requirements at Tier2s for ATLAS MC production
- ATLAS job requirements and site configurations
- Missing tools and grid middleware functionality
- Data services at Tier2s
- Near-future plan & schedule

Operation parameters, types and sizes of event data, and processing times
- Raw event data rate from online DAQ: 200 Hz
- Operation time: 50,000 seconds/day; 50 days of data-taking in 2007
- Event statistics: 10^7 events/day; 2x10^9 events/year from 2008 onwards
- Raw data size: 1.6 MB/event
- ESD size: 0.5 MB/event
- AOD size: 100 kB/event
- TAG size: 1 kB/event
- Simulated data size: 2.0 MB/event
- Simulated ESD size, time for simulation, time for analysis: values not recoverable from the transcript
- Time for reconstruction: 15 kSI2k-s/event
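To put these parameters in perspective, here is a minimal arithmetic sketch (plain Python, using only the numbers from the list above) of the raw-data volumes they imply:

```python
# Back-of-the-envelope data volumes implied by the TDR parameters above.
RATE_HZ = 200                  # raw event rate from the online DAQ
SECONDS_PER_DAY = 50000        # effective data-taking time per day
RAW_EVENT_MB = 1.6             # raw event size
EVENTS_PER_YEAR = 2e9          # from 2008 onwards

events_per_day = RATE_HZ * SECONDS_PER_DAY              # 10 million events/day
raw_tb_per_day = events_per_day * RAW_EVENT_MB / 1e6    # ~16 TB of raw data per day
raw_pb_per_year = EVENTS_PER_YEAR * RAW_EVENT_MB / 1e9  # ~3.2 PB of raw data per year

print(f"{events_per_day:.2e} events/day, {raw_tb_per_day:.0f} TB/day, "
      f"{raw_pb_per_year:.1f} PB/year of raw data")
```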

Roles of Tier2
From the ATLAS Computing TDR:
- Monte Carlo event simulation
  - Full-chain MC production including event generation, simulation, digitization, event pile-up and reconstruction
  - MC production rate equivalent to >20% of the ATLAS raw event rate
  - Total Tier2 CPU resources should cover the MC production requirements
  - MC data to be stored at the Tier1 to which the Tier2 is associated
- User physics analyses (see the separate talk by Dietrich Liko)
  - Hosting (part of) the AOD (200 TB/year) & TAG (2 TB/year) data
  - For physics working groups and sub-groups
  - AOD & TAG data based
  - Chaotic, competitive with 1,000+ users (resource sharing)!
- Special Tier2s dedicated to detector calibrations

MC Production: Requirements & Deliverables
The ATLAS C-TDR assumes:
- 200 Hz and 320 MB/sec, with 50,000 seconds of effective data-taking per day, i.e. 10 million events per day
- MC data production equivalent to 20% of the raw data rate, i.e. 2 million MC events per day
- Assuming ATLAS data-taking at full efficiency: 17.28 million events per day, i.e. ~3.5 million MC events per day
MC production deliverables in 2007-2008:
- Worldwide distributed MC production at the T2s
- 50 events per job for physics events, 100 events per job for single-particle events
- Central MC productions of ~100K jobs/day
- Additional capacity for calibration and alignment (dedicated T2s)
- Global operation with 100+ sites
- Needs 7x24 stable services
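A small sketch of the arithmetic behind these targets. The job-count estimate assumes the 50-events/job figure quoted above and is only indicative; the ~100K jobs/day target on the slide also covers single-particle, validation and other production.

```python
# Rough MC-production arithmetic using the figures quoted above.
RAW_RATE_HZ = 200
EFFECTIVE_SECONDS = 50000          # per day, TDR assumption
FULL_DAY_SECONDS = 86400           # data-taking at full efficiency
MC_FRACTION = 0.20                 # MC equivalent to 20% of the raw rate
EVENTS_PER_PHYSICS_JOB = 50

raw_events_tdr = RAW_RATE_HZ * EFFECTIVE_SECONDS        # 10e6 events/day
raw_events_full = RAW_RATE_HZ * FULL_DAY_SECONDS        # 17.28e6 events/day

mc_events_tdr = MC_FRACTION * raw_events_tdr            # 2e6 MC events/day
mc_events_full = MC_FRACTION * raw_events_full          # ~3.5e6 MC events/day

jobs_per_day = mc_events_full / EVENTS_PER_PHYSICS_JOB  # ~69K physics jobs/day
print(f"{mc_events_tdr:.2e} to {mc_events_full:.2e} MC events/day, "
      f"~{jobs_per_day:.0f} physics jobs/day at 50 events/job")
```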

Current MC Productions
The major objective has been the Computing System Commissioning (CSC) in 2006. Several goals for the on-going MC productions:
- Started in October 2005 as a running-in of computing operations towards LHC startup in 2007
- Testing & commissioning of the infrastructure and production systems
- Offline software validation, to catch the rare bugs at the 1/10^3 event level
- Provide data for ATLAS detector commissioning and physics analysis
- Uses 150 different physics samples (of which 61 are single particles)
- Tier1 resources also used for MC production
- Performed outside of the LCG SC3/SC4 activities

The Infrastructure and Production System
- Globally distributed MC production using three grids: LCG/EGEE, OSG and NorduGrid (NG)
- Four different production systems:
  - LCG-Lexor
  - LCG-CondorG
  - OSG-PanDA
  - NG-Dulcinea
- DQ2 Distributed Data Management (DDM) system
  - Integrated in OSG-PanDA, LCG-CondorG & LCG-Lexor
  - To be integrated in NG
- DDM operations
  - ATLAS VO-boxes and DQ2 servers ready at CERN and 10 T1s; FTS channel configurations done
  - Still need to configure FTS channels to the T2s!

The MC Production Workflow
The typical workflow for distributed MC production:
1. ATLAS offline software release and Pacman distribution kit
   - This has been driving the production schedule
2. Validation of the release: a multi-step process
   - Pre-release nightly build and RTT (Run Time Tester) testing
   - Physics validation team signs off
   - Site and software installation validation group (central computing operations)
3. Installation of the release
4. Physics samples from ATLAS physics coordination
   - Sample A (~250k events) for validation on all grids
   - Sample B (~1M events) production
   - Sample C (~10M events)
5. Job definitions
6. Validation production with Samples A and B
7. Large-scale production for CSC, physics, CTB, cosmic commissioning, etc.
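A minimal sketch of the gating logic in this workflow. The sample names and sizes come from the slide, but the `validate`/`produce` helpers are purely illustrative placeholders, not the real production-system interface.

```python
# Illustrative only: the real chain runs through the ATLAS production system
# (Eowyn supervisors plus the Lexor/CondorG/Dulcinea/PanDA executors).
VALIDATION_SAMPLES = {"A": 250_000, "B": 1_000_000}   # events, from the slide
PRODUCTION_SAMPLES = {"C": 10_000_000}

def validate(sample: str, events: int) -> bool:
    """Stand-in for validation production and physics sign-off on one sample."""
    print(f"validation production: sample {sample}, {events} events")
    return True  # in reality: RTT, physics validation team, site validation

def produce(sample: str, events: int, events_per_job: int = 50) -> None:
    """Stand-in for large-scale production (CSC, physics, CTB, cosmics, ...)."""
    print(f"large-scale production: sample {sample}, ~{events // events_per_job} jobs")

# Large-scale production only starts once the validation samples are signed off.
if all(validate(s, n) for s, n in VALIDATION_SAMPLES.items()):
    for s, n in PRODUCTION_SAMPLES.items():
        produce(s, n)
```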

Worldwide Distributed Productions

Summary of Current MC Productions
- Finished many cycles of validation runs: ~100 bug reports filed so far
  - Found many exotic software bugs at the 1/10^3 event level in the ATLAS offline software as well as in Geant4
- Provided MC data for ATLAS physics analysis and detector commissioning
- Reached an average MC production rate of ~2 million events per week; still to ramp up by a factor of 10 according to the ATLAS Computing Model
- Thanks to all grids (LCG/EGEE, OSG and NG) and sites!
  - Direct contacts between ATLAS and sites have been set up
  - Need synchronized information channels with LCG!
- Requires increasing resources at T2s and a more stable, production-quality infrastructure towards the end of 2006

Basic Requirements at Tier2s for ATLAS MC Production
General requirements for ATLAS offline software installation:
- A working WLCG middleware installation
- Scientific Linux 3.0.x or a fully compatible OS
- A shared file system among all WNs
- ~20-30 GB of free disk space (one release needs ~10 GB)
- glibc-headers and libX11.a installed on the WNs (a dependency that should be dropped!)
- A supportive site administrator
Installation is done via a Pacman kit:
- Managed by a central operations team; different people take care of installations on the different grids
- NG has mixed installations of the centrally built Pacman kit and recompiled RPMs for clusters with different OSes
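A minimal sketch of a worker-node pre-check a site admin might run against the requirements listed above. The paths and the 30 GB threshold are assumptions for illustration, not part of any official ATLAS installation procedure.

```python
#!/usr/bin/env python
"""Illustrative worker-node pre-check for the requirements listed above.
The paths and thresholds are assumptions, not an official ATLAS tool."""
import os

INSTALL_AREA = "/opt/atlas"          # assumed shared software area on the WNs
REQUIRED_FREE_GB = 30                # ~20-30 GB free; one release needs ~10 GB
REQUIRED_FILES = [
    "/usr/include/gnu/stubs.h",      # rough proxy for the glibc-headers package
    "/usr/X11R6/lib/libX11.a",       # static libX11 in its SL3-era location
]

def free_gb(path):
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / 1e9

ok = True
if free_gb(INSTALL_AREA) < REQUIRED_FREE_GB:
    print("not enough free space in", INSTALL_AREA)
    ok = False
for f in REQUIRED_FILES:
    if not os.path.exists(f):
        print("missing:", f)
        ok = False
print("worker node looks OK" if ok else "worker node fails the basic checks")
```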

ATLAS Job Requirements and Site Configurations
Different job types require different minimum amounts of main memory to avoid frequent paging:
- > 600 MB for a simulation job
- > 1.0 GB for a reconstruction job
- > 1.3 GB for a pile-up job
The current LCG middleware does not support passing individual job parameters to the local batch system!
- The only information published is GlueHostMainMemoryRAMSize, via the Glue schema
- This is a problem for LCG sites whose PCs have different amounts of main memory!
- Complicated cluster and batch-system configurations (dual-CPU PCs, or PCs with Hyper-Threading)
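A toy sketch of the matchmaking gap described above: with only a single published GlueHostMainMemoryRAMSize per site (or sub-cluster), per-job memory requirements can only be checked against that one number. The site names and RAM values here are invented for illustration.

```python
# Toy matchmaking against GlueHostMainMemoryRAMSize (site values invented).
JOB_MIN_MEMORY_MB = {"simulation": 600, "reconstruction": 1000, "pileup": 1300}

# One published number per site, even if the site mixes WNs with different RAM.
PUBLISHED_RAM_MB = {"siteA": 2048, "siteB": 1024, "siteC": 512}

for job, need in JOB_MIN_MEMORY_MB.items():
    eligible = [s for s, ram in PUBLISHED_RAM_MB.items() if ram >= need]
    print(f"{job:15s} needs >= {need} MB -> eligible sites: {eligible}")

# The published value says nothing about per-node or per-queue limits,
# which is why individual job requirements cannot be enforced further down.
```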

Missing Tools and Grid MW Functionality
- Need tools for resource allocation and job-priority setting among the VOs and the individual users
  - VOMS and G-PBox (Grid Policy Box)?
- Need better monitoring and accounting tools for resource usage
- Require new middleware functionality to pass individual job parameters down to the local batch system

Data Services at Tier2s
- MC data produced at Tier2s are transferred to, and stored at, the T1 with which the T2 is associated
- T2s provide an SRM SE for both raw and MC AOD data (disk-resident)
- LCG Tier2 sites: MC data transfer is done with the DQ2 services at the T1; DQ2 installations at the Tier2s are not needed for now
- OSG Tier2 sites: a DQ2 installation is needed at every T2 in the PanDA production system
- NG Tier2 sites: DQ2 not yet used!
- Data services at T2s are complicated by DB replication, which still needs to be worked out in the context of the LCG 3D project

Event Data at Tier2s
Not only from MC production: a T2 keeps AOD/TAG data for user analysis.
- A T2 with enough storage can hold the complete AOD data
  - AOD of real data: 200 TB/year
  - AOD of MC data: 40 TB/year
  - The ATLAS C-TDR now says 1/3 of the AOD for each T2
- Complete TAG data: 2 TB/year
  - Can be a relational TAG database and/or TAG data files
  - The relational database is presumed to be MySQL at Tier2s
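A quick sketch of the disk these numbers imply per Tier2, using only the figures on this slide; whether the MC AOD is included in the 1/3 share is my assumption.

```python
# Per-Tier2 storage implied by the AOD/TAG figures above.
AOD_REAL_TB_PER_YEAR = 200
AOD_MC_TB_PER_YEAR = 40
TAG_TB_PER_YEAR = 2
T2_AOD_SHARE = 1.0 / 3.0   # C-TDR: 1/3 of the AOD at each T2

aod_share_tb = T2_AOD_SHARE * (AOD_REAL_TB_PER_YEAR + AOD_MC_TB_PER_YEAR)  # 80 TB/year
total_tb = aod_share_tb + TAG_TB_PER_YEAR                                  # 82 TB/year
print(f"~{aod_share_tb:.0f} TB/year of AOD plus {TAG_TB_PER_YEAR} TB/year of TAG "
      f"=> ~{total_tb:.0f} TB/year per Tier2 holding a 1/3 AOD share")
```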

Non-Event Data at Tier2s
- Geometry DB
  - Master in OracleDB@CERN, replicas distributed as SQLite files (a file-based "database")
- Calibrations and Conditions DB
  - Mixed file-resident and database-resident data (MySQL at Tier2s)
  - Some conditions will be needed for analysis, some calibrations for simulation: no good volume estimates yet
  - Small compared to the TAG database

Data File Distribution Using DQ2
- Event data and calibration "datasets" arrive by subscription in the distributed data management (DDM) system, DQ2
- Subscribing to a dataset implicitly adds data-transfer requests to the DQ2 transfer queue
- DDM decides the site(s) from which the files in a dataset will be transferred
- Tier2s will in general get their data from their associated Tier1
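A schematic sketch of the subscription model described above. The class, method and dataset names are invented for illustration; this is not the DQ2 client API.

```python
# Schematic model of dataset subscriptions (names invented, not the DQ2 API).
from collections import deque

class ToyDDM:
    def __init__(self, replica_catalog):
        self.replica_catalog = replica_catalog   # dataset -> sites holding a replica
        self.transfer_queue = deque()

    def subscribe(self, dataset, destination):
        """Subscribing implicitly queues transfer requests for the dataset's files."""
        sources = self.replica_catalog[dataset]
        # The DDM system, not the user, decides which source site(s) to use;
        # a Tier2 would normally be served from its associated Tier1.
        source = sources[0]
        self.transfer_queue.append((dataset, source, destination))

ddm = ToyDDM({"csc.example.AOD": ["Tier1_X"]})   # dataset name is made up
ddm.subscribe("csc.example.AOD", "Tier2_Y")
print(list(ddm.transfer_queue))
```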

Database-Resident Data Distribution
- The assumption is that all database-resident data has a single master copy in OracleDB@CERN
- Tier0 -> Tier1 transfer uses Oracle-specific "Streams" replication
- Tier1 -> Tier2 crosses technologies: Oracle -> MySQL
  - Tools for TAG and conditions database replication from Oracle to MySQL have been demonstrated, but have not been used in production to date
  - The production model is not well understood yet
- The LCG 3D project is prototyping mechanisms for distributed database deployment, replication and update operations

Web Caching of DB Data
- The possibility of using web-caching technologies (Squid) for access to database-resident data is under investigation
- The FNAL-developed FroNtier project (Squid-cache based) is being evaluated in the LCG 3D project
- Until the DB replication issue is settled, it is not very clear what will be needed at Tier2s for data services!
- These tasks will mostly remain with the ATLAS central computing operations team, and will be kept to a minimum for the Tier2 services!

Near-Future Plan and Schedule
Things urgently needed in the production system:
- DQ2/DDM fully integrated in the NG production system
- Increased CPU resources with enough main memory for ATLAS MC production jobs
- VOMS & G-PBox for resource allocation, job-priority setting, etc.
Continue ramping up the production rate:
- ~50K jobs/day by summer 2006
- ~100K jobs/day by the end of 2006
- Up to ~1M jobs/day (including distributed-analysis jobs) by LHC startup
Further plans:
- Continue validation production; perform CSC and Calibration Data Challenge production, etc. in the 2nd half of 2006
- Perform a combined general dress rehearsal in spring 2007 using MC-produced data (see next slide)

The Dress Rehearsal in Spring 2007
- Generate an amount of MC data close to what is expected in 2007: an MC production scale of N x 10^7 events?
- Mix and filter events at MC generator level to get the correct physics mixture as expected at the HLT output
- Run Geant4 and trigger simulation and produce ByteStream data to mimic the raw data
- Send the raw data to Point 1, pass it through the HLT nodes and the SFO, write out events into streams, closing files at luminosity-block boundaries
- Send the events from Point 1 to Tier0; perform calibration & alignment at Tier0; run reconstruction at Tier0
- Distribute ESD, AOD and TAG data to the Tier1s and Tier2s
- Perform distributed analysis at the Tier2s

Backup Slides

ATLAS Production System (2006)
(Architecture diagram: tasks defined in the production database (ProdDB) are picked up by Eowyn supervisor instances, which drive the Python-based grid executors Lexor and Lexor-CondorG on EGEE, Dulcinea on NorduGrid and PanDA on OSG, plus an LSF/T0MS executor; all executors are integrated with the DQ2 distributed data management system.)

LCG World View
- Currently 51 sites with the ATLAS software installed
- 9300+ CPUs (shared with the other VOs)

OSG World View
- Currently ~50 sites, ~5000 CPUs
- ~1000 CPUs dedicated to ATLAS

NorduGrid World View
- 13 sites, 789 CPUs available for ATLAS now