From DØ To ATLAS
Jae Yu
ATLAS Grid Test-Bed Workshop, Apr. 4-6, 2002, UTA

Outline: Introduction, DØ-Grid & DØRACE, DØ Progress, UTA DØGrid Activities, Conclusions

Introduction

DØ has been taking data since Mar. 1, 2001
– Accumulated over 20 pb⁻¹ of collider data
– Current maximum output rate is about 20 Hz, averaging 10 Hz
– This will improve to an average rate of 25-30 Hz with a 50% duty factor, eventually reaching about 75 Hz in Run IIb
– Resulting Run IIa (2-year) data sample is about 400 TB including reco. (1×10⁹ events)
– Run IIb will follow (4-year run) → data to increase to 3-5 PB (~10¹⁰ events)

DØ institutions are scattered across 19 countries
– About 40% European collaborators → natural demand for remote analysis

The data size poses serious issues
– Time taken for re-processing of the data (10 s/event with 40 SpecInt95; see the estimate below)
– Easy access through the available network bandwidth

DØ management recognized the issue and, in Nov. 2001,
– Created a position for remote analysis coordination
– Restructured the DØ computing team to include a DØGrid team (3 FTEs)
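To make the reprocessing issue concrete, here is a rough back-of-envelope estimate in Python based on the numbers quoted above (10 s/event, ~1×10⁹ Run IIa events); the CPU counts in the loop are illustrative assumptions, not part of the original plan.

    # Rough estimate of the Run IIa reprocessing load from the slide's numbers
    # (10 s/event on a 40 SpecInt95 CPU, ~1e9 events); CPU counts are assumptions.
    SECONDS_PER_EVENT = 10
    RUN_IIA_EVENTS = 1e9

    total_cpu_seconds = SECONDS_PER_EVENT * RUN_IIA_EVENTS
    cpu_years = total_cpu_seconds / (3600 * 24 * 365)
    print(f"Single-CPU reprocessing time: ~{cpu_years:.0f} CPU-years")

    # Spreading the load over hypothetical numbers of farm CPUs:
    for n_cpus in (100, 500, 1000):
        months = cpu_years / n_cpus * 12
        print(f"  with {n_cpus:4d} CPUs: ~{months:.1f} months")

Even with 1000 CPUs a full reprocessing pass takes a few months, which is the motivation for sharing processing resources across remote sites.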

DØRACE: DØ Remote Analysis Coordination Efforts

In existence to accomplish:
– Setting up and maintaining the remote analysis environment
– Promoting institutional contributions from remote sites
– Allowing remote institutions to participate in data analysis
– Preparing for the future of data analysis
  More efficient and faster delivery of multi-PB data
  More efficient sharing of processing resources
  Preparation for possible massive re-processing and MC production to expedite the process
  Expeditious physics analyses
– Maintaining self-sustained support amongst the remote institutions to build a broader base of knowledge
– Addressing sociological issues of HEP people at their home institutions and within the field
– Integrating the various remote analysis efforts into one working piece

Primary goal is to allow individual desktop users to make significant contributions without being at the lab.

Difficulties (from a survey)

– Hard time setting up initially
  Lack of updated documentation
  Rather complicated set-up procedure
  Lack of experience → no forum to share experiences
– OS version differences (RH 6.2 vs 7.1), let alone different OSes
– Most of the established sites have an easier time updating releases
– Network problems affecting successful completion of large releases; a full release (4 GB) takes a couple of hours (SA)
– No specific responsible persons to ask questions
– Availability of all necessary software via UPS/UPD
– Time differences between continents affecting efficiency

Progress in DØGrid and DØRACE

Document on the DØRACE home page
– To lower the barrier posed by the difficulties in the initial set-up
– Updated and simplified set-up instructions available on the web → many institutions have participated in refining the instructions
– Tools to ease DØ software download and installation made available

Release Ready notification system activated
– Success is defined by each institution depending on what they do

Build error log and dependency tree utility available

Change of release procedure to alleviate unstable network dependencies

SAM (Sequential Access via Metadata) stations set up for data access → 15 institutions transfer files back and forth

Script for automatic download, installation, component verification, and reinstallation in preparation (a sketch of such a wrapper follows below)
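The automatic download/installation/verification script mentioned above was still in preparation; the following is only a minimal Python sketch of what such a wrapper could look like. The step commands (d0_download.sh, d0_install.sh, d0_verify.sh) and the release tag are invented placeholders, not the actual DØRACE tools.

    # Hypothetical wrapper for automatic download, installation, verification,
    # and one re-installation attempt; all command names below are placeholders.
    import subprocess
    import sys

    RELEASE = sys.argv[1] if len(sys.argv) > 1 else "p10.15.01"   # placeholder release tag

    STEPS = [
        ["./d0_download.sh", RELEASE],   # fetch the release tar-ball
        ["./d0_install.sh", RELEASE],    # unpack and run the UPS/UPD setup
        ["./d0_verify.sh", RELEASE],     # build/run a small test to verify components
    ]

    for step in STEPS:
        print("running:", " ".join(step))
        if subprocess.call(step) != 0:
            print("step failed; re-trying once before giving up")
            if subprocess.call(step) != 0:
                sys.exit(1)
    print("release", RELEASE, "installed and verified")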

DØRACE Strategy

Categorized remote analysis system set-up by functionality:
– Desktop only
– A modest analysis server
– Linux installation
– UPS/UPD installation and deployment
– External package installation via UPS/UPD (CERNLIB, KAI-lib, Root)
– Download and install a DØ release
  Tar-ball for ease of initial set-up?
  Use of existing utilities for latest release download
– Installation of cvs
– Code development
– KAI C++ compiler
– SAM station setup

Phases (diagram): Phase 0 Preparation, Phase I Rootuple Analysis, Phase II Executables, Phase III Code Development, Phase IV Data Delivery

Progressive (diagram)

Where are we?

DØRACE is entering the next stage
– Compilation and running
– Active code development
– Propagation of the set-up to all institutions
  Instructions seem to be taking shape well
  Need to maintain them and keep them up to date
  Support to help with problems people encounter
– 35 institutions (about 50%) ready for code development
– 15 institutions ready for data transfer

DØGrid TestBed formed at the Feb. workshop
– Some 10 institutions participating
– UTA's own Mark Sosebee is taking charge
– Globus job submission testing in progress → UTA participates in LSF testing

DØ has SAM in place for data delivery and cataloguing services → SAM is deep in the DØ analysis fabric, so the DØGrid must be built on SAM

We will also need to establish regional analysis centers

Apr. 5, 2002Jae Yu, From DØ to ATLAS ATLAS TestBed Workshop 9 Central Analysis Center (CAC) Regional Analysis Centers RAC …. Institutional Analysis Centers Desktop Analysis Stations DAS …. DAS …. Store & Process 10~20% of All Data IAC …. IAC …. IAC Normal Interaction Communication Path Occasional Interaction Communication Path Proposed DØRAM Architecture

Regional Analysis Centers

A few geographically selected sites that satisfy the requirements

Provide almost the same level of service as FNAL to a few institutional analysis centers

Analyses carried out within the regional center
– Store 10~20% of statistically random data permanently
– Most of the analyses performed on these samples within the regional network
– Refine the analyses using the smaller but unbiased data set
– When the entire data set is needed → the underlying Grid architecture provides access to the remaining data set

Regional Analysis Center Requirements

Become a mini-CAC

Sufficient computing infrastructure
– Large bandwidth (gigabit or better)
– Sufficient storage space to hold 10~20% of the data permanently, expandable to accommodate the data increase (>30 TB just for Run IIa RAW data; see the estimate below)
– Sufficient CPU resources to serve regional or institutional analysis requests and reprocessing

Geographically located to avoid unnecessary network traffic overlap

Software distribution and support
– Mirror copy of the CVS database for synchronized updates between RACs and the CAC
– Keep the relevant copies of databases
– Act as a SAM service station
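A quick back-of-envelope check of the storage requirement, using the data volumes quoted in the introduction (about 400 TB for Run IIa, 3-5 PB for Run IIb) and the 10~20% fraction above; this is only an illustration.

    # Back-of-envelope RAC storage estimate from the data volumes quoted earlier.
    RUN_IIA_TB = 400                 # ~400 TB Run IIa sample including reco
    RUN_IIB_TB = (3000, 5000)        # 3-5 PB expected for Run IIb

    for frac in (0.10, 0.20):
        iia = RUN_IIA_TB * frac
        iib_lo, iib_hi = (x * frac for x in RUN_IIB_TB)
        print(f"{frac:.0%} of the data: ~{iia:.0f} TB (Run IIa), "
              f"~{iib_lo:.0f}-{iib_hi:.0f} TB (Run IIb)")

This is consistent with the >30 TB quoted for Run IIa RAW data alone, and shows why the storage must be expandable by roughly an order of magnitude for Run IIb.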

What has the UTA team been doing?

UTA DØGrid team consists of:
– 2 faculty (Kaushik and Jae)
– 2 senior scientists (Tomasz and Mark)
– 1 external software designer
– 3 CSE students

Computing resources for MC farm activities
– 25 dual P-III 850 MHz machines on the HEP MC farm
– 5 dual P-III 850 MHz machines on the CSE farm
– ACS: 16 dual P-III 900 MHz machines under LSF

Installed Globus 2.0 on a few machines
– Completed "Hello world!" testing a few weeks ago (see the sketch below)
– A working version of the MC farm bookkeeper (see the demo) runs from an ATLAS machine against our HEP farm machine through Globus → proof of principle
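The "Hello world!" test mentioned above was presumably run with globus-job-run; below is a minimal Python sketch of driving such a test, assuming Globus 2.0 client tools and a valid Grid proxy. The gatekeeper host name is a hypothetical placeholder.

    # Minimal sketch of a Globus "Hello world!" test driven from Python.
    # Assumes Globus client tools and a valid proxy; the host is a placeholder.
    import subprocess

    GATEKEEPER = "hepfarm01.uta.edu"   # hypothetical HEP farm gatekeeper

    result = subprocess.run(
        ["globus-job-run", GATEKEEPER, "/bin/echo", "Hello world!"],
        capture_output=True, text=True)

    print("exit code:", result.returncode)
    print("output:", result.stdout.strip())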

Short and Long Term Goals

Plan to take job submission further
– Submit MC jobs from remote nodes to our HEP farm for more complicated chains of job executions
– Build a prototype high-level interface for job submission and distribution of DØ analysis jobs (next 6 months)
  Store the reconstructed files in a local cache
  Submit an analysis job from one of the local machines to Condor (see the sketch below)
  Put the output into either the requestor's area or the cache
  Process reduced-output analysis jobs from a remote node through Globus

Plan to become a DØRAC
– Submitted a $1M MRI proposal for RAC hardware purchase in Jan.
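As a sketch of the "submit an analysis job to Condor" step, the prototype interface could generate a Condor submit description and hand it to condor_submit; the executable, input, and output names below are hypothetical.

    # Minimal sketch: write a Condor submit description and submit it.
    # Executable, input file, and output names are hypothetical placeholders.
    import subprocess

    submit_lines = [
        "executable = run_analysis.sh",        # hypothetical analysis wrapper
        "arguments  = reco_file_0001.root",    # reconstructed file from the local cache
        "output     = analysis_0001.out",
        "error      = analysis_0001.err",
        "log        = analysis_0001.log",
        "queue",
    ]

    with open("analysis.submit", "w") as f:
        f.write("\n".join(submit_lines) + "\n")

    subprocess.call(["condor_submit", "analysis.submit"])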

DØ Monte Carlo Production Chain (diagram)

Generator job (Pythia, Isajet, …) → DØgstar (DØ GEANT) → DØsim (detector response; underlying events prepared in advance) → DØreco (reconstruction) → RecoA (root tuple) → SAM storage at FNAL (a pipeline sketch follows below)
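A minimal Python sketch of running the chain above as a sequential pipeline; the executable and file names are placeholders for the real DØ programs and their I/O.

    # Minimal sketch of the MC production chain as a sequential pipeline.
    # Executable and file names are placeholders for the real DØ programs.
    import subprocess
    import sys

    STAGES = [
        ("generator", ["generator_job", "cards.dat", "gen.out"]),   # Pythia, Isajet, ...
        ("d0gstar",   ["d0gstar", "gen.out", "geant.out"]),         # DØ GEANT simulation
        ("d0sim",     ["d0sim", "geant.out", "sim.out"]),           # detector response (+ underlying events)
        ("d0reco",    ["d0reco", "sim.out", "reco.out"]),           # reconstruction
        ("recoA",     ["recoA", "reco.out", "recoA.root"]),         # root-tuple production
    ]

    for name, cmd in STAGES:
        print("stage:", name)
        if subprocess.call(cmd) != 0:
            sys.exit(f"stage {name} failed")
    # final step (not shown): store the outputs into SAM at FNAL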

UTA MC farm software daemons and their control (diagram): root daemon, distribute daemon, execute daemon, monitor daemon, gather daemon, bookkeeper, lock manager; storage elements: job archive, cache disk, SAM; external: remote machine, WWW.

Job Life Cycle (diagram)

Distribute queue → (distributer daemon) → execute queue → (executer daemon) → gatherer queue → (gatherer daemon) → cache, SAM, archive; crashed jobs land in the error queue (sketched below).
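A minimal Python sketch of the queue hand-offs in the diagram; the queue names come from the diagram, while the hand-off logic itself is an assumption for illustration.

    # Minimal sketch of the mcfarm-style job life cycle as queue hand-offs.
    # Queue names follow the diagram; the hand-off logic is assumed.
    QUEUES = ["distribute", "execute", "gather", "done"]

    def advance(current, succeeded=True):
        """Return the next queue for a job, or 'error' if its stage failed."""
        if not succeeded:
            return "error"                       # crashed jobs wait here to be restarted
        i = QUEUES.index(current)
        return QUEUES[min(i + 1, len(QUEUES) - 1)]

    queue = "distribute"
    for ok in (True, True, True):                # distributer, executer, gatherer daemons
        queue = advance(queue, ok)
        print("job moved to:", queue)            # ends in "done" (cache, SAM, archive)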

Production bookkeeping

During a running period the farms produce a few thousand jobs

Some jobs crash and need to be restarted

Users must be kept up to date about the status of their MC requests (waiting? running? done?)

A dedicated bookkeeping software package is needed (see the sketch below)
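A minimal sketch of the bookkeeping idea: record a status for every job of an MC request, report a summary to the user, and list crashed jobs for restart. Class and status names are invented for illustration.

    # Minimal sketch of job bookkeeping for MC requests (names are invented).
    from collections import Counter

    class Bookkeeper:
        def __init__(self):
            self.jobs = {}                       # job id -> waiting | running | done | crashed

        def update(self, job_id, status):
            self.jobs[job_id] = status

        def to_restart(self):
            return [j for j, s in self.jobs.items() if s == "crashed"]

        def summary(self):
            return Counter(self.jobs.values())

    bk = Bookkeeper()
    for i in range(5):
        bk.update(f"job_{i}", "waiting")
    bk.update("job_0", "done")
    bk.update("job_1", "crashed")
    print(dict(bk.summary()))                    # counts: 3 waiting, 1 done, 1 crashed
    print("to restart:", bk.to_restart())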

The new, Globus-enabled bookkeeper (diagram)
– A machine on the ATLAS farm runs the bookkeeper
– It communicates with the HEP farm and the CSE farm inside the Globus domain, via globus-job-run and GridFTP
– Results are published through a WWW server

The new bookkeeper

One dedicated bookkeeper machine can serve any number of MC production farms running the mcfarm software

The communication with remote centers is done using Globus tools only (see the sketch below)

No need to install the bookkeeper on every farm – this makes life simpler if many farms participate!
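As an illustration of "Globus tools only" communication, the bookkeeper could query a remote farm with globus-job-run and fetch logs with globus-url-copy (GridFTP). The host name, remote script path, and file paths below are hypothetical placeholders.

    # Illustration of bookkeeper-to-farm communication using only Globus tools.
    # Host names, the status script, and file paths are hypothetical placeholders.
    import subprocess

    FARM_GATEKEEPER = "hepfarm01.uta.edu"                 # hypothetical remote farm
    STATUS_SCRIPT = "/opt/mcfarm/bin/report_status.sh"    # hypothetical script on the farm

    # 1) run a status query on the remote farm through its Globus gatekeeper
    status = subprocess.run(
        ["globus-job-run", FARM_GATEKEEPER, STATUS_SCRIPT],
        capture_output=True, text=True).stdout
    print(status)

    # 2) pull a job log back over GridFTP for the WWW status pages
    subprocess.call(["globus-url-copy",
                     f"gsiftp://{FARM_GATEKEEPER}/data/mcfarm/logs/job_42.log",
                     "file:///var/www/mcfarm/job_42.log"])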

DØ → ATLAS RAC Computing Room

A 2000 ft² computing room in the new building

Specifications given to the designers:
– Multi-gigabit fiber network
– Power and cooling sufficient for 250 processing PCs and 100 IDE-RAID arrays → providing over 50 TB of cache

(Floor plan: roughly 50' × 40' room with a 10' × 12' operations area, a glass window, and a double door.)

Conclusions

DØRACE is making significant progress
– Preparation for software release and distribution
– About 50% of the institutions are ready for code development and 20% for data delivery
– DØGrid TestBed being organized
– DØGrid software being developed based on SAM

UTA has been the only US institution with massive MC generation capabilities → sufficient expertise in MC job distribution and resource management

UTA is playing a leading role in DØGrid activities
– Major player in DØGrid software design and development
– Successful in MC bookkeeper job submission and reporting

DØGrid should be the testing platform for the ATLAS Grid
– What we do for DØ should be applicable to ATLAS with minimal effort
– What you do for ATLAS should be easily transferable to DØ and tested with actual data taking
– The hardware for the DØRAC should be the foundation for ATLAS use