HEP Data Grid for DØ and Its Regional Grid, DØSAR

HEP Data Grid for DØ and Its Regional Grid, DØSAR
Jae Yu, Univ. of Texas, Arlington
3rd International Workshop on HEP Data Grid
KyungPook National University, Aug. 26 – 28, 2004

Outline
The problem
DØ data handling system and computing architecture
Remote computing activities
DØ Remote Analysis Model
DØSAR and the regional grid
Summary

The Problem
The DØ experiment has been taking data for the past 3 years and will continue throughout much of the decade: the need is immediate!
Current data size is close to 1 PB and will be over 5 PB by the end of the run
The detectors are complicated: many people are needed to construct them and make them work
The collaboration is large and scattered all over the world; this requires
- Software development at remote institutions
- Optimized resource management, job scheduling, and monitoring tools
- Efficient and transparent data delivery and sharing
Use the opportunity of having a large data set to advance grid computing technology
Improve computational capability for education
Improve quality of life

DØ and CDF at the Fermilab Tevatron
World's highest-energy proton-antiproton collider
Ecm = 1.96 TeV (= 6.3x10^-7 J/p; ~13 MJ on ~10^-6 m^2)
Equivalent to the kinetic energy of a 20 t truck at a speed of 80 mi/hr
[Aerial photo of the Tevatron ring near Chicago, with the CDF and DØ detectors and the p and pbar beams labeled]
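A quick back-of-the-envelope check of the truck comparison (taking 20 t as 2x10^4 kg and 80 mi/hr as roughly 35.8 m/s):

```latex
\[
E_{\mathrm{kin}} = \tfrac{1}{2} m v^{2}
                 = \tfrac{1}{2}\,(2\times10^{4}\,\mathrm{kg})\,(35.8\,\mathrm{m/s})^{2}
                 \approx 1.3\times10^{7}\,\mathrm{J} \approx 13\,\mathrm{MJ}.
\]
```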

DØ Collaboration
650 collaborators, 78 institutions, 18 countries
Roughly 1/3 the scale of the problem faced by an LHC experiment

Remote Computing #1: Monitoring of the DØ Experiment
Detector monitoring data are sent in real time over the internet (e.g. 9 am at NIKHEF, Amsterdam corresponds to 2 am at Fermilab).
DØ physicists worldwide use the internet and monitoring programs to examine collider data in real time and to evaluate detector performance and data quality. They use web tools to report this information back to their colleagues at Fermilab.
The online monitoring project has been developed by DØ physicists and is coordinated by Michele Sanders (Imperial) and Elliot Cheu (Arizona).

Data Handling System Architecture
[Diagram: robotic storage at the center, connected to the central farm, central analysis systems, ClueD0, remote farms, and regional centers; data flows include raw data, RECO data, RECO MC, and user data]
SAM (Sequential Access via Metadata) catalogs and manages data access: the glue holding it all together, and an extremely successful project.

The SAM Data Handling System
Depends on / uses:
- Central ORACLE database at Fermilab (central metadata repository)
- ENSTORE mass storage system at Fermilab (central data repository)
Provides access to:
- Various file transfer protocols: bbftp, GridFTP, rcp, dccp, ...
- Several mass storage systems: ENSTORE, HPSS, TSM
Provides tools to define datasets:
- Web-based GUI
- Command-line interface
For DØ, 7 stations are deployed at FNAL and more than 20 offsite.
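As a purely illustrative sketch in Python (not the actual DØ SAM API or command-line syntax), a dataset definition of this kind boils down to a metadata query string; the field names used below are hypothetical placeholders:

```python
# Purely illustrative sketch: this is NOT the real DØ SAM API or CLI.
# SAM datasets are defined by metadata queries; the field names here
# (data_tier, run_number, trigger) are placeholders for illustration.
def build_dimensions(data_tier, run_range, trigger=None):
    """Compose a SAM-style metadata query string for a dataset definition."""
    clauses = [
        f"data_tier {data_tier}",
        f"run_number {run_range[0]}-{run_range[1]}",
    ]
    if trigger:
        clauses.append(f"trigger {trigger}")
    return " and ".join(clauses)


if __name__ == "__main__":
    # Prints: data_tier thumbnail and run_number 160000-165000
    print(build_dimensions("thumbnail", (160000, 165000)))
```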

Data Access History via SAM
[Plot: cumulative events consumed versus time]
About 90 billion (9x10^10) events consumed by DØ in Run II using SAM
Used for primary reconstruction and analysis at FNAL
Used for remote simulation, reprocessing, and analysis: about 3 billion (3x10^9) events served at remote sites

International Computing #2 & #3: Worldwide Simulation and Processing
[World map, Winter 2004, showing SAM data transfers, reconstructed data, and simulation files flowing to and from Fermilab (DØ experiment)]
Remote data reconstruction sites: UK, NIKHEF, GridKa (Karlsruhe), CCIN2P3 (Lyon), WestGrid (Canada)
Remote simulation sites: Prague, Lancaster, Tata Institute, Texas, Sao Paulo, Michigan St., Imperial, Michigan, Indiana, Kansas, Oklahoma, Boston, Wuppertal, Munich, Arizona, Louisiana
Partial list of stations for remote data analysis

Offsite Data Re-processing
Successfully reprocessed 550M events (200 pb-1):
- At Fermilab: 450M events
- At GridKa, UK, IN2P3, NIKHEF, WestGrid: 100M events, with data transfers > 20 TB
First steps toward a grid:
- Remote submission
- Transfer of data sets at the tens-of-TB scale
- Non-grid based re-processing
Grid-based reprocessing planned for the end of CY04, with transfers exceeding 100 TB

DØ Analysis Systems
ClueDØ: desktop cluster at DØ (~350 nodes), administered by DØ collaborators; batch system (FBS) and local fileservers for analysis
dØmino: a legacy system (~50 TB); older analyses, file server, single-event access; central routing station to offsite
CAB (Central Analysis Backend) at the Feynman Computing Center: PC/Linux dØmino back-end supplied and administered by the Computing Division; 400 dual 2 GHz nodes, each with 80 GB of disk
Regional Analysis Centers (RAC)
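For scale, the aggregate CAB capacity implied by the per-node figures above (assuming all 400 dual-CPU nodes are counted) works out to:

```latex
\[
400 \times 2 \times 2\,\mathrm{GHz} = 1600\,\mathrm{GHz}
\qquad\text{and}\qquad
400 \times 80\,\mathrm{GB} = 32\,\mathrm{TB}\ \text{of local disk.}
\]
```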

DØ Remote Computing History
SAM in place: pre-2001
Formed the DØRACE and DØGrid teams: Sept. 2001
DØ Remote Analysis Model proposed: Nov. 2001

DØ Remote Analysis Model (DØRAM)
[Diagram: hierarchical model with normal and occasional interaction/communication paths]
Central Analysis Center (CAC): Fermilab
Regional Analysis Centers (RAC): data and resource hubs for MC production, data processing, and data analysis
Institutional Analysis Centers (IAC)
Desktop Analysis Stations (DAS)
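A minimal sketch (not DØ software) expressing these tiers as a simple tree, with one CAC feeding RACs, each serving IACs and desktop analysis stations; the site names are made up for illustration:

```python
# Minimal sketch of the DØRAM tier hierarchy as a data structure.
# Site names below are hypothetical examples, not real deployments.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    name: str
    tier: str                       # "CAC", "RAC", "IAC", or "DAS"
    children: List["Site"] = field(default_factory=list)


def example_hierarchy() -> Site:
    das = Site("physicist-desktop", "DAS")
    iac = Site("example-IAC", "IAC", [das])
    rac = Site("example-RAC", "RAC", [iac])
    return Site("Fermilab", "CAC", [rac])


def print_tree(site: Site, indent: int = 0) -> None:
    print(" " * indent + f"{site.tier}: {site.name}")
    for child in site.children:
        print_tree(child, indent + 2)


if __name__ == "__main__":
    print_tree(example_hierarchy())
```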

DØ Remote Computing History
SAM in place: pre-2001
Formed the DØRACE and DØGrid teams: Sept. 2001
DØ Remote Analysis Model proposed: Nov. 2001
Proposal for RACs accepted and endorsed by DØ: June – Aug. 2002
UTA awarded MRI for a RAC: June 2002
Prototype RAC established at Karlsruhe: Aug. – Nov. 2002
Formation of the DØ Southern Analysis Region: Apr. 2003
DØ offsite re-processing: Nov. 2003 – Feb. 2004
Activation of the 1st US RAC at UTA: Nov. 2003
Formation and activation of the DØSAR Grid for MC: Apr. 2004

DØ Southern Analysis Region (DØSAR)
One of the regional grids within the DØGrid
A consortium coordinating activities to maximize computing, human, and analysis resources
Formed around the RAC at UTA
Eleven institutions and twelve clusters
- MC farm clusters: a mixture of dedicated and multi-purpose, rack-mounted systems
- Desktop Condor farms
http://www-hep.uta.edu/d0-sar/d0-sar.html

DØSAR Consortium
First-generation IACs: University of Texas at Arlington, Louisiana Tech University, Langston University, University of Oklahoma, Tata Institute (India)
Second-generation IACs: Cinvestav (Mexico), Universidade Estadual Paulista (Brazil), University of Kansas, Kansas State University
Each 1st-generation institution is paired with a 2nd-generation institution to help expedite implementation of DØSAR capabilities
Third-generation IACs: Ole Miss (MS), Rice University (TX), University of Arizona (Tucson, AZ)
Both 1st- and 2nd-generation institutions can then help the 3rd-generation institutions implement DØSAR capabilities

Centralized Deployment Models
Started with the lab-centric SAM infrastructure already in place, ...
... then transitioned to a hierarchically distributed model

DØRAM Implementation
[Map of RAC and IAC sites: GridKa (Karlsruhe) with Mainz, Wuppertal, Munich, Aachen, and Bonn; UTA with OU/LU, LTU, KU, KSU, Rice, UAZ, Ole Miss, and Mexico/Brazil]
UTA hosts the first US DØRAC
DØSAR formed around UTA

UTA – RAC (DPCC)
84 P4 Xeon 2.4 GHz CPUs = 202 GHz; 5 TB of FBC + 3.2 TB IDE internal; GFS file system
100 P4 Xeon 2.6 GHz CPUs = 260 GHz; 64 TB of IDE RAID + 4 TB internal; NFS file system
Totals: 462 GHz of CPU, 76.2 TB of disk, 168 GB of memory; network bandwidth 68 Gb/s
HEP – CSE joint project: DØ + ATLAS and CSE research
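The quoted totals follow directly from the two purchases listed above:

```latex
\[
84 \times 2.4 \approx 202\,\mathrm{GHz},\quad
100 \times 2.6 = 260\,\mathrm{GHz},\quad
202 + 260 = 462\,\mathrm{GHz};\qquad
5 + 3.2 + 64 + 4 = 76.2\,\mathrm{TB}.
\]
```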

The tools at DØSAR
SAM (Sequential Access via Metadata): existing, battle-tested data replication and cataloging system
Batch systems:
- Condor: three of the DØSAR farms consist of desktop machines under Condor (a minimal submission sketch follows this slide)
- PBS: most of the dedicated DØSAR farms use this system
Grid framework: JIM (Job and Information Management), PPDG
- Provides the framework for grid operation: job submission, monitoring, matchmaking, and scheduling
- Built upon Condor-G and Globus
- Interfaced to two job managers:
  - runjob: more generic grid-enabled system; 1 US + 5 EU MC sites
  - McFarm: 5 US DØSAR sites
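A minimal sketch of how an MC request could be packaged for such a desktop Condor pool; only the Condor submit-description keywords and the condor_submit command are standard Condor, while the wrapper script name (run_mc_job.sh) is a hypothetical placeholder:

```python
# Minimal sketch: package one MC request as a Condor submit description
# and hand it to condor_submit. The wrapper script name is hypothetical.
import subprocess
from pathlib import Path

SUBMIT_TEMPLATE = """\
universe   = vanilla
executable = run_mc_job.sh
arguments  = --request-id {request_id}
output     = mc_{request_id}.out
error      = mc_{request_id}.err
log        = mc_{request_id}.log
queue {njobs}
"""


def submit_mc_request(request_id, njobs=1):
    """Write a submit description for one MC request and submit it to Condor."""
    submit_file = Path(f"mc_{request_id}.sub")
    submit_file.write_text(SUBMIT_TEMPLATE.format(request_id=request_id,
                                                  njobs=njobs))
    subprocess.run(["condor_submit", str(submit_file)], check=True)


if __name__ == "__main__":
    submit_mc_request(request_id=12345, njobs=10)
```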

Tevatron Grid Framework (SAMGrid, or JIM)
[Screenshot: SAMGrid/JIM monitoring page, with the UTA site shown]

The tools (cont'd)
Monte Carlo farm management software (McFarm): cloned to other institutions, increasing the total number of offsite MC farms by 5
Various monitoring software (a sketch of harvesting Ganglia metrics follows this slide):
- Ganglia resource monitoring, piped to MonALISA as a VO
- McFarmGraph: MC job status monitoring
- McPerM: farm performance monitor
- McQue: anticipated grid resource occupation monitor
DØSAR Grid: requests are submitted from a local machine, transferred to a submission site, and executed at an execution site
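A minimal sketch of how resource metrics like these can be harvested, assuming a Ganglia gmond daemon reachable on its default TCP port 8649 (the host name below is hypothetical); gmond reports its cluster state as XML to any client that connects:

```python
# Minimal sketch: read the XML cluster report from a Ganglia gmond daemon
# and print the one-minute load of each host. Host name is hypothetical.
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="ganglia.example.edu", port=8649, timeout=10):
    """Fetch the raw XML cluster report from a gmond daemon."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def summarize_load(xml_text):
    """Print the one-minute load average reported for each host."""
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(f"{host.get('NAME')}: load_one = {metric.get('VAL')}")


if __name__ == "__main__":
    summarize_load(read_gmond_xml())
```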

DØSAR Computing & Human Resources

Institution       CPU (GHz) [future]   Storage (TB)        People
Cinvestav         13                   1.1                 1F
Langston          22                   1.3                 1F+1GA
LTU               25 + [12]            3.0                 1F+1PD+2GA
KU                12                   2.5                 1F+1PD
KSU               40                   3.5                 1F+2GA
OU                19 + [270]           1.8 + 120 (tape)    4F+3PD+2GA
Sao Paulo         115 + [300]          4.5                 2F+many
Tata Institute    78                   1.6                 1F+1Sys
UTA               520                  74                  2.5F+1Sys+1.5PD+3GA
Total             844 + [582]          93.3 + 120 (tape)   14.5F+2Sys+6.5PD+10GA

(F = faculty, PD = postdoc, GA = graduate assistant, Sys = system administrator)

Ganglia Grid Resource Monitoring
Operating since Apr. 2003

Job Status Monitoring: McFarmGraph
Operating since Sept. 2003

Farm Performance Monitor: McPerM
Operating since Sept. 2003
Designed, implemented, and improved by UTA students

Queue Monitor: McQue
Prototype in commissioning
[Plots: anticipated CPU occupation (% of total available CPUs vs. time from present, in hours) and number of jobs in the distribution queue]

DØSAR Strategy
Maximally exploit existing software and utilities to enable as many sites as possible to contribute to the experiment (a site-readiness sketch follows this slide):
- Set up all IACs with DØ software and the data analysis environment
- Install the Condor (or PBS) batch control system on desktop farms or dedicated clusters
- Install the McFarm MC local production control software
- Produce MC events on IAC machines
- Enable the various monitoring software
- Install SAMGrid and interface it with McFarm
- Submit jobs through SAMGrid and monitor them
- Perform analysis at the individual's desk
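A small sketch (not an official DØSAR tool) of how a prospective IAC could check that the pieces listed above are visible on a node; the McFarm command name is an assumption made purely for illustration:

```python
# Minimal site-readiness check: look for the commands of each component
# on the PATH. The "mcfarm" command name is a hypothetical placeholder.
import shutil

REQUIRED = {
    "batch system (Condor or PBS)": ["condor_submit", "qsub"],
    "SAM client": ["sam"],
    "McFarm": ["mcfarm"],            # hypothetical command name
    "Globus tools": ["globus-job-run"],
}


def check_site():
    all_ok = True
    for component, commands in REQUIRED.items():
        found = [c for c in commands if shutil.which(c)]
        status = f"found {found[0]}" if found else "MISSING"
        print(f"{component:30s}: {status}")
        all_ok = all_ok and bool(found)
    return all_ok


if __name__ == "__main__":
    raise SystemExit(0 if check_site() else 1)
```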

DØSARGrid Status
Seven clusters in total producing MC events
At the 3rd DØSAR workshop, held at Louisiana Tech Univ. in April 2004, five grid-enabled clusters formed the DØSARGrid for MC production
Simulated data production on the grid is in progress
Preparing to add 3 more MC sites and 2 more grid-enabled sites at the next workshop in Sept. 2004
Investigating working with the JIM team at Fermilab on further software tasks
A large amount of documentation and regional expertise in grid computing has accumulated in the consortium

How does the current DØSARGrid work?
[Diagram: a client site submits a job description (JDL) to the DØ Grid; submission sites dispatch work to execution sites (dedicated clusters and desktop clusters) within the regional grids, with SAM providing the data handling]
(An illustrative matchmaking sketch follows this slide.)
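A purely illustrative model in Python (not JIM/SAMGrid code) of the flow in the diagram: a job description from a client site is handed to a submission site, which matches it to an execution site in the regional grid; all names and numbers are made up:

```python
# Illustrative matchmaking model for the client -> submission site ->
# execution site flow. Site names and capacities are invented.
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutionSite:
    name: str
    kind: str          # "dedicated" or "desktop"
    free_cpus: int


EXEC_SITES: List[ExecutionSite] = [
    ExecutionSite("example-RAC", "dedicated", 80),
    ExecutionSite("example-IAC-farm", "dedicated", 12),
    ExecutionSite("example-desktop-pool", "desktop", 30),
]


def match(job_cpus: int, prefer: str = "dedicated") -> ExecutionSite:
    """Crude matchmaking: prefer dedicated clusters with enough free CPUs."""
    candidates = [s for s in EXEC_SITES if s.free_cpus >= job_cpus]
    if not candidates:
        raise RuntimeError("no execution site can take this job")
    candidates.sort(key=lambda s: (s.kind != prefer, -s.free_cpus))
    return candidates[0]


if __name__ == "__main__":
    site = match(job_cpus=20)
    print(f"Job dispatched to {site.name} ({site.kind}, {site.free_cpus} free CPUs)")
```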

DØSAR MC Delivery Statistics (as of 8/25/04)

Institution          Inception    NMC (TMB) x 10^6
LTU                  6/2003       0.6
LU                   7/2003       1.3
OU                   4/2003       1.0
Tata, India                       3.3
Sao Paulo, Brazil    4/2004
UTA-HEP              1/2003       3.5
UTA-RAC              12/2003      9.0
DØSAR Total                       18.7

(Source slide: Joel Snow, Langston University, D0 Grid/Remote Computing, April 2004)

Actual DØ Data Re-processing at UTA
Completed and delivered 200M events in July 2004

Network Bandwidth Occupation
[Plot of network usage showing periods of ATLAS DC2 activity and sustained DØ TMBFix operation]
OC12 upgrade expected at the end of '04 or early '05

Benefits of a Regional Consortium
Construct an end-to-end service environment at a smaller, more manageable scale
Train and accumulate local expertise
- Easier access to help
Smaller group working coherently and closely
- Easier to share expertise
Draw additional resources from a variety of funding sources
Promote interdisciplinary collaboration
Increase intellectual resources: enable remote participants to contribute more actively to the collaboration

Some Successes in Funding at DØSAR
NSF MRI funds for the UTA RAC (2002): construction of the first U.S. university-based RAC
EPSCoR + university funds for the LTU IAC (2003): increased IAC compute resources and human resources for further development
Brazilian national funds for the Univ. of Sao Paulo (2003): construction of a prototype RAC for South America; further funding very likely
EPSCoR funds for the OU & LU IACs (2004): compute resources for the IACs

Summary
DØGrid is operating for MC production within the SAMGrid framework
- Generic (runjob): 1 U.S. + 5 EU sites
- McFarm: 5 DØSAR sites
A large amount of offsite documentation and expertise has been accumulated
Moving toward grid-based data re-processing and analysis
- Massive data re-processing in late CY04
- Data storage at RACs for the consortium
- Higher level of complexity
Improved infrastructure is necessary for end-to-end grid services, especially network bandwidth
- NLR and other regional network (10 Gbps) improvement plans
- Started working with AMPATH and the Oklahoma, Louisiana, and Brazilian consortia (tentatively named the BOLT Network) for the last mile...
Starting to work with global grid efforts to allow work on interoperability