TRIUMF Site Report for HEPiX, CASPUR, April 3-7, 2006 – Corrie Kost. Update since HEPiX Fall 2005.

Presentation transcript:

TRIUMF SITE REPORT – Corrie Kost – Update since HEPiX Fall 2005

Devolving Server Functions
OLD: Windows print server / Windows domain controller.
NEW: Windows print server cluster – 2 Dell PowerEdge SC1425 machines sharing an external SCSI disk holding the printer data.
NEW: 2 Dell PowerEdge SC machines as primary & secondary Windows domain controllers.

Update since HEPiX Fall 2005: waiting for 10 Gb/sec DWDM XFP.

Update since HEPiX Fall 2005: DWDM link, 64 wavelengths/fibre; CH34 = 193.4 THz (≈1550.12 nm); ~$10k US each.
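The quoted channel wavelength follows directly from the ITU grid frequency (λ = c/f). A minimal sketch of that conversion, assuming the common 100 GHz C-band numbering in which channel n sits at 190.0 + 0.1·n THz (the numbering formula is an assumption for illustration, not taken from the slide):

```python
# Sketch: convert an ITU 100 GHz DWDM grid channel to frequency and wavelength.
# Assumes the common C-band numbering f(THz) = 190.0 + 0.1 * channel; illustrative only.

C_M_PER_S = 299_792_458  # speed of light in vacuum

def itu_channel_to_wavelength_nm(channel: int, grid_ghz: float = 100.0):
    """Return (frequency_THz, wavelength_nm) for a 100 GHz ITU grid channel."""
    freq_thz = 190.0 + channel * grid_ghz / 1000.0
    wavelength_nm = C_M_PER_S / (freq_thz * 1e12) * 1e9
    return freq_thz, wavelength_nm

if __name__ == "__main__":
    f, lam = itu_channel_to_wavelength_nm(34)
    print(f"CH34: {f:.1f} THz -> {lam:.2f} nm")   # ~193.4 THz -> ~1550.12 nm
```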


Servers / Data Centre (diagram): GPS TIME, TRPRINT, CMMS, TRWEB, TGATE, DOCUMENTS, CONDORG, TRSHARE, TRMAIL, TRSERV, LCG worker nodes (IBM cluster), 1 GB TEMP, TRSHARE, KOPIODOC, LCG storage (IBM, ~2 TB), TNT2K3, RH-FC-SL mirror, TRWINDATA, TRSWINAPPS, WINPRINT1/2, Promise storage.

Update since HEPiX Fall 2005 (server layout): TNT2K3, TRWINDATA, TRSWINAPPS, Promise storage, TRPRINT, TRWEB, TGATE, CONDORG, TRSHARE, TRMAIL, TRSERV, DOCUMENTS, CMMS, GPS TIME, WINPRINT1, WINPRINT2.

TRIUMF-CERN ATLAS Lightpath – International Grid Testbed (CA*Net IGT) Equipment – TIER1 prototype (Service Challenge)
- ATLAS worker nodes (evaluation units): blades, dual/dual 64-bit 3 GHz Xeons, 4 GB RAM, 80 GB SATA
- VOBOX: 2 GB RAM, 3 GHz 64-bit Xeon, 2x 160 GB SATA
- LFC: 2 GB RAM, 3 GHz 64-bit Xeon, 2x 160 GB SATA
- FTS: 2 GB RAM, 3 GHz 64-bit Xeon, 3x 73 GB SCSI
- SRM head node: 2 GB RAM, 64-bit Opteron, 2x 232 GB RAID-1
- sc1-sc3 dCache Storage Elements: 2 GB RAM, 3 GHz 64-bit Xeon, 8x 232 GB RAID-5
- Amanda backup: 2 SDLT 160 GB drives / 26 cartridges; 2 SDLT 300 GB drives / 26 cartridges

ATLAS/CERN – TRIUMF lightpath (network diagram).

Tier-0 → Tier-1 Tests, April 3-30 (nominal rates in MB/s):
Site     Disk-Disk   Disk-Tape
ASGC     100         75
TRIUMF   50          –
BNL      200         75
FNAL     200         75
NDGF     50          –
PIC      60*         60
RAL      150         75
SARA     150         75
IN2P3    –           –
FZK      200         75
CNAF     200         75
Any rates below 90% of nominal need explanation and compensation in the days following (a simple daily check is sketched below). Maintain rates unattended over the Easter weekend (April 14-16). Tape tests and experiment-driven transfers follow in April.
* The nominal rate for PIC is 100 MB/s, but will be limited by the WAN until ~November.
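The 90%-of-nominal rule above lends itself to a simple automated check. A hedged sketch: the nominal disk-disk rates are taken from the table, while the "achieved" numbers are invented placeholders for illustration, not measurements.

```python
# Sketch: flag Tier-0 -> Tier-1 disk-disk transfer days that fall below 90% of
# the nominal SC4 rate. Nominal rates (MB/s) are from the table above; the
# "achieved" figures are placeholders for illustration only.

NOMINAL_MB_S = {
    "ASGC": 100, "TRIUMF": 50, "BNL": 200, "FNAL": 200, "NDGF": 50,
    "PIC": 60, "RAL": 150, "SARA": 150, "FZK": 200, "CNAF": 200,
}

def needs_followup(site: str, achieved_mb_s: float, threshold: float = 0.90) -> bool:
    """True if a site's daily average is below the threshold fraction of nominal."""
    return achieved_mb_s < threshold * NOMINAL_MB_S[site]

if __name__ == "__main__":
    sample_day = {"TRIUMF": 48.0, "BNL": 170.0}   # placeholder numbers
    for site, rate in sample_day.items():
        if needs_followup(site, rate):
            shortfall = 0.90 * NOMINAL_MB_S[site] - rate
            print(f"{site}: {rate} MB/s is below 90% of nominal; "
                  f"needs explanation and ~{shortfall:.0f} MB/s compensation")
```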

ATLAS SC4 Plans – extracted from the Mumbai Workshop, 17 Feb 2006 (1)
- March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
- April-May (pre-SC4): tests of distributed operations on a small testbed (the pre-production system)
- Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720 MB/s + full ESD to BNL)
- 3 weeks in July: distributed processing tests (Part 1)
- 2 weeks in July-August: distributed analysis tests (Part 1)
- 3-4 weeks in September-October: Tier-0 test (Phase 2) with data to Tier-2s
- 3 weeks in October: distributed processing tests (Part 2)
- 3-4 weeks in November: distributed analysis tests (Part 2)
(1) Update from HEPiX Fall 2005

Repeated reads on the same set of (typically 16) files at ~600 MB/sec over ~150 days ≈ 7 PB (total since started: ~13 PB to March 30; no reboot for 134 days).
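The ~7 PB figure is consistent with the quoted sustained read rate; a quick back-of-the-envelope check, assuming decimal petabytes:

```python
# Quick check: ~600 MB/s sustained over ~150 days is on the order of the
# quoted ~7 PB of data read (decimal units, 1 PB = 1e15 bytes).
rate_b_per_s = 600e6
seconds = 150 * 24 * 3600
total_pb = rate_b_per_s * seconds / 1e15
print(f"{total_pb:.1f} PB")   # ~7.8 PB, i.e. "~7 PB" as quoted
```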


Keeping it Cool
- Central Computing Room isolation fixed – combined two 11-ton air-conditioners to even out the load; adding a heating coil to improve stability.
- Blades for ATLAS! – 30% less heat, 20% less TCO.
- Power densities of 100 W/sq-ft, 200 W/sq-ft, 400 W/sq-ft mean cooling is a significant cost factor. Note: electrical/cooling costs estimated at Can$150k/yr (a rough check follows below).
- Water-cooled systems for (multicore/multi-CPU) blade systems?
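The Can$150k/yr estimate can be sanity-checked with a very simple model. In the sketch below the average IT load, the cooling/distribution overhead factor and the electricity tariff are all assumptions made for illustration; none of them appear on the slide.

```python
# Sketch: rough annual electrical + cooling cost for a machine room.
# The load, overhead (PUE-style factor) and tariff are assumptions for
# illustration, not TRIUMF figures.

def annual_cost_cad(it_load_kw: float, overhead: float = 1.5,
                    price_cad_per_kwh: float = 0.07) -> float:
    """Total yearly cost: IT load plus cooling/distribution overhead, 24x7."""
    hours_per_year = 24 * 365
    return it_load_kw * overhead * hours_per_year * price_cad_per_kwh

if __name__ == "__main__":
    # e.g. ~160 kW average IT load lands near the Can$150k/yr quoted on the slide
    print(f"Can${annual_cost_cad(160):,.0f} per year")
```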

Keeping it Cool (2): HP Modular Cooling System (MCS)
- Used when a rack exceeds 10-15 kW; ~US$30k.
- Chilled (5-10 °C) water; max load 30 kW/rack (17 GPM, 5 °C water / 20 °C air; a quick heat-balance check follows below).
- Water cannot reach the servers.
- Door open? Cold air out the front, hot air out the back.
- Significantly less noise with the doors closed.
- HWD 1999 x 909 x 1295 mm (79 x 36 x 51 in); 513 kg / 1130 lbs empty.
- Not certified for seismic Zone 4.
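Whether 17 GPM of chilled water can carry 30 kW is a straightforward heat balance, Q = ṁ·c_p·ΔT. The sketch below uses nominal water properties and is only an order-of-magnitude check, not the vendor's rating method:

```python
# Sanity check: temperature rise of 17 US GPM of water absorbing 30 kW.
# Uses nominal water properties; illustrative only.

GPM_TO_L_PER_S = 3.785411784 / 60.0   # US gallons/min -> litres/s
CP_WATER_J_PER_KG_K = 4186.0
RHO_WATER_KG_PER_L = 1.0

def water_delta_t_c(heat_kw: float, flow_gpm: float) -> float:
    """Temperature rise (deg C) of a water flow absorbing the given heat load."""
    mass_flow_kg_s = flow_gpm * GPM_TO_L_PER_S * RHO_WATER_KG_PER_L
    return heat_kw * 1000.0 / (mass_flow_kg_s * CP_WATER_J_PER_KG_K)

if __name__ == "__main__":
    # ~6.7 C rise, consistent with a 5-10 C chilled-water supply
    print(f"{water_delta_t_c(30.0, 17.0):.1f} C rise")
```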

Amanda Backup at TRIUMF – details by Steve McDonald, Thursday ~4:30 pm.

End of presentation. Extra slides on SC4 plans for reference…

Service Challenge Servers – Details
- fts: FTS Server (FTS = File Transfer Service). Oracle database used. 64-bit Intel Xeon 3 GHz, 3x 73 GB SCSI disks, 2 GB RAM. IBM 4560-SLX tape library attached (will have 2 SDLT-II drives attached when they arrive, probably next week; SDLT-II does 300 GB native, 600 GB compressed). Running SL, 64-bit.
- lfc: LFC Server (LFC = LCG File Catalog). MySQL database used. 64-bit Intel Xeon 3 GHz, 2x 160 GB SATA disks in software RAID-1, 2 GB RAM. Running SL, 64-bit.
- vobox: VO Box (Virtual Organization Box). 64-bit Intel Xeon 3 GHz, 2x 160 GB SATA disks in software RAID-1, 2 GB RAM. Running SL, 64-bit.
- sc1-sc3: dCache Storage Elements. 64-bit Intel Xeons 3 GHz, 3ware RAID controller, 8x 232 GB disks in H/W RAID-5 giving 1.8 TB storage (see the capacity sketch below), 2 GB RAM. Running SL, 64-bit.
- sc4: SRM endpoint, dCache admin node and Storage Element. 64-bit Opteron 246, 3ware RAID controller, 2x 232 GB disks in H/W RAID-1 giving 250 GB storage, 2 GB RAM. Running SL, 64-bit. IBM 4560-SLX tape library attached; we are moving both SDLT-I drives to this unit (SDLT-I does 160 GB native, 300 GB compressed).
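The usable capacities above follow from the usual RAID arithmetic (RAID-1 keeps one disk's worth, RAID-5 loses one disk's worth to parity). A minimal sketch; the small gap between this arithmetic and the quoted 1.8 TB / 250 GB figures is down to GB-vs-GiB reporting of the drives and filesystem overhead, which the sketch ignores:

```python
# Sketch: usable capacity for simple RAID levels, ignoring GB/GiB reporting
# differences and filesystem overhead.

def raid_usable(disk_size_gb: float, n_disks: int, level: int) -> float:
    """Usable capacity, in the same unit as disk_size_gb, for RAID 0, 1 or 5."""
    if level == 0:
        return n_disks * disk_size_gb
    if level == 1:
        return disk_size_gb                      # two-disk mirror: one disk's worth
    if level == 5:
        return (n_disks - 1) * disk_size_gb      # one disk's worth of parity
    raise ValueError("unsupported RAID level")

if __name__ == "__main__":
    print(f"sc1-sc3: {raid_usable(232, 8, 5) / 1000:.2f} TB usable")  # ~1.6-1.8 TB depending on GB vs GiB
    print(f"sc4:     {raid_usable(232, 2, 1):.0f} GB usable")
```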

Dario Barberis: ATLAS SC4 Plans – WLCG SC4 Workshop, Mumbai, 12 February 2006
ATLAS SC4 Tests
- Complete Tier-0 test
  - Internal data transfer from Event Filter farm to Castor disk pool, Castor tape, CPU farm
  - Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
  - Transfer of RAW, ESD, AOD and TAG data to Tier-1s
  - Transfer of AOD and TAG data to Tier-2s
  - Data and dataset registration in DB (add meta-data information to meta-data DB)
- Distributed production
  - Full simulation chain run at Tier-2s (and Tier-1s); data distribution to Tier-1s, other Tier-2s and CAF
  - Reprocessing of raw data at Tier-1s; data distribution to other Tier-1s, Tier-2s and CAF
- Distributed analysis
  - Random job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  - Tests of performance of job submission, distribution and output retrieval

Dario Barberis: ATLAS SC4 Plans – WLCG SC4 Workshop, Mumbai, 12 February 2006
ATLAS SC4 Plans (1)
- Tier-0 data flow tests:
  - Phase 0: 3-4 weeks in March-April for internal Tier-0 tests
    - Explore limitations of the current setup; run real algorithmic code
    - Establish infrastructure for the calib/align loop and conditions DB access
    - Study models for event streaming and file merging
    - Get input from the SFO simulator placed at Point 1 (ATLAS pit)
    - Implement system monitoring infrastructure
  - Phase 1: last 3 weeks of June, with data distribution to Tier-1s
    - Run integrated data flow tests using the SC4 infrastructure for data distribution
    - Send AODs to (at least) a few Tier-2s
    - Automatic operation for O(1 week)
    - First version of shifters' interface tools; treatment of error conditions
  - Phase 2: 3-4 weeks in September-October
    - Extend data distribution to all (most) Tier-2s
    - Use 3D tools to distribute calibration data
- The ATLAS TDAQ Large Scale Test in October-November prevents further Tier-0 tests in 2006…
  - … but is not incompatible with other distributed operations
  - No external data transfer during this phase(?)

Dario Barberis: ATLAS SC4 Plans – WLCG SC4 Workshop, Mumbai, 12 February 2006
ATLAS SC4 Plans (2)
- ATLAS CSC includes continuous distributed simulation productions:
  - We will continue running distributed simulation productions all the time, using all Grid computing resources we have available for ATLAS
  - The aim is to produce ~2M fully simulated (and reconstructed) events/week from April onwards, both for physics users and to build the datasets for later tests
  - We can currently manage ~1M events/week; ramping up gradually
- SC4: distributed reprocessing tests:
  - Test of the computing model using the SC4 data management infrastructure
    - Needs file transfer capabilities between Tier-1s and back to the CERN CAF
    - Also distribution of conditions data to Tier-1s (3D); storage management is also an issue
  - Could use 3 weeks in July and 3 weeks in October
- SC4: distributed simulation intensive tests:
  - Once the reprocessing tests are OK, we can use the same infrastructure to implement our computing model for simulation productions, as they would use the same setup both from our ProdSys and the SC4 side
  - First separately, then concurrently

Dario Barberis: ATLAS SC4 Plans – WLCG SC4 Workshop, Mumbai, 12 February 2006
ATLAS SC4 Plans (3)
- Distributed analysis tests:
  - Random job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
    - Generate groups of jobs and simulate analysis job submission by users at home sites
    - Direct jobs needing only AODs as input to Tier-2s; direct jobs needing ESDs or RAW as input to Tier-1s (see the sketch after this list)
    - Make preferential use of ESD and RAW samples available on disk at Tier-2s
    - Tests of performance of job submission, distribution and output retrieval
    - Test job priority and site policy schemes for many user groups and roles
    - Distributed data and dataset discovery and access through metadata, tags, data catalogues
  - Need the same SC4 infrastructure as needed by distributed productions; storage of job outputs for private or group-level analysis may be an issue
  - Tests can be run during Q3-Q4 2006: first a couple of weeks in July-August (after the distributed production tests), then another longer period of 3-4 weeks in November
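The routing rule described in the list above (AOD-only jobs to Tier-2s, ESD/RAW jobs to Tier-1s) can be expressed as a small dispatch function. This is an illustrative sketch of the policy only, not ATLAS production code; the tier labels and format names are placeholders.

```python
# Illustrative sketch of the brokering policy described above: jobs that only
# need AODs go to Tier-2s, jobs needing ESD or RAW go to Tier-1s.
# Not ATLAS production code; names are placeholders.

from typing import Iterable

TIER1_FORMATS = {"RAW", "ESD"}

def choose_tier(input_formats: Iterable[str]) -> str:
    """Pick a destination tier from the data formats an analysis job reads."""
    formats = {f.upper() for f in input_formats}
    return "Tier-1" if formats & TIER1_FORMATS else "Tier-2"

if __name__ == "__main__":
    print(choose_tier(["AOD", "TAG"]))   # Tier-2
    print(choose_tier(["ESD"]))          # Tier-1
```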