Hans Wenzel PMG Meeting Friday Dec. 13th 2002 "T1 status" Hans Wenzel Fermilab

Presentation transcript:

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 "T1 status" Hans Wenzel Fermilab
- Near-term vision for data management and farm configuration management
- Implementing the new WBS
- Work performed in the last few weeks
- Near-term plans

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 What do we expect from dCache?
- It makes a multi-terabyte server farm look like one coherent and homogeneous storage system.
- Rate adaptation between the application and the tertiary storage resources.
- Optimized usage of expensive tape robot systems and drives through coordinated read and write requests. Use the dccp command instead of encp!
- No explicit staging is necessary to access the data (but prestaging is possible and in some cases desirable).
- The data access method is the same regardless of where the data resides.
- High-performance, fault-tolerant transport protocol between applications and data servers.
- Fault tolerant: no specialized servers that can cause severe downtime when they crash.
- Can be accessed directly from your application (e.g. the ROOT TDCacheFile class; see the sketch after this slide).
- Can be used as a scalable file store without an HSM.
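As an illustration of the direct application access mentioned above, here is a minimal ROOT macro sketch. It assumes a ROOT build with dcap/dCache support; the door host, port, pnfs path and histogram name are placeholders, not the actual FNAL configuration.

// read_from_dcache.C -- minimal sketch of reading straight out of dCache from ROOT.
// Door host, port, pnfs path and histogram name below are placeholders.
#include <cstdio>
#include "TDCacheFile.h"
#include "TH1.h"

void read_from_dcache()
{
   // Open the file through the dcap door; no explicit staging with encp/dccp is needed.
   TDCacheFile f("dcap://door.example.fnal.gov:22125/pnfs/fnal.gov/usr/cms/hcal/testbeam.root",
                 "READ");
   if (f.IsZombie()) {
      printf("could not open file through dCache\n");
      return;
   }

   // Objects are read on demand over the dcap protocol.
   TH1 *h = (TH1*) f.Get("hEnergy");
   if (h) printf("entries: %g\n", h->GetEntries());

   f.Close();
}

In more recent ROOT versions, TFile::Open() with a dcap:// URL can hand back a TDCacheFile through the plugin mechanism, so existing analysis code should need little or no change.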

Hans Wenzel PMG Meeting Friday Dec. 13th 2002
[Diagram: dCache with GridFTP doors and catalogs at FNAL (dCache backed by Enstore), CERN (CASTOR), and Tier 2 sites such as Florida running dCache.]

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Global data management with SRB
[Diagram: sites A, B, ..., N each run dCache behind a GridFTP door with a local MCAT; a global catalog ties the sites together.]

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Dynamic partitioning and configuration of the farm
[Diagram: a front-end node and farm nodes with network-attached disk, dCache and Enstore, plus web and database servers.]

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Descriptive information to configure a node Compute Node Kickstart file IO ServerWeb Server Appliances Collection of all possible software packages (AKA Distribution) RPMs NPACI ROCKS

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Implementing the new WBS
- Use the new WBS to track our efforts.
- A consequence of the new WBS: closer cooperation between T1 and T2 sites.
- Established biweekly meetings to discuss R&D projects ( ).
- Fermilab and SD collaborate on deploying and evaluating ROCKS. Goal: a standard ROCKS distribution for all US centers. ( , , )

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Implementing the new WBS (II)
- SD and FNAL also collaborate on SRB and dCache ( ): dCache combines the disks on the farm nodes, GridFTP serves as the transport mechanism, and SRB does the bookkeeping. dCache is deployed at SD. Will package dCache.
- Caltech: evaluation and optimization of the disk I/O of the IDE-based disk servers -> next step: sustain 200 MB/sec over the WAN ( ).
- Need even more collaboration next year.

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 What was happening at FCC
- Chris Brew (1/2) from SCS joined the CMS team in SCS (Joe Kaiser lead, Chris Brew, Merina Albert). ISD: Michael Zalokar, Jon Bakken (dCache), Igor Mandrichenko (FBSNG). CMS: Hans Wenzel, Michael Ernst, Natalia Ratnikova.
- 65 dual AMD Athlon nodes are installed and commissioned. The acceptance period (30 days) finished right before Thanksgiving. (Many delays; the same vendor won the bids for the CDF, CMS and D0 purchases.)
- Monitored and evaluated daily, swapped broken parts. Achieved 98.3% uptime ( ).
- Upgraded the monitoring software (temperature and fan speeds) ( ).
- We used the farm for HCAL test-beam production.
- We used the farm to test the 3 TB Aztera disk system ( benchmarking suite, ). It needed constant tuning and upgrading; in the end we managed to double its performance.

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 What was happening at FCC II
- Used the farm in tests of the dCache system. Developed a ROOT-based benchmarking suite testing the entire data path from the application to mass storage. ( )
- Used the farm to install and configure ROCKS. ( )
- 7 dCache Linux nodes: the hardware has been upgraded (SCSI system disk). The system was upgraded to a newer kernel and the XFS file system, and remained usable during the upgrades ( ). The dCache software and configuration were upgraded.
- Installed web servers and the electronic logbook. ( )
- Work on the transition from RH 6 to RH 7.
- Missed milestone: test of the interactive farm-based analysis prototype ( ). Slipped by 2-3 weeks.

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 The new farm and the dCache system

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Linux dCache node:
- Kernel choice is important to avoid memory-management problems.
- XFS file system: we found it is the only one that scales and still delivers performance when the file system is full.
- Added a SCSI system disk.
- Need a server-specific Linux distribution!!!
- Many parameters need tweaking to achieve optimum performance -> feed back to the vendors.
- Next generation: Xeon based, PCI-X bus, large-capacity disks, dual system disk (RAID 1).

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 First results with the dCache system
These tests were done before the hardware and configuration upgrade. The average file size is ~1 GByte, and the reads are equally distributed over all read pools. Reads are done with dccp from the popcrn nodes into /dev/null.
[Table: number of concurrent reads (40 farm nodes) vs. aggregate input speed in MByte/sec, sustained over hours.]
READS: 2.7 MB/sec per process.
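As a rough cross-check: if the per-process rate scaled linearly, 40 concurrent reads at the observed 2.7 MB/sec each would correspond to an aggregate of roughly 108 MB/sec.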

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 First results with the dCache system
The following was done with 2 write pools, using a ROOT application that uses TDCacheFile to write an event tree into dCache (a sketch follows below). Only three farm nodes were available, so we are network limited.
[Table: number of concurrent writes (3 farm nodes) vs. aggregate output speed in MByte/sec, sustained over hours.]
WRITES: 5.2 MB/sec per process.
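The writer used for this test is not reproduced in the transcript; the following is a toy sketch of a ROOT application of that kind, assuming a placeholder dcap door and pnfs path and a trivial one-branch tree standing in for the real event content.

// write_to_dcache.C -- toy sketch of a ROOT application writing an event tree
// directly into dCache via TDCacheFile. Door host and pnfs path are placeholders.
#include <cstdio>
#include "TDCacheFile.h"
#include "TTree.h"
#include "TRandom3.h"
#include "TStopwatch.h"

void write_to_dcache()
{
   TDCacheFile out("dcap://door.example.fnal.gov:22125/pnfs/fnal.gov/usr/cms/test/events.root",
                   "RECREATE");
   if (out.IsZombie()) return;

   // Toy event tree: a single float branch stands in for the real event content.
   TTree *tree = new TTree("Events", "toy event tree");
   Float_t energy = 0;
   tree->Branch("energy", &energy, "energy/F");

   TRandom3 rng(0);
   TStopwatch timer;
   timer.Start();
   for (Long64_t i = 0; i < 1000000; ++i) {
      energy = rng.Gaus(50.0, 10.0);   // fake energy deposit
      tree->Fill();
   }
   tree->Write();
   Double_t mb = out.GetBytesWritten() / (1024.0 * 1024.0);
   out.Close();   // the file owns and deletes the tree on Close
   timer.Stop();

   // Per-process throughput, comparable in spirit to the MB/sec numbers above.
   printf("wrote %.1f MB in %.1f s (%.1f MB/s)\n",
          mb, timer.RealTime(), mb / timer.RealTime());
}

Run concurrently from several farm nodes, the per-process rate printed at the end is the quantity the table above summarizes.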

Hans Wenzel PMG Meeting Friday Dec. 13th 2002
[Photo slide; the caption gives a weight in pounds.]

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 AZTERA RESULTS

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Last week's power outage (>1.5 days)
- Made good use of the outage by combining several maintenance projects (we knew it was coming sometime).
- Completed the remote power automation project, which required complete rewiring and re-racking of the nodes.
- Addressed safety issues: safer power distribution.
- Better connectivity between CMS computing and the central switch.
- Installation and configuration of the private network for ROCKS.
- Still causes delays; things break.
- Many tasks were not listed in the WBS (need to review it more often).

Hans Wenzel PMG Meeting Friday Dec. 13th 2002 Plans for the near-term future
- Continue evaluating disk systems (Zambeel, dCache, dfarm, Panasas, Exanet, ...). Procure a network-attached system at the beginning of next year? But a lot is happening on the market.
- Upgrade the dCache system to follow the technology (serial IDE, PCI-X, Xeon, memory bandwidth, ...).
- Configure and deploy farm-based interactive/batch user computing.
- Continue evaluating ROCKS, SRB, ...; need more manpower.
- Follow the marching orders given in the new WBS, and refine the WBS as we go along.