Fermilab Site Report
Mark O. Kaletka, Head, Core Support Services Department, Computing Division

CD mission statement
The Computing Division's mission is to play a full part in the mission of the laboratory and in particular: to proudly develop, innovate, and support excellent and forefront computing solutions and services, recognizing the essential role of cooperation and respect in all interactions between ourselves and with the people and organizations that we work with and serve.

How we are organized

We participate in all areas

Production system capacities

Growth in farms usage

Growth in farms density

Projected growth of computers

Projected power growth

Computer rooms
Provide space, power & cooling for central computers
Problem: increasing luminosity
– ~2,600 computers in FCC
– Expect to add ~1,000 systems/year
– FCC has run out of power & cooling; cannot add utility capacity
New Muon Lab
– 256 systems for Lattice Gauge theory
– CDF early buys of 160 systems
– CDF existing systems from FCC
– Developing plan for another room
Wide Band
– Long-term phased plan FY04-08
– FY04/05 build: 2,880 computers (~$3M)
– Tape robot room in FY05
– FY06/07: ~3,000 computers

Computer rooms

Storage and data movement
1.72 PB of data in ATL
– Ingest of ~100 TB/month
Many tens of TB fed to analysis programs each day
Recent work:
– Parameterizing storage systems for SRM: apply to SAM, apply more generally
– VO notions in storage systems

FNAL Starlight dark fiber project
FNAL dark fiber to Starlight
– Completion: mid-June 2004
– Initial DWDM configuration: one 10 Gb/s (LAN PHY) channel, two 1 Gb/s (OC48) channels
Intended uses of link
– WAN network R&D projects
– Overflow for production traffic: ESnet link to remain the production network link
– Redundant offsite path

General network improvements
Core network upgrades
– Switch/router (Catalyst 6500) supervisors upgraded to Sup720s: 720 Gb/s switching fabric, providing 40 Gb/s per slot
– Initial deployment of 10 Gb/s backbone links
1000BASE-T support expanded
– Ubiquitous on computer room floors: new farms acquisitions supported on gigabit Ethernet ports
– Initial deployment in a few office areas

Network security improvements
Mandatory node registration for network access
– Hotel-like temporary registration utility for visitors
– System vulnerability scan is part of the process
Automated network scan blocker deployed
– Based on quasi-real-time analysis of network flow data
– Blocks outbound & inbound scans
VPN service deployed
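The scan blocker works from flow data rather than packet payloads. The sketch below illustrates the general idea only; the threshold, record format, and function names are assumptions for illustration, not the production tool.

```python
from collections import defaultdict

SCAN_THRESHOLD = 100  # distinct destination hosts per source in one interval (illustrative)

def find_scanners(flow_records):
    """flow_records: iterable of (src_ip, dst_ip, dst_port) tuples from one
    quasi-real-time collection interval."""
    targets = defaultdict(set)
    for src, dst, _port in flow_records:
        targets[src].add(dst)
    # A source contacting many distinct hosts in a short window looks like a scan.
    return [src for src, dsts in targets.items() if len(dsts) >= SCAN_THRESHOLD]
```

A blocker built on this kind of detection would then push the offending addresses into router ACLs, in both the outbound and inbound directions.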

Central services
– Spam tagging in place (X-Spam-Flag: YES)
– Capacity upgrades for gateways, IMAP servers, virus scanning
– Redundant load sharing
AFS
– Completely on OpenAFS
– SAN for back-end storage
– TiBS backup system
– DOE-funded SBIR for performance investigations
Windows
– Two-tier patching system for Windows: 1st tier under control of the OU (PatchLink), 2nd tier domain-wide (SUS)
– 0 Sasser infections post-implementation
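Since the gateways only tag suspect mail, acting on the tag is left to downstream filters. A minimal, hypothetical client-side check on the X-Spam-Flag header might look like this (the example message is made up):

```python
from email import message_from_string

def is_tagged_spam(raw_message: str) -> bool:
    # The gateways add "X-Spam-Flag: YES" to suspect mail; a downstream rule
    # only needs to test that header.
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"

print(is_tagged_spam("X-Spam-Flag: YES\nSubject: test\n\nbody"))  # True
```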

Central services -- backups
Site-wide backup plan is moving forward
– SpectraLogic T950-5
– 8 SAIT-1 drives
– Initial 450-tape capacity for a 7 TB pilot project
– Plan for modular expansion to over 200 TB
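A rough capacity check, assuming the commonly quoted 500 GB native (uncompressed) capacity per SAIT-1 cartridge, shows the fully populated library is consistent with the 200 TB expansion target:

```python
# Back-of-the-envelope check of the library capacity quoted above.
TAPE_SLOTS = 450
SAIT1_NATIVE_TB = 0.5  # assumed: 500 GB native per SAIT-1 cartridge
print(f"Fully populated, native capacity: {TAPE_SLOTS * SAIT1_NATIVE_TB:.0f} TB")  # ~225 TB
```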

Computer security
Missed by the Linux rootkit epidemic
– but no theoretical reason for immunity
Experimenting with AFS cross-cell authentication
– with Kerberos 5 authentication
– subtle ramifications
DHCP registration process
– includes security scan; does not (yet) deny access
– a few VIPs have been tapped during meetings
Vigorous self-scanning program
– based on Nessus
– maintain database of results
– look especially for critical vulnerabilities (& deny access)
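To illustrate the "maintain database of results" step, here is a hypothetical sketch that stores parsed findings and lists hosts with critical vulnerabilities; the schema and the (host, plugin_id, severity) input format are assumptions, not the actual FNAL tooling or Nessus output format.

```python
import sqlite3

def record_findings(db_path, findings):
    """findings: iterable of (host, plugin_id, severity) tuples parsed from scan output."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS findings (host TEXT, plugin_id TEXT, severity TEXT)")
    con.executemany("INSERT INTO findings VALUES (?, ?, ?)", findings)
    con.commit()
    # Hosts with critical findings become candidates for having network access denied.
    critical = [row[0] for row in con.execute(
        "SELECT DISTINCT host FROM findings WHERE severity = 'critical'")]
    con.close()
    return critical
```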

Run II – D0
D0 reprocessed 600M events in fall 2003
– using grid-style tools; 100M of those events were processed offsite at 5 other facilities
– Farm production capacity is roughly 25M events per week
– MC production capacity is 1M events per week
– About 1B events/week on the analysis systems
Linux SAM station on a 2 TB fileserver to serve the new analysis nodes
– next step in the plan to reduce D0min
– the station has been extremely performant, expanding the Linux SAM cache
– the station typically delivers about 15 TB of data and 550M events per week
Rolled out a MC production system that has grid-style job submission
– JIM component of SAM-Grid
Torque (sPBS) is in use on the most recent analysis nodes
– has been much more robust than PBS
Linux fileservers are being used as "project" space
– physics-group-managed storage with high access patterns
– good results
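A quick consistency check of the SAM station delivery figures quoted above gives the implied average size of a delivered event:

```python
tb_per_week = 15.0          # data delivered per week
events_per_week = 550e6     # events delivered per week
avg_kb = tb_per_week * 1e9 / events_per_week   # 1 TB = 1e9 kB (decimal units)
print(f"~{avg_kb:.0f} kB per event delivered")  # roughly 27 kB
```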

MINOS & BTeV status
MINOS
– data taking in early 2005
– using standard tools: Fermi Linux, general-purpose farms, AFS, Oracle, Enstore & dCache, ROOT
BTeV
– preparations for the CD-1 review by DOE included a review of online (but not offline) computing
– a novel feature is that much of the Level 2/3 trigger software will be part of the offline reconstruction software

US-CMS computing
DC04 Data Challenge and preparation for the computing TDR
– preparation for the Physics TDR (P-TDR)
– roll-out of the LCG Grid service and federating it with the U.S. facilities
Develop the required Grid and facilities infrastructure
– increase the facility capacity through equipment upgrades
– commission Grid capabilities through the Grid2003 and LCG-1 efforts
– develop and integrate required functionalities and services
Increase the capability of the User Analysis Facility
– improve how a physicist would use the facilities and software
– facilities and environment improvements
– software releases, documentation, web presence, etc.

US-CMS computing – Tier 1
Worker nodes (dual 1U Xeon and dual 1U Athlon servers)
– 240 CPUs for production (174 kSI2000)
– 32 CPUs for analysis (26 kSI2000)
– All systems purchased in 2003 are connected over gigabit Ethernet
37 TB of disk storage
– 24 TB in production for the mass storage disk cache: in 2003 we switched to SATA disks in external enclosures connected over Fibre Channel; only marginally more expensive than 3ware-based systems, and much easier to administer
– 5 TB of user analysis space: highly available, high-performance, backed-up space
– 8 TB of production space
70 TB of mass storage space
– Limited by tape purchases, not silo space
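For reference, the per-CPU rating implied by the figures above works out as follows (simple arithmetic on the quoted numbers, nothing more):

```python
production_ksi2000, production_cpus = 174, 240
analysis_ksi2000, analysis_cpus = 26, 32
print(f"Production: ~{production_ksi2000 / production_cpus:.2f} kSI2000/CPU")  # ~0.72
print(f"Analysis:   ~{analysis_ksi2000 / analysis_cpus:.2f} kSI2000/CPU")      # ~0.81
```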

US-CMS computing

US-CMS computing – DC03 & Grid2003
Over 72K CPU-hours used in a week
100 TB of data transferred across Grid3 sites
Peak number of jobs approaching 900
Average number during the daytime over 500
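The CPU-hour and job-count figures quoted above are mutually consistent, as a quick average-concurrency check shows:

```python
cpu_hours = 72_000
hours_per_week = 7 * 24
print(f"Average concurrent jobs over the week: {cpu_hours / hours_per_week:.0f}")  # ~429
# Consistent with a daytime average above 500 and peaks near 900.
```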

US-CMS computing – DC04

1st LHC magnet leaving FNAL for CERN

And our science has shown up in some unusual journals… Her sneakers squeaked as she walked down the halls where Lederman had walked. The 7th floor of the high-rise was where she did her work, and she found her way to the small, functional desk in the back of the pen.