1 U.S. ATLAS Computing Facilities (Overview)
Bruce G. Gibbard, Brookhaven National Laboratory
Review of U.S. LHC Software and Computing Projects, Fermi National Laboratory, November 27-30, 2001

2 Outline
- US ATLAS Computing Facilities Definition
  - Mission
  - Architecture & Elements
- Motivation for Revision of the Computing Facilities Plan
  - Schedule
  - Computing Model & Associated Requirements
  - Technology Evolution
  - Tier 1 Budgetary Guidance
- Tier 1 Personnel, Capacity, & Cost Profiles for New Facilities Plan

3 US ATLAS Computing Facilities Mission
- Facilities procured, installed, and operated…
  - …to meet U.S. “MOU” obligations to ATLAS
    - Direct IT support (Monte Carlo generation, for example)
    - Support for detector construction, testing, and calibration
    - Support for software development and testing
  - …to enable effective participation by US physicists in the ATLAS physics program!
    - Direct access to and analysis of physics data sets
    - Simulation, re-reconstruction, and reorganization of data as required to complete such analyses

4 Elements of US ATLAS Computing Facilities
- A hierarchy of Grid-connected distributed resources including:
  - Tier 1 Facility located at Brookhaven – Rich Baker / Bruce Gibbard
    - Operational at < 0.5% level
  - 5 permanent Tier 2 facilities (to be selected in April ’03)
    - 2 prototype Tier 2’s selected earlier this year and now active
      - Indiana University – Rob Gardner
      - Boston University – Jim Shank
  - Tier 3 / institutional facilities
    - Several currently active; most are candidates to become Tier 2’s
    - Univ. of California at Berkeley, Univ. of Michigan, Univ. of Oklahoma, Univ. of Texas at Arlington, Argonne Nat. Lab.
  - Distributed IT Infrastructure – Rob Gardner
  - US ATLAS Persistent Grid Testbed – Ed May
  - HEP Networking – Shawn McKee
- Coupled to Grid projects with designated liaisons
  - PPDG – Torre Wenaus
  - GriPhyN – Rob Gardner
  - iVDGL – Rob Gardner
  - EU Data Grid – Craig Tull

5 Tier 2’s
- Mission of Tier 2’s for US ATLAS
  - A primary resource for simulation
  - Empower individual institutions and small groups to do relatively autonomous analysis, using high-performance regional networks and more directly accessible, locally managed resources
- Prototype Tier 2’s were selected based on their ability to contribute rapidly to Grid architecture development
- Goal in future Tier 2 selections will be to leverage particularly strong institutional resources of value to ATLAS
- Aggregate of the 5 Tier 2’s is expected to be comparable to the Tier 1 in CPU and disk capacity available for analysis

6 US ATLAS Persistent Grid Testbed
[Map of the testbed: Brookhaven National Laboratory, LBNL-NERSC / UC Berkeley, Argonne National Laboratory, Indiana University, Boston University, U Michigan, Oklahoma University, and University of Texas at Arlington, with prototype Tier 2 and HPSS sites indicated, interconnected via ESnet, Abilene, CalREN, MREN, NTON, and NPACI networks.]

7 Evolution of US ATLAS Facilities Plan
- In response to changes or potential changes in:
  - Schedule
  - Computing Model & Requirements
  - Technology
  - Budgetary Guidance

8 Changes in Schedule
- LHC start-up projected to be a year later: 2005/2006 → 2006/2007
- ATLAS Data Challenges (DC’s) have, so far, stayed fixed
  - DC0 – Nov/Dec 2001 – 10^5 events – software continuity test
  - DC1 – Feb/Jul 2002 – 10^7 events – ~1% scale test
  - DC2 – Jan/Sep 2003 – 10^8 events – ~10% scale test
    - A serious functionality & capacity exercise
    - A high level of US ATLAS facilities participation is deemed very important

9 Computing Model and Requirements
- Nominal model was:
  - At Tier 0 (CERN)
    - Raw → ESD/AOD/TAG pass done, result shipped to Tier 1’s
  - At Tier 1’s (six anticipated for ATLAS)
    - TAG/AOD/~25% of ESD on disk; tertiary storage for the remainder of the ESD (a rough storage-scale sketch follows this slide)
    - Selection passes through complete ESD ~monthly
    - Analysis of TAG/AOD/selected ESD/etc. (n-tuples) on disk for an analysis pass by ~200 users within 4 hours
  - At Tier 2’s (five in U.S.)
    - Data access primarily via Tier 1 (to control load on CERN and the transatlantic link)
    - Support ~50 users as above, but frequent access to ESD on disk at Tier 1 is likely
- Serious limitations are:
  - A month is a long time to wait for the next selection pass
  - Only 25% of the ESD is available for event navigation from TAG/AOD during analysis
  - The 25% of ESD on disk will rarely have been consistently selected (once a month) and will be continuously rotating, altering the accessible subset of data
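To give a feel for the storage scale behind this model, here is a back-of-the-envelope sketch. The per-event sizes and yearly event count below are illustrative assumptions (roughly the ATLAS planning numbers of that era), not figures from this talk; only the 25% ESD-on-disk fraction comes from the slide above.

# Rough sketch of the nominal Tier 1 disk requirement.
# All event sizes and the event rate are assumptions for illustration only.

EVENTS_PER_YEAR = 1e9        # assumed events recorded per year
ESD_SIZE_MB = 0.5            # assumed ESD size per event
AOD_SIZE_MB = 0.01           # assumed AOD size per event
TAG_SIZE_MB = 0.001          # assumed TAG size per event
ESD_DISK_FRACTION = 0.25     # nominal model: ~25% of ESD kept on disk

def terabytes(events, mb_per_event, fraction=1.0):
    """Terabytes needed to hold `events` events at `mb_per_event` MB each."""
    return events * mb_per_event * fraction / 1e6

esd_on_disk = terabytes(EVENTS_PER_YEAR, ESD_SIZE_MB, ESD_DISK_FRACTION)
esd_full    = terabytes(EVENTS_PER_YEAR, ESD_SIZE_MB)
aod         = terabytes(EVENTS_PER_YEAR, AOD_SIZE_MB)
tag         = terabytes(EVENTS_PER_YEAR, TAG_SIZE_MB)

print(f"TAG + AOD on disk:            {tag + aod:6.0f} TB")
print(f"25% of ESD on disk (nominal): {esd_on_disk:6.0f} TB")
print(f"Full ESD on disk (standalone):{esd_full:7.0f} TB")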

10 Changes in Computing Model and Requirements (2)
- Underlying problem:
  - Selection pass and analysis event-navigation access to ESD is sparse, estimated to be ~1 out of 100 events per analysis
  - ESD is on tape rather than on disk
    - Tape is a sequential medium, so ~100 times more data must be accessed than is needed
    - Tape is expensive per unit of I/O bandwidth, as much as 10 times that of disk
  - Thus the penalty in access cost relative to disk may be a factor of ~1000 (see the arithmetic sketch below)
- Solution: get all ESD onto disk
  - Methods for accomplishing this are:
    - Buy more disk at the Tier 1 – most straightforward
    - Unify/coordinate use of existing disk across multiple Tier 1’s – more economical
    - Some combination of the above – a compromise as necessitated by available funding
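The factor of ~1000 is simply the product of the two factors quoted on this slide; a minimal sketch of that arithmetic, using no numbers beyond the slide's own:

# Access-cost penalty of tape-resident ESD relative to disk, as argued above.

sparsity = 1 / 100                 # ~1 in 100 ESD events touched per analysis
excess_read = 1 / sparsity         # sequential tape reads ~100x the data needed
tape_vs_disk_bandwidth_cost = 10   # tape I/O bandwidth costs ~10x disk

penalty = excess_read * tape_vs_disk_bandwidth_cost
print(f"Tape access-cost penalty relative to disk: ~{penalty:.0f}x")  # ~1000x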

11 “2007” Capacities for U.S. Tier 1 Options
- “3 Tier 1” Model (complete ESD found on disk of the U.S. plus 2 other Tier 1’s)
  - Highly dependent on the performance of the other Tier 1’s and of the Grid middleware and (transatlantic) network used to connect to them
- “Standalone” Model (complete ESD on disk of the US Tier 1)
  - While avoiding the above dependencies, it is more expensive

12 Changes in Technology
- No dramatic new technologies
  - Previously assumed technologies are tracking Moore’s Law well
- Recent price/performance points from the RHIC Computing Facility:
  - CPU: IBM procurement – $33/SPECint95
    - 310 dual 1 GHz Pentium III nodes @ 97.2 SPECint95/node
    - Delivered Aug 2001
    - $1M fully racked, including cluster management hardware & software
  - Disk: OSSI/LSI procurement – $27k/TByte
    - 33 usable TB of high-availability Fibre Channel RAID 5 @ 1400 MBytes/sec
    - Delivered Sept 2001
    - $887k including SAN switch
- Strategy is to project, somewhat conservatively, from these points for facilities design and costing (a projection sketch follows this slide)
  - Used a 20-month rather than the observed <18-month price/performance halving time for disk and CPU
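A minimal sketch of this kind of projection, assuming a smooth exponential with the slide's conservative 20-month halving time. The two starting price points are the RHIC procurements above; the exact 2007 target date and the exponential form are assumptions for illustration only, not the talk's actual costing method.

# Conservative price/performance projection from the Aug/Sep 2001 price points.

CPU_COST_2001 = 33.0       # $/SPECint95, IBM procurement, Aug 2001
DISK_COST_2001 = 27_000.0  # $/TB, OSSI/LSI procurement, Sep 2001
HALVING_MONTHS = 20.0      # assumed halving time (observed was <18 months)

def projected_cost(cost_now: float, months_ahead: float) -> float:
    """Project a unit cost forward assuming it halves every HALVING_MONTHS."""
    return cost_now * 0.5 ** (months_ahead / HALVING_MONTHS)

months_to_2007 = (2007 - 2001.7) * 12   # roughly late 2001 -> 2007 (assumed)
print(f"CPU:  ~${projected_cost(CPU_COST_2001, months_to_2007):.1f}/SPECint95 in 2007")
print(f"Disk: ~${projected_cost(DISK_COST_2001, months_to_2007):,.0f}/TB in 2007")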

13 Changes in Budgetary Assumptions (2)
- Assumed funding profiles (at-year $K)
- For the revised LHC startup schedule, the new profile is better
- For ATLAS DC2, which stayed fixed in ’03, the new profile is worse
  - Hardware capacity goals of DC2 will not be met
  - Personnel-intensive facility development may be as much as 1 year behind
  - Hope is that another DC will be added, allowing validation of a more nearly fully developed Tier 1 and US ATLAS facilities Grid

14 Profiles for Standalone Disk Option
- Much higher functionality (than other options) and, given the new stretched-out LHC schedule, within budget guidance
- Fractions in the revised profiles in the table below are of a final system with nearly 2.5 times the capacity of that discussed last year

15 Associated Labor Profile

16 Summary Tier 1 Cost Profile (At-Year $K)
- Current plan violates guidance by $370k in FY ’04, but this is a year of some flexibility in guidance
- Strict adherence to FY ’04 guidance would reduce facility capacity from 3% to 1.5%, or staff by 2 FTE’s

17 Tier 1 Capacity Profile

18 Tier 1 Cost Profiles

19 Standalone Disk Model Benefits
- All ESD, AOD, and TAG data on local disk
  - Enables analysis-specific 24-hour selection passes (versus one-month aggregated passes) – faster, better tuned, more consistent selection
  - Allows navigation to individual events (to all processed, but not Raw, data) without recourse to tape and the associated delay – faster, more detailed analysis of larger, consistently selected data sets
  - Avoids contention between analyses over ESD disk space and the need for complex algorithms to optimize management of that space – better results with less effort
- While prepared to serve appropriate levels of data access to other Tier 1’s, the US will not in general be unduly sensitive to the performance of other Tier 1’s or of the intervening (transatlantic) network and middleware – improved system reliability, availability, robustness, and performance

20 Tier 2 Issues
- The high availability of the complete ESD set on disk at the Tier 1, and the associated increased frequency of ESD selection passes, will, for connected Tier 2’s (and Tier 3’s), lead to:
  - More analysis activity (increasing CPU & disk utilization)
    - More frequent analysis passes on more and larger usable TAG, AOD, and ESD subsets
  - More network traffic into the site from the Tier 1 (increasing WAN utilization)
    - Selection results
    - Event navigation into the full disk-resident ESD
- As in the case of the Tier 1, an additional year of funding before turn-on and the increased effectiveness of “year later” funding contribute to satisfying these increased needs within or near the integrated out-year (’05-’07) budget guidance
  - The delay of some ’06 funding to ’07 is required for a better match of profiles

21 Tier 2 Distribution of Hardware Cost

22 Tier 1 Distribution of Hardware Cost

23 FY 2007 Capacity Comparison of Models

24 Conclusions
- Standalone disk model
  - A dramatic improvement over the previous tape-based model – functionality & performance
  - A significant improvement over the multi-Tier 1 disk model – performance, reliability & robustness
  - Respects funding guidance in the model-sensitive out-years
- If costs are higher or funding lower than expected, a graceful fallback is to access some of the data on disks at other Tier 1’s
  - Adiabatically move toward the multi-Tier 1 model

