US-ATLAS Management Overview John Huth Harvard University Agency Review of LHC Computing Lawrence Berkeley Laboratory January 14-17, 2003.


1 US-ATLAS Management Overview John Huth Harvard University Agency Review of LHC Computing Lawrence Berkeley Laboratory January 14-17, 2003

2 15 Jan 03 J. Huth LHC Computing Agency Review: Outline
- Overview
- Changes from last year
- LCG inception
- U.S. ATLAS and International ATLAS
- Highlights
- Issues
- Funding, base program funding
- Review of actions on recommendations
- External groups (iVDGL/PPDG/EDG)
- Change control

3 Major Changes Since Last Review
- Research Program launched: M+O and Computing considered as one “program”
- Research program proposal submitted
- Tier 2 funds
- Physics generator interface
- Some core support
- CERN infrastructure support
- Detector specific support
- “Large” ITR workshop
- Private grids: allowing small groups to work, but retain data “context” to entire experiment
- Medium ITRs in progress
- LCG Project launched
- Major US ATLAS participation
- US ATLAS data management scheme adopted

4 Luminosity Evolution of the LHC

5 The Importance of LHC Computing to the US
- The first run will be a major discovery run.
- Even if the accelerator delivers only 1/40th of the projected luminosity, SUSY will be discovered if it exists.
- One must be prepared well in advance of the run if one is to exploit the physics.
- These discoveries are likely to be the most important to physics in the course of two decades, including many projects with larger initial investment for the U.S.
- Computing investment is the key!

6 The Scale of Computing for the LHC
Comparison to Tevatron experiments (closest benchmark):
- Number of detector elements: x 1000
- CPU time: x 10-1000 (combinatorics in tracking)
- Data volume: x 10-100
- Geographical distribution: x 10
- Collaboration size: x 5

7 International/US ATLAS
- Deliverables from US ATLAS
- Control/framework, data management effort, DC support, build support
- Facility support of data challenges
- Incorporation and inception of grid tools for data challenges
- PACMAN, MAGDA, interoperability tests
- Management
- Architecture team, now Software manager nominee
- Data management leadership
- Detector specific

8 ATLAS Computing organization (1999-2002)
[Organization chart. Boxes: coordinator, simulation, reconstruction, database, QA group, Arch. team, Event filter, Technical Group, National Comp. Board, Comp. Steering Group, Physics, Comp. Oversight Board, Detector system]

9 ATLAS Subsystem/Task Matrix (present)
Columns, in order: Offline Coordinator / Reconstruction / Simulation / Database
- Chair: N. McCubbin / D. Rousseau / A. Dell’Acqua / D. Malon
- Inner Detector: D. Barberis / D. Rousseau / F. Luehring / S. Bentvelsen, D. Calvet
- Liquid Argon: J. Collot / S. Rajagopalan / M. Leltchouk / H. Ma
- Tile Calorimeter: A. Solodkov / F. Merritt / V.Tsulaya / T. LeCompte
- Muon: J.Shank / J.F. Laporte / A. Rimoldi / S. Goldfarb
- LVL 2 Trigger/Trigger DAQ: S. George / S. Tapprogge / M. Weilers / A. Amorim, F. Touchard
- Event Filter: V. Vercesi, F. Touchard
Computing Steering Group members/attendees: 4 of 19 from US (Malon, Quarrie, Shank, Wenaus)
Physics Coordinator: F.Gianotti
Chief Architect: D.Quarrie

10 Project Core SW FTE

11 FTE Fraction of Core SW

12 News
- Norman McCubbin steps down as Computing Coordinator
- New management structure
- Computing Coordinator nominee: Dario Barberis
- New position, Software Coordinator; nominee: David Quarrie
- LCG Project Oversight Board
- J. Huth US Representative
- NB plan from last year was to split US ATLAS/US CMS representation to the LCG. This has only come to pass.

13 Proposed new computing organization (DRAFT FOR DISCUSSION)

14

15 Software Deliverables
- Control/framework:
- Architecture, development of control/framework, including services
- Simulation
- Reconstruction
- Services
- Interfaces (scripting etc.)
- Collaboration with LHCb, using Gaudi kernel
- Described as level-of-effort, plus technical annex describing requirements
- Data management
- Common LCG solution: hybrid solution (SQL+Root)
- Fixed manpower contribution through intl. ATLAS

16 Other contributions
- Nightly builds at BNL
- Event Generator interface (physics subproject)
- Event data model
- Detector description (non-project)

17 Risks to SW deliverables
- Erosion of base, plus lowered project funding
- Reduction of effort on control/framework: 1 FTE at risk (of 5)
- Impact of support on some deliverables
- Data management: 1 FTE at risk
- NB: even with delays of LHC startup, risks:
- Scope of data challenges
- Incorporation of trigger information
- Calibration
- Analysis support, detector simulation

18 Detector Specific
- Major roles in all detector subsystems, particularly:
- Muon reconstruction: Jim Shank (MOORE)
- LAr simulation and reconstruction: Srini Rajagopalan
- Tilecal reconstruction, missing Et (Merritt, LeCompte)
- TRT simulation (F. Luehring)
- NB All subsystem effort comes from the base
- NSF Research Program Proposal includes detector specific support of limited scope (level yet to be fixed)

19 Facilities
- Two forms of “deliverables”
- International ATLAS: provide cache of ESD, and CPU cycles, access to users and for specific production tasks
- Resources
- Production
- This is spelled out in the ATLAS resources document, approved by the Collaboration Board. (NB: contributions can be in the form of Tier 1s.)
- Support of US ATLAS physicists in doing analysis
- Resources: storage, CPU, networking
- Support: help desk, librarians, builds
- Tier 1 facility (BNL)
- Tier 2s: general distributed infrastructure

20 ATLAS DC1 Phase 1: July-August 2002 (A. Putzer)
1. Australia
2. Austria
3. Canada
4. CERN
5. Czech Republic
6. France
7. Germany
8. Israel
9. Italy
10. Japan
11. Nordic
12. Russia
13. Spain
14. Taiwan
15. UK
16. USA

21 Facilities Risks
- Software development cycles required a substantial early ramp to get user involvement, develop reconstruction algorithms, etc.
- With reduced funding, this required delaying the facilities ramp.
- Major issue: the facilities funding is now getting “hemmed in”. Expected early funding is not materializing, and late funding is insufficient for turn on of LHC.
- Will not meet data challenge needs
- Will not meet facilities pledge (let alone contribute to CERN)
- Support of US physicists at turn on seriously degraded

22 Highlights of last year
- Fads/goofy (alternative framework) issue solved
- G4 now incorporated into Athena
- Increased usage of Athena by collaboration, support
- Adoption of (US ATLAS) hybrid database solution by LCG
- Major success in grid production for data challenges
- Atlas Definition Language dropped as a deliverable
- Decision by CSG on technical grounds
- Use of BNL Regional Center proposed to mine high level trigger data
- Support of approx. 20 users
- Good stress test

23 Issues
- After baselining exercise, funding profile is perpetually lower than agency guidance.
- Funding information late relative to expectations/allocation time
- Budget shortfall
- Evaluation of new funding scenarios every 2 months
- Base programs at the supporting national labs are eroding
- Time of SW Manager split; working on solution

24 Work in progress
- Growth of grid activities, spanning facilities and software domains
- Management of deployment and use of grid tools
- Coordination with CMS/LCG
- Infrastructure support improving (SIT)
- Adding Level 2 manager/structure for Grids/production

25 Funding Sources
- NSF
- Research Program Proposal
- Tier 2 centers
- Core software support
- Infrastructure support
- Detector specific support
- Networking teams
- Collaborative tools
- Grid initiatives
- GriPhyN: middleware supplied
- iVDGL: prototype Tier 2 centers, manpower
- New large ITR initiative: private grids to support analysis
- University base
- Detector specific software
- Grid activities
- Small and medium ITRs

26 Funding Sources II
- DOE
- Direct project funding
- Core software support
- Regional center support
- PPDG
- Incorporation of grid software
- Base program support
- Core software
- Detector specific software
- Grid activities

27 Recommendations from Last Review Nov. 01: Software
1. The committee recommends to the international collaboration that it provide the chief architect with resources and authority to fulfill that role. Until that issue is resolved we recommend to US-ATLAS to continue this kind of fire-fighting for the common good of ATLAS.
Ans: The new management structure of international ATLAS Computing addresses this with the position of the Software Project Leader. He/She will have direct responsibility for the organization of all work on software development and at the same time will be a member of the ATLAS Executive Board. In itself it doesn’t address the resource issue, but does give the position authority.
2. The committee recommends to intl. ATLAS management to enforce decisions about choices of software in the collaboration.
Ans: The proposed new organization of ATLAS Computing foresees clearer management and reporting lines. Smaller committees, meeting more frequently than in the past, will ensure a larger circulation of information and take the appropriate decisions at the right technical level. Recent decisions (old structure) were dropping fads/goofy and ADL.

28 SW Recommendations (cont’d)
3. The committee recommends to US ATLAS software group to be less willing to take on additional workload.
Ans: To some extent, they have resisted, but firefighting mode exists with Data Challenges. The increased spacing of data challenges helps alleviate some of the firefighting mode.
4. International ATLAS is strongly encouraged to provide a concrete staffing plan for DC1.
Ans: This has happened. Gilbert Poulard (in charge of DC1) has organized a work package structure for DC1 with nominated people covering the key areas, and in addition there were major contributions from outside institutions.

29 Facilities Recommendations
1. To test the system under a higher level of complexity (number of boxes) closer to that of a final system and with more mature software, as DC3 should be attempted no later than early 2005. A 20% complexity test should be considered…
Ans: DC3 has been defined and scheduled for late 2004/early 2005. However, as regards the Tier 1 in particular, lack of funding is substantially limiting its ramp up in either capacity or complexity (number of boxes).
2. The level of 25 FTEs to support the Tier 1 facility during production appears reasonable. Nevertheless, benchmarking against best-in-class operations such as Celera Genomics is suggested.
Ans: Benchmarking against the RHIC Computing Facility (RCF), a project of comparable scale with very similar qualitative requirements, a similar user community, and similar funding constraints, seems more appropriate and can be done much more precisely. A recent re-estimation based on the RCF has yielded a somewhat lower long term staffing requirement.

30 Facilities Recommendations (cont’d)
3. ATLAS should coordinate with CMS (as they have done with the disk technology studies) in technology evaluation of effective disk caching strategies as an alternative to the proposed scope change.
Ans: Facility coordination and technology evaluation are being conducted by the iVDGL facilities group within the US, and international coordination is under the auspices of LCG. Regarding disk caching strategies, while efforts to optimize them are in any case of significant value, the decision to go to an all disk resident ESD model was made by ATLAS (not US ATLAS) and has major advantages for caching performance. The major US Tier 1 issue has been whether to have a complete disk resident ESD set at BNL or to depend on at least two other Tier 1 sites, the intervening transoceanic network and Grid middleware to complete any large scale access of data.

31 Facilities Recommendations (cont’d)
4. With the base plan still including tape storage for the ESD, as well as the ability to retrieve ESD from archival at the Tier 0, balanced use of commodity components at both the Tier 1 and Tier 2 sites should be seriously evaluated before procurement begins.
Ans: Commodity components are continuously evaluated as part of the ongoing RCF/ACF operations and this experience is essential in the design and planning for the ATLAS Tier 1. The iVDGL facilities group is also very active in the evaluation and testing of commodity components. The use of lower cost commodity based disk in analyses is an ongoing and significant activity at BNL, both for ATLAS and RHIC.

32 Facilities Recommendations (cont’d)
5. Attention must be paid to the need for increased network bandwidth and an appropriate support team.
Ans: The NSF Research Program Proposal includes a support line for networking infrastructure. Backbone capabilities and last-mile issues are actively being addressed by Shawn McKee, who is delegated to work in this area. BNL was upgraded by ESNET to OC12 in the summer of 2002. This will be sufficient for the near term needs of ATLAS data challenges. The longer term upgrade for the Tier 1 facility is being actively pursued, both by the Tier 1 facility group and the BNL network support group.

33 Management Recommendations
1. US ATLAS PCP should move to define its projects as well as possible so that mission creep can be avoided.
Ans: Three areas of concern from the last review have been addressed: a) consolidation of one baseline for data management in the LCG (hybrid-Root), b) software infrastructure team for Intl. ATLAS and c) creation of the new Software manager position (US ATLAS person nominated).
2. US ATLAS PCP should watch for and prevent or mitigate overload on its personnel from accepting too many responsibilities at the international level if this could compromise its ability to deliver its commitments.
Ans: We are keeping an eye on this. The situation has improved since the last time, and some of the commitments in deliverables have shrunk due to the LCG Applications projects.

34 Management Recommendations (cont’d)
3. The US project should push the International Organization for clear decisions on technical issues and ATLAS standards so as to avoid duplication and wasted efforts and must work to do the same within the US part of the project.
Ans: We have been doing this with success. The choice of the common data management solution, elimination of the fads/goofy framework, and issue of ADL have all been decisions that move in the direction of clear technical decisions which reduce duplication.
4. US ATLAS should monitor the productivity of its staff and make sure that it is commensurate to its costs.
Ans: We are doing this constantly. Personnel changes occur as a result of addressing these issues. Examples include shifting funding to more productive and less expensive individuals, and a consolidation of effort. This is an ongoing process.

35 Management Rec’s (cont’d)
5. It is important to make sure that the scope and deliverables of the project are not severely impacted by decisions made at the CERN/LHC level. US ATLAS must make sure that it is properly represented in the decision-making process and must be prepared to clearly and accurately state the impact of any major change to its ability to deliver.
Ans: US ATLAS has major representation in the applications area of the LCG (Wenaus, PEB applications leader). Vicky White has been the US representative to the Grid Deployment Board and has been very active in representing our viewpoints. We do feel that having more US representation or dialog with the GDB would be desirable, particularly in the formulation of facilities planning.

36 Management Rec’s (cont’d)
6. As the LHC schedule becomes better defined over the next 6 months, US ATLAS, working with International ATLAS and US Funding agencies, must be prepared to revise its schedule, milestones and budget profiles accordingly.
Ans: We have done this. The stretch out makes the all-disk option for the facilities more attractive, due to Moore’s law. On the other hand, the current funding guidance is hemming in the project both in the near term (’03 and ’04) and in the long term (before the start of data taking). Already, the project is at serious risk of being unable to support US physicists at the turn on of the LHC. Funding levels risk consigning us to a second rate status. Budgeting exercises occur roughly 6 times a year for 5-6 year profiles.

37 Management Rec’s (cont’d)
7. US ATLAS should present at the next meeting a detailed cost estimate, schedule and milestones for its proposed modification of the architecture of the Tier 1 center to use a disk based system for ESD storage.
Ans: Cost details for the full disk configuration have been done with the same level of detail as the previous disk/tape model. Increased CPU and WAN capacities have been estimated, corresponding to the increased availability of data at the Tier 1. Experience with disk-centric analyses during DC1 Phase II will contribute to a better understanding of how ATLAS users will respond to this analysis model.

38 An Instance of Change Control
- Our Proj. Management Plan describes a change control procedure, which invokes the CCB (Computing Coordination Board), in a process to grant change control.
- R. Gardner departed from Indiana University to Univ. of Chicago to become iVDGL coordinator. His funding via iVDGL was for a prototype Tier 2 site at Indiana.
- Request was for prototype effort to remain at Indiana (substantial infrastructure), but have personnel funded at U. Chicago
- Additional manpower, in effect, comes from this change
- All parties agreed
- CCB agreed with this, but didn’t see this change as an entitlement for a final Tier 2 at either Indiana or U. Chicago (to be revisited in 2 years).
- Change control memo written to file.

39 Summary
- Consolidation of US ATLAS deliverables
- Usage of Athena, hybrid DB solution adopted by LCG
- Extensive use of US ATLAS grids in data challenges
- Usage of BNL Tier 1 to mine HLT data
- Coherency of grid activities
- Large ITR proposal in progress
- Funding is THE ISSUE
- Stability and level of profile insufficient
- Lead time in planning

