GridPP Status Report. Tony Doyle, University of Glasgow. GridPP12 Collaboration Meeting, 31 January 2005.


1 GridPP Status Report. Tony Doyle, University of Glasgow. GridPP12 Collaboration Meeting, 31 January 2005

2 Contents
What was GridPP1?
What is GridPP2?
Challenges abound
LCG
–Issues
Deployment Status (9-28-30/1/05)
–Tier-1/A, Tier-2, NGS
M/S/N
Middleware
Food chains
Applications
Dissemination
The UK mountain climb
Summary

3 What was GridPP1?
A team that built a working prototype grid of significant scale:
> 2,000 (9,000) CPUs
> 1,000 (5,000) TB of available storage
> 1,000 (6,000) simultaneous jobs
A complex project where 88% of the milestones were completed and all metrics were within specification
A Success: “The achievement of something desired, planned, or attempted”

4 Executive Summary I
“The GridPP1 Project is now complete: following 3 years of development, a prototype Grid has been established, meeting the requirements of the experiments and fully integrated with LCG, currently the World’s largest Grid. Starting from this strong foundation, a more complex project, GridPP2, has now started, with an extended team in the UK working towards a production Grid deployed for the benefit of all experiments by September 2007.”
We achieved (almost exactly) what we stated we would do in building a prototype…

5 Executive Summary II
“2004 was a pivotal year, marked by extraordinary and rapid change with respect to Grid deployment, in terms of scale and throughput. The scale of the Grid in the UK is more than 2000 CPUs and 1PB of disk storage (from a total of 9,000 CPUs and over 5PB internationally), providing a significant fraction of the total resources required by 2007. A peak load of almost 6,000 simultaneous jobs in August, with individual Resource Brokers able to handle up to 1,000 simultaneous jobs, gives confidence that the system should be able to scale up to the required 100,000 CPUs by 2007. A careful choice of sites leads to acceptable (>90%) throughput for the experiments, but the inherent complexity of the system is apparent and many operational improvements are required to establish and maintain a production Grid of the required scale. Numerous issues have been identified that are now being addressed as part of GridPP2 planning in order to establish the required resource for particle physics computing in the UK.”
Most projects fail in going from prototype to production…
There are many issues: a methodical approach is required.

6 What is GridPP2?
Structures agreed and in place (except LCG phase-2)
253 Milestones, 112 Monitoring Metrics at present.
Must deliver a “Production Grid”: robust, reliable, resilient, secure, stable service delivered to end-user applications.
The Collaboration aims to develop, deploy and operate a very large Production Grid in the UK for use by the worldwide particle physics community.

7 What are the Grid challenges?
Must:
share data between thousands of scientists with multiple interests
link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres
ensure all data accessible anywhere, anytime
grow rapidly, yet remain reliable for more than a decade
cope with different management policies of different centres
ensure data security
be up and running routinely by 2007

8 What are the Grid challenges?
Data Management, Security and Sharing
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies

9 What are the limits on Data?
[Figure: Advanced Areal Density Trends (M. Leonhardt, 4-9-02). Areal density (Gb/in^2, 0.001 to 1,000,000) versus year (1987 to 2022) for parallel track longitudinal tape, serpentine longitudinal tape, helical tape, magnetic disk, optical disk, probe, volumetric optical and atom-level storage, with tape demos, technical progress and technology boundaries (superparamagnetic effect, probe contact area viability, atom surface density) marked. The LHC era and the 1 Terabit/in^2 and 1 PetaBit/in^2 levels are indicated.]
Currently disk capacity doubles every year (or so) for unit cost.

10 What are the limits on CPU? Moore’s Law
[Figure: "No Exponential is Forever … but We Can Delay 'Forever'", with technical progress, technology boundaries and the LHC era marked.]
ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf
Currently CPU performance doubles every two years (or so) for unit cost.

11 Step-1.. financial planning. Applies to our problem? (See Dave’s talk)
Step-2.. Compare to (e.g. Tier-1) experiment requirements
Step-3.. Conclude that more than one centre is needed
Step-4.. A Grid?
Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
Currently network performance doubles every year (or so) for unit cost.
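The doubling times quoted for disk (one year), CPU (two years) and network (one year) can be turned into growth factors with the usual exponential rule, factor = 2^(years / doubling time). The following is a minimal Python sketch, assuming the "or so" figures are exact and constant over time; the names `growth_factor`, `DOUBLING_TIME_YEARS` and `projected_factors` are hypothetical and not part of the talk.

```python
# Doubling times per unit cost as quoted on the slides (each "or so").
# Treating them as exact constants is an assumption of this sketch.
DOUBLING_TIME_YEARS = {"disk": 1.0, "cpu": 2.0, "network": 1.0}


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Factor by which capacity per unit cost grows after `years`."""
    return 2.0 ** (years / doubling_time_years)


def projected_factors(years: float) -> dict[str, float]:
    """Growth factor for each resource type after `years`."""
    return {
        name: growth_factor(years, doubling_time)
        for name, doubling_time in DOUBLING_TIME_YEARS.items()
    }


if __name__ == "__main__":
    # Two years: roughly the gap between this talk (2005) and 2007.
    for name, factor in projected_factors(2.0).items():
        print(f"{name}: x{factor:.1f} per unit cost after 2 years")
```

Under these assumptions, two years of growth gives a factor of 4 for disk and network and a factor of 2 for CPU at constant cost.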

12 How do I start?
http://www.gridpp.ac.uk/start/
Getting started as a Grid user
Quick start guide for LCG2: GridPP guide to starting as a user of the Large Hadron Collider Computing Grid.
Getting an e-science certificate: In order to use the Grid you need a Grid certificate. This page introduces the UK e-Science Certification Authority, which issues certificates to users. You can get a certificate from here.
Using the LHC Computing Grid (LCG): CERN's guide on the steps you need to take in order to become a user of the LCG. This includes contact details for support.
LCG user scenario: This describes in a practical way the steps a user has to follow to send and run jobs on LCG and to retrieve and process the output successfully.
Currently being improved.. DTEAM

13 Where do we start? Issues
“LCG-2 MIDDLEWARE PROBLEMS AND REQUIREMENTS FOR LHC EXPERIMENT DATA CHALLENGES”
https://edms.cern.ch/file/495809/2.2/LCG2-Limitations_and_Requirements.pdf
First large-scale Grid production problems being addressed… at all levels
Overall efficiency ~60%
¼ of the problems
¾ of the problems

14 GridPP Deployment Status (9-28-30/1/05)
Three Grids on Global scale in HEP (similar functionality)
                 sites      CPUs
LCG (GridPP)     90 (16)    9000 (2242)
Grid3 [USA]      29         2800
NorduGrid        30         3200
GridPP deployment is part of LCG (Currently the largest Grid in the world)
The future Grid in the UK is dependent upon LCG releases
totalCPU freeCPU runJob waitJob seAvail TB seUsed TB maxCPU avgCPU
Total 2242915591784936.874.45106482232
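The deployment table supports a quick estimate of the GridPP share of LCG. The sketch below is illustrative only: it reads the Grid3 and NorduGrid rows as 29 sites / 2800 CPUs and 30 sites / 3200 CPUs (an assumption about how the fused digits split), and the names `GRIDS`, `GRIDPP` and `share` are hypothetical.

```python
# Site and CPU counts as quoted in the deployment table (January 2005).
# The split of the Grid3 and NorduGrid rows is an assumed reading.
GRIDS = {
    "LCG": {"sites": 90, "cpus": 9000},
    "Grid3": {"sites": 29, "cpus": 2800},
    "NorduGrid": {"sites": 30, "cpus": 3200},
}
# GridPP is counted inside LCG, not as a separate grid.
GRIDPP = {"sites": 16, "cpus": 2242}


def share(part: int, whole: int) -> float:
    """Fraction of `whole` contributed by `part`."""
    return part / whole


cpu_share = share(GRIDPP["cpus"], GRIDS["LCG"]["cpus"])
site_share = share(GRIDPP["sites"], GRIDS["LCG"]["sites"])
# Average number of CPUs per site for each grid.
cpus_per_site = {name: g["cpus"] / g["sites"] for name, g in GRIDS.items()}

print(f"GridPP CPU share of LCG:  {cpu_share:.1%}")
print(f"GridPP site share of LCG: {site_share:.1%}")
```

With these figures GridPP supplies about 25% of the LCG CPUs from about 18% of its sites.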

15 UK Tier-1/A Centre
Rutherford Appleton Laboratory
High quality data services
National and international role
UK focus for international Grid development
1000 CPU
200 TB Disk
60 TB Tape (Capacity 1PB)
Grid Resource Discovery Time = 8 Hours
[Figures: 2004 CPU Utilisation; 2004 Disk Use. Annotations: Peak Utilisation; Fall-off in Q4]

16 UK Tier-2 Centres
The whole is better than the sum of the parts..

17 Level-2 Grid
In future will include services to facilitate collaborative (grid) computing:
Authentication (PKI X509)
Job submission/batch service
Resource brokering
Authorisation
Virtual Organisation management
Certificate management
Information service
Data access/integration (SRB/OGSA-DAI/DQPS)
National Registry (of registries)
Data replication
Data caching
Grid monitoring
Accounting
[Map: Leeds, Manchester, Oxford, RAL, DL]

18 Middleware Development
Configuration Management
Storage Interfaces
Network Monitoring
Security
Information Services
Grid Data Management
Deployment Area
1. LCFG
2. Generic
3. Quattor

19 Enabling Grids for E-sciencE, INFSO-RI-508833. LHCC Comprehensive Review – November 2004
Prototype Middleware Status & Plans (I)
Workload Management
–AliEn TaskQueue
–EDG WMS (plus new TaskQueue and Information Supermarket)
–EDG L&B
Computing Element
–Globus Gatekeeper + LCAS/LCMAPS
  → Dynamic accounts (from Globus)
–CondorC
–Interfaces to LSF/PBS (blahp)
–“Pull components”
  → AliEn CE
  → gLite CEmon (being configured)
Blue: deployed on development testbed
Red: proposed

20 Prototype Middleware Status & Plans (II)
Storage Element
–Existing SRM implementations
  → dCache, Castor, …
  → FNAL & LCG DPM
–gLite-I/O (re-factored AliEn-I/O)
Catalogs
–AliEn FileCatalog – global catalog
–gLite Replica Catalog – local catalog
–Catalog update (messaging)
–FiReMan Interface
–RLS (globus)
Data Scheduling
–File Transfer Service (Stork+GridFTP)
–File Placement Service
–Data Scheduler
Metadata Catalog
–Simple interface defined (AliEn+BioMed)
Information & Monitoring
–R-GMA web service version; multi-VO support

21 Prototype Middleware Status & Plans (III)
Security
–VOMS as Attribute Authority and VO mgmt
–myProxy as proxy store
–GSI security and VOMS attributes as enforcement
  → fine-grained authorization (e.g. ACLs)
  → globus to provide a set-uid service on CE
Accounting
–EDG DGAS (not used yet)
User Interface
–AliEn shell
–CLIs and APIs
–GAS
  → Catalogs
  → Integrate remaining services
Package manager
–Prototype based on AliEn backend
–evolve to final architecture agreed with ARDA team

22 Middleware & OGSA-compliance
We need an “open” “grid” “services” “architecture”
1. Infrastructure Services that enable communication between disparate resources (computer, storage, applications, etc.), removing barriers associated with shared utilization.
2. Resource Management Services that enable the monitoring, reservation, deployment, and configuration of grid resources based on quality of service requirements.
3. Data Services that enable the movement of data where it is needed – managing replicated copies, query execution and updates, and transforming data into new formats if required.
4. Context Services that describe the required resources and usage policies for each customer that utilizes the grid – enabling resource optimization based on service requirements.
5. Information Services that provide efficient production of, and access to, information about the grid and its resources, including status and availability of a particular resource.
6. Self-Management Services that support the attainment of stated levels of service with as much automation as possible, to reduce the costs and complexity of managing the system.
7. Security Services that enforce security policies within a virtual organization, promoting safe resource-sharing and appropriate authentication and authorization of users.
8. Execution Management Services that enable both simple and more complex workflow actions to be executed, including placement, provisioning, and management of the task lifecycle.

23 Oasis WS-RF & WS-I+
WS-RF (the Oasis standard)
WS-I+ (implementation?)
UK e-Science Core programme services (July 2004): WS-I+
–WS-I Basic Profile (XSD, WSDL 1.1, SOAP 1.1, UDDI)
–WS-I Basic Security Profile (parts of WS-Security)
  → BPEL
  → WS-Addressing (to be replaced by the ongoing W3C activity)
  → WS-ReliableMessaging
  → WS-Eventing
A service built with WS-RF will not interoperate with a WS-I+ client…
UK e-Science meeting today

24 gLite & ARDA Metadata
gLite (a standard?)
ARDA (an implementation?)
End-user throughput or standards driven?
GSOAP optimisation important
Early days..
Some overlapping functionality – missing extensibility in gLite
APIs differ
Testing ongoing: middle ground – adapt to gLite interfaces (e.g. AMI-gLite), test ARDA implementation

25 The Oasis:OGSA:WSRF:WSI+:gLite:ARDA:experiment / experiment:ARDA:gLite:WSI+:WSRF:OGSA:Oasis food chain?
1. A hierarchy?
2. A virtuous? circle
Only works if there is sufficient decomposition…
Discussion required
Depends on your World view…

26 Conference xxx - August 2003 Fabrizio Gagliardi DataGrid Project Manager and EGEE designated Project Director CERN Geneva Switzerland Workshop on eInfrastructures (Internet and Grids) Best practices and challenges Need to relate high level plan to what is required on the ground

27 LCG Robustness e.g. data management
LCG File Catalog (LFC) developed to address the performance and scalability problems seen in the 2004 Data Challenges
Features include hierarchical namespace, transactions, cursors, timeouts & retries, GSI security, ACLs...
Performance testing almost complete
Tests of insert, query and delete rates up to 40,000,000 entries and 10 clients / 100 concurrent threads
Insert rates almost independent of number of entries in LFC, much more scalable than EDG RLS.
Higher delete rate than EDG RLS
Query rate lower than Globus but higher than EDG.. but LFC retrieves much more information with query so matches user patterns better
Scales well to many replicas and LFNs per GUID, and to many concurrent users
http://ppewww.ph.gla.ac.uk/~caitrian/LFC
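The catalogue concepts named on this slide (a hierarchical namespace of logical file names, several LFNs per GUID, several replicas per GUID) can be illustrated with a toy in-memory structure. This is only a sketch of the data model under simple assumptions; it is not the LFC API, and the class `ToyCatalog`, its methods and the example SURLs are all hypothetical.

```python
import uuid


class ToyCatalog:
    """Toy in-memory catalogue: LFN -> GUID -> replica SURLs (not the LFC API)."""

    def __init__(self) -> None:
        self._guid_by_lfn = {}       # logical file name -> GUID
        self._replicas_by_guid = {}  # GUID -> list of replica SURLs

    def register(self, lfn: str, surl: str) -> str:
        """Register a new file under `lfn` with a first replica; return its GUID."""
        if lfn in self._guid_by_lfn:
            raise KeyError(f"LFN already registered: {lfn}")
        guid = str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._replicas_by_guid[guid] = [surl]
        return guid

    def add_alias(self, guid: str, lfn: str) -> None:
        """Attach an additional LFN to an existing GUID."""
        if guid not in self._replicas_by_guid:
            raise KeyError(f"Unknown GUID: {guid}")
        if lfn in self._guid_by_lfn:
            raise KeyError(f"LFN already registered: {lfn}")
        self._guid_by_lfn[lfn] = guid

    def add_replica(self, guid: str, surl: str) -> None:
        """Record another physical copy of the file identified by `guid`."""
        if guid not in self._replicas_by_guid:
            raise KeyError(f"Unknown GUID: {guid}")
        self._replicas_by_guid[guid].append(surl)

    def list_replicas(self, lfn: str) -> list[str]:
        """Return all replica SURLs for the file known as `lfn`."""
        return list(self._replicas_by_guid[self._guid_by_lfn[lfn]])

    def list_dir(self, directory: str) -> list[str]:
        """Return all LFNs below `directory` in the hierarchical namespace."""
        prefix = directory.rstrip("/") + "/"
        return sorted(lfn for lfn in self._guid_by_lfn if lfn.startswith(prefix))


if __name__ == "__main__":
    catalog = ToyCatalog()
    # Hypothetical LFN and SURLs, for illustration only.
    guid = catalog.register("/grid/dteam/run1.root", "srm://se1.example.org/run1.root")
    catalog.add_replica(guid, "srm://se2.example.org/run1.root")
    catalog.add_alias(guid, "/grid/dteam/latest.root")
    print(catalog.list_replicas("/grid/dteam/latest.root"))
    print(catalog.list_dir("/grid/dteam"))
```

A real catalogue adds the transactions, timeouts, security and ACLs listed on the slide; the sketch only shows why a lookup by LFN can return every replica in one query.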

28 Testing & Documentation e.g. data management
lcg-aa (add-alias): adds an alias in RMC for a given GUID.
lcg-cp (copy): copies a grid file to a specific location on UI area.
lcg-cr (copy-and-register): copies a file to the SE and registers the file in the SE's LRC.
lcg-del (delete): deletes a file.
lcg-gt (get-turl): gets the TURL for a given SURL + transfer protocol.
lcg-infosites (list-all sites information): lists important information for all sites on the grid.
lcg-la (list-aliases): lists all the aliases for a given LFN, GUID or SURL.
lcg-lg (list-GUID): lists the GUID for a given LFN or SURL.
lcg-lr (list-replicas): lists the replicas for a given LFN, GUID or SURL.
lcg-ra (remove-alias): removes an alias in RMC for a given GUID.
lcg-rep (replicate): copies a file from one SE to another SE and registers it in the destination SE's LRC.
lcg-rf (register-file): registers in LRC a file residing on an SE.
lcg-uf (unregister-file): unregisters in LRC a file residing on an SE.
Preliminary tests completed for all 91 data management commands
Simple additional documentation added
http://ppewww.ph.gla.ac.uk/~fergusjk/

29 Application Development
ATLAS, LHCb, CMS, BaBar (SLAC), SAMGrid (FermiLab), QCDGrid, PhenoGrid

30 Applications
There is a (slightly wonky?) wheel
Use it to get to where you need to be
ZEUS uses LCG
needs the Grid to respond to increasing demand for MC production
up to 6 million Geant events per week on Grid since August 2004
1. The system developed for the large LHC experiments works (more) effectively for other (less resource-intensive) applications
2. Experiments need to work together with deployment team/sites
3. The de-facto deployment standard is LCG – it ~works. We can add components as required, to meet each experiment’s needs

31 Dissemination
much has happened.. more people are reading about it..
GridPP2 gets its first term report (Fri 28 Jan 2005)
BaBar UK moves into the Grid era (Tue 11 Jan 2005)
LHCb-UK members get up to speed with the Grid (Wed 5 Jan 2005)
GridPP in Pittsburgh (Thu 9 Dec 2004)
GridPP website busier than ever (Mon 6 Dec 2004)
Optorsim 2.0 released (Wed 24 Nov 2004)
ZEUS produces 5 million Grid events (Mon 15 Nov 2004)
CERN 50th anniversary reception (Tue 26 Oct 2004)
GridPP at CHEP'04 (Mon 18 Oct 2004)
LHCb data challenge first phase a success for LCG and UK (Mon 4 Oct 2004)
Networking in Nottingham - GLIF launch meeting (Mon 4 Oct 2004)
GridPP going for Gold - website award at AHM (Mon 6 Sep 2004)
GridPP at the All Hands Meeting (Wed 1 Sep 2004)
R-GMA included in latest LCG release (Wed 18 Aug 2004)
LCG2 administrators learn tips and tricks in Oxford (Tue 27 Jul 2004)
Take me to your (project) leader (Fri 2 Jul 2004)
ScotGrid's 2nd birthday: ScotGrid clocks up 1 million CPU hours (Fri 25 Jun 2004)
Meet your production manager (Fri 18 Jun 2004)
GridPP10 report and photographs (Wed 9 Jun 2004)
CERN recognizes UK's outstanding contribution to Grid computing (Wed 2 Jun 2004)
UK particle physics Grid takes shape (Wed 19 May 2004)
A new monitoring map for GridPP (Mon 10 May 2004)
Press reaction to EGEE launch (Tue 4 May 2004)
GridPP at the EGEE launch conference (Tue 27 Apr 2004)
LCG2 released (Thu 8 Apr 2004)
University of Warwick joins GridPP (Thu 8 Apr 2004)
Grid computing steps up a gear: the start of EGEE (Thu 1 Apr 2004)
EDG gets glowing final review (Mon 22 Mar 2004)
Grids and Web Services meeting, 23 April, London (Tue 16 Mar 2004)
EU DataGrid Software License approved by OSI (Fri 27 Feb 2004)
GridPP Middleware workshop, March 4-5 2004, UCL (Fri 20 Feb 2004)
Version 1.0 of the Optorsim grid simulation tool released by EU DataGrid (Tue 17 Feb 2004)
Summary and photographs of the 9th GridPP Collaboration Meeting (Thu 12 Feb 2004)
138,976 hits in December

32 The UK mountain climb has started..
Annual data storage: 2.4-2.8 PetaBytes per year? (~20%)
10 Million SPECint2000 ≈ 10,000 PCs (3 GHz Pentium 4)
CD stack (~ 4 km)
We are here (0.4 km)
Quantitatively, we’re ~10% of the way there in terms of UK CPU (~1,000 ex ~10,000) and disk (~1 ex ~10 PB)
In production terms, left base camp
step-by-step plan in place…
For the Ben Nevis climb?
totalCPU freeCPU runJob waitJob seAvail TB seUsed TB maxCPU avgCPU
Total 2242915591784936.874.45106482232
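The "~10% of the way there" statement follows directly from the approximate figures on the slide. A small Python sketch of that arithmetic, treating the rounded values as exact (an assumption) and using hypothetical names such as `TARGET` and `CURRENT`:

```python
# Approximate figures from the slide, treated as exact for illustration.
TARGET = {"cpus": 10_000, "disk_pb": 10.0, "climb_km": 4.0}
CURRENT = {"cpus": 1_000, "disk_pb": 1.0, "climb_km": 0.4}

# Fraction of each target reached so far.
progress = {key: CURRENT[key] / TARGET[key] for key in TARGET}

# 10 million SPECint2000 spread over 10,000 PCs.
SPECINT2000_TOTAL = 10_000_000
PCS = 10_000
specint_per_pc = SPECINT2000_TOTAL / PCS

for key, fraction in progress.items():
    print(f"{key}: {fraction:.0%} of target")
print(f"SPECint2000 per PC: {specint_per_pc:.0f}")
```

All three measures (CPU, disk and the position on the climb) come out at 10%, and the CPU target corresponds to about 1,000 SPECint2000 per PC.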

33 Summary (GRIDPP-PMB-40-EXEC)
Introduction: The Grid is a reality
Project Management: A project was/is needed
Resources: Under control
LCG: LCG2 support: SC case presentation 3/2/05
Deployment: 16 UK sites are on the Grid
–Tier-1/A production + Tier-2 resources: MoUs, planning, deployment, monitoring, each underway as part of GridPP2
M/S/N: Developments established, R-GMA deployed
EGEE: gLite designed, including web services
Applications: Interfaces developed, testing phase
Dissemination: Area transformed
Beyond GridPP2: Incorporation in HEP programme..

34 Top 10 Issues?
1. Issues are the ones that your oversight committee tells you are issues?
2. Issues are long-term (endemic) problems - they were around 3 years ago?
3. Issues are wider than this? The ones you thought might be problems at the start? (but they were called challenges)

35 PPARC Oversight Committee Issues
1. GridPP may be underestimating the difficulty of engaging with each of the experiment teams.
2. Document with a plan to support UK physics analysis community in 2007 is needed.
3. Tier-1 allocation policy - define usage policy, i.e. what is the absolute scale? Are we under/over-committing from PPARC perspective?
4. Need to update GridPP2 Risk Register.
5. OC requires the LCG funding case to be put to them before going to Science Committee. (This has been done)
6. Get-fit plan on Production Metrics. How do we move from 60% to >90% and how will this be monitored in the UK?
7. Nail down the metrics - no sensible values yet established. Iterations are required.
8. Clarify probable direction of GridPP in terms of middleware.

36 2002 Challenges
✓ Complete rollout of TB-1 and plan future upgrades
✓ Reconvened ATF to work closely with applications
✓ Make TB-2 a success
✓ Deploy and exploit Tier-1/A
✓ Applications to make good use of testbeds
✓ Solve interoperability issues
✓ We are part of many larger collaborations/structures/groupings - we need to collaborate/discuss/engage here, and
✓ Focus on implementation in the UK… this will tell us what works (and what doesn’t) at any given point.

37 What are the Grid challenges?
Data Management, Security and Sharing
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies

38 Top 10 Issues?
Three methods to identify issues:
"If you cannot measure it, you cannot improve it."
Need to quantify end-to-end throughput… measurements are important…
Tackle the issues as they present themselves
In a timely way… LHC data is imminent…
Is there a GridPP top 10?
Answer?: No (probably)

