GridPP Deployment Status, User Status and Future Outlook Tony Doyle.


1 GridPP Deployment Status, User Status and Future Outlook Tony Doyle

2 Tony Doyle - University of Glasgow INFNGrid Meeting 20 December 2006
Introduction
A. What is the deployment status?
B. Is the system usable?
C. What is the future of GridPP?
Wot no middleware?

3 Middleware
GridPP Middleware is..
– Security
– Network Monitoring
– Information Services
– Grid Data Management
– Storage Interfaces
– Workload Management

4 Middleware
e.g. LCG monitoring applet
Monitor:
– resource brokers
– virtual organisations: ATLAS, CMS, LHCb, DTeam, Other
SQL queries to logging and book-keeping database

5 Middleware
e.g. APEL and R-GMA
R-GMA structure used in accounting system (GOCDB)
For gLite the sensors are provided by DGAS via DGAS2APEL
The EGEE portal for accounting data is provided by CESGA

6 Resources
17/12/06: EGEE total slots 34141 => UKI is 6949, ~20% of the total
17/12/06: EGEE jobs running 21291 => UKI is 2912, ~14% of jobs
Max EGEE = 42517, Max UKI = 8176
(N.B. hyperthreading distorts the 1:1 job:CPU core relation – reduces UKI numbers by ~500)
http://goc.grid.sinica.edu.tw/gstat/UKI.html
Sundays
STATUS Total: totalCPU 6949, freeCPU 3210, runJob 2912, maxCPU 8176, avgCPU 6716
waitJob, seAvail TB, seUsed TB: 77321246313 (column boundaries lost in the flattened table)
Steady climb since 2004 towards target of ~10,000 CPU (cores) (~job slots)
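
The quoted percentages can be checked directly from the slot and job counts on this slide. The following is a minimal sketch; the function name `percent_share` is illustrative and not part of any gstat tooling.

```python
def percent_share(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Figures quoted on the slide for 17/12/06.
egee_slots, uki_slots = 34141, 6949
egee_running, uki_running = 21291, 2912

slot_share = percent_share(uki_slots, egee_slots)
job_share = percent_share(uki_running, egee_running)

print(f"UKI share of EGEE job slots:    {slot_share:.1f}%")
print(f"UKI share of EGEE running jobs: {job_share:.1f}%")
```

Both results agree with the rounded ~20% and ~14% figures given on the slide.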

7 Resources
2006 CPU Usage by Region
Via APEL accounting
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

8 Resources
(not all records are being accounted)
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

9 Resources
2006 CPU Usage by experiment
Total CPU used 52,876,788 kSI2k-hours!
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

10 Utilisation
(Estimated utilisation based on gstat job slots/usage)
UKI mirrors overall EGEE utilisation
Average utilisation for Q306: 66%, compared to target of ~70%
CPU utilisation was a T2 issue, but now improving..

11 Efficiency
(measured by UK Tier-1 for all VOs)
~90% CPU efficiency due to i/o bottlenecks is OK
Concern that this is currently ~75%
Each experiment needs to work to improve their system/deployment practice, anticipating e.g. hanging gridftp connections during batch work
(plot label: target)
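
The slide does not define CPU efficiency. The sketch below assumes the usual batch-system definition (CPU time divided by wall-clock time) and shows how jobs that sit idle, for example on a hanging transfer, pull the aggregate down. The `JobRecord` type, the 10% stall threshold and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    job_id: str
    cpu_seconds: float   # CPU time consumed by the job
    wall_seconds: float  # wall-clock time the job held its slot


def cpu_efficiency(jobs):
    """Aggregate efficiency: total CPU time over total wall-clock time."""
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        return 0.0
    return sum(j.cpu_seconds for j in jobs) / total_wall


def stalled_jobs(jobs, threshold=0.10):
    """Job ids whose individual efficiency is below `threshold` (assumed cut)."""
    return [
        j.job_id
        for j in jobs
        if j.wall_seconds > 0 and j.cpu_seconds / j.wall_seconds < threshold
    ]


# Hypothetical sample: two healthy jobs and one that spent its slot waiting.
jobs = [
    JobRecord("a", 3400, 3600),
    JobRecord("b", 3300, 3600),
    JobRecord("c", 100, 3600),
]

print(f"aggregate efficiency: {cpu_efficiency(jobs):.0%}")
print("stalled:", stalled_jobs(jobs))
```

In this made-up sample a single stalled job is enough to move the aggregate well below the ~90% level the slide treats as acceptable.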

12 Storage
(is still an issue for Tier-1 and Tier-2s)
Utilisation is low (~30%) at T2s and accounting [by VO] is not (yet) there
http://www.gridpp.ac.uk/storage/status/gridppDiscStatus.html

13 Storage
GOCDB Accounting Display - under development
Looking at data for RAL-LCG2
Storage units are 1TB = 10^6 MB
Tape Used + Disk Used = Total
Sensor Drop Outs have been fixed
Plot series: Total Used Storage (TB), Tape Used, Disk Used
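
The unit convention and the "Tape Used + Disk Used = Total" relation on this slide can be written out as a small sketch. The function names and sample values are hypothetical and do not come from the GOCDB display.

```python
MB_PER_TB = 10**6  # the slide's convention: 1 TB = 10^6 MB


def mb_to_tb(megabytes: float) -> float:
    """Convert megabytes to terabytes using the decimal convention above."""
    return megabytes / MB_PER_TB


def total_used_tb(tape_used_mb: float, disk_used_mb: float) -> float:
    """Total used storage in TB, defined as tape used plus disk used."""
    return mb_to_tb(tape_used_mb + disk_used_mb)


# Hypothetical sensor readings in MB.
tape_mb, disk_mb = 150_000_000, 50_000_000
print(f"total used: {total_used_tb(tape_mb, disk_mb):.1f} TB")
```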

14 Storage
SRM at T1: ~200TB of disk (deployment problem in 2006)
– ~100% usage (problem for 2006 service challenges)
– Castor 2.1
SRM at all T2s: ~200TB of disk in total
– ~30% usage: difficult to calculate
– dCache 1.7.0 and DPM v1.5.10
– Dedicated disk servers advised (storage should be robust)
Need to make sure sites are running the latest GIP plugins (https://twiki.cern.ch/twiki/bin/view/EGEE/GIP-Plugins)
New GOC storage accounting system being put in place and deployed at Tier-2s
SRM v2.2 is being implemented: need to test interoperability

15 File Transfers
(individual rates)
Aim: to maintain data transfers at a sustainable level as part of experiment service challenges
http://www.gridpp.ac.uk/wiki/Service_Challenge_Transfer_Test_Summary
Current goals:
– >250Mb/s inbound-only
– >250Mb/s outbound-only
– >200Mb/s inbound and outbound
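
To get a feel for what these rate goals mean in data volume, the sketch below converts a sustained rate into terabytes per day. It assumes "Mb/s" means megabits per second and uses decimal units (1 TB = 10^12 bytes); neither assumption is stated on the slide.

```python
SECONDS_PER_DAY = 86_400


def mbit_per_s_to_tb_per_day(rate_mbit_s: float) -> float:
    """Data volume in TB moved in one day at a sustained rate in Mbit/s."""
    bytes_per_second = rate_mbit_s * 1e6 / 8  # megabits -> bytes
    return bytes_per_second * SECONDS_PER_DAY / 1e12


for goal in (250, 200):
    print(f"{goal} Mb/s sustained = {mbit_per_s_to_tb_per_day(goal):.2f} TB/day")
```

Under these assumptions the 250 Mb/s goal corresponds to about 2.7 TB per day per site, and the 200 Mb/s goal to about 2.2 TB per day in each direction.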

16 Tier-1 Resource
Approval for new (shared) machine room – ETA Summer 2008. Space for 300 racks.
Procurement
– March 06: 52 AMD 270 units, 21 disk servers (168TB data capacity)
– FY 06/07: 47 disk servers (282TB disk capacity), 64 twin dual-core Intel Woodcrest 5130 units (550 kSI2k)
– FY 06/07 upcoming: further 210 TB disk capacity plus high-availability systems (redundant PSUs, hot-swappable paired HDDs)
Storage commissioning saga
– Ongoing problems with March kit. Firmware updates have now solved problem. (Disks on Areca 1170 in raid 6 experienced multiple dropouts during testing of WD drives)
Move to CASTOR
– Very support heavy but made available for CSA06 and performing well
General
– Air-con problems with high temperatures triggering high-pressure cut-outs in refrigerator gas circuits (summers are warmer even in the UK...)
– July security incident
– 10Gb CERN line in place. Second 10Gb line scheduled in 07Q1
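
The procurement figures above imply a capacity per disk server and a CPU rating per unit. This is a small sketch using only the numbers on the slide; the variable and function names are illustrative.

```python
# (label, number of disk servers, capacity in TB) as quoted on the slide.
disk_procurements = [
    ("March 06", 21, 168),
    ("FY 06/07", 47, 282),
]


def tb_per_server(servers: int, capacity_tb: float) -> float:
    """Average capacity per disk server in TB."""
    return capacity_tb / servers


for label, servers, capacity_tb in disk_procurements:
    print(f"{label}: {tb_per_server(servers, capacity_tb):.1f} TB per disk server")

# 64 Woodcrest units rated at 550 kSI2k in total.
ksi2k_per_unit = 550 / 64
print(f"FY 06/07 CPU: {ksi2k_per_unit:.1f} kSI2k per unit")
```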

17 T2 Resources
e.g. Glasgow: UKI-SCOTGRID-GLASGOW, 800 kSI2k, 100 TB DPM
Needed for LHC start-up
August 28, September 1, October 13, October 23
IC-HEP: 440 kSI2k, 52 TB dCache
Brunel: 260 kSI2k, 5 TB DPM

18 T2 Resources
Could also be 2006
As overheard at one T2 site..

19 A. Usability (Prequel)
GridPP runs a major part of the EGEE/LCG Grid, which supports ~3000 users
The Grid is not (yet) as transparent as end-users want it to be
The underlying overall failure rate is ~10%
User (interface)s, middleware and operational procedures (need to) adapt
Procedures to manage the underlying problems such that system is usable are highlighted
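
One common way to live with a ~10% underlying failure rate is automatic resubmission. The sketch below is not taken from the talk. It assumes failures are independent between attempts, which real grid failures often are not, and shows how quickly the end-to-end success probability rises under that assumption.

```python
def success_probability(failure_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - failure_rate ** attempts


def attempts_needed(failure_rate: float, target: float) -> int:
    """Smallest number of attempts whose success probability reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    attempts = 1
    while success_probability(failure_rate, attempts) < target:
        attempts += 1
    return attempts


FAILURE_RATE = 0.10  # the ~10% figure quoted on the slide

for n in (1, 2, 3):
    print(f"{n} attempt(s): {success_probability(FAILURE_RATE, n):.1%} success")
print("attempts for 99.5% success:", attempts_needed(FAILURE_RATE, 0.995))
```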

20 Virtual Organisations
Users are grouped into VOs
– Users/VO varies from 1 to 806 members (and growing..)
Broadly four classes of VO
– LHC experiments
– EGEE supported
– Worldwide (mainly non-LHC particle physics)
– Local/regional e.g. UK PhenoGrid
Sites can choose which VOs to support, subject to MOU/funding commitments
– Most GridPP sites support ~20 VOs
– GridPP nominally allocates 1% of resources to EGEE non-HEP VOs
– GridPP currently contributes 30% of the EGEE CPU resources

21 User evolution
Number of users of the UK Grid (exc. Deployment Team)
Quarter: 05Q4 06Q2 06Q3
Value: 1342 1831 2777
Many EGEE VOs supported, c.f. 3000 EGEE target
Number of active users (> 10 jobs per month)
Quarter: 05Q4 06Q1 06Q2
Value: 83 166 201
Fraction: 6.2% (05Q4), 11.0% (06Q2)
Viewpoint: growing fairly rapidly, but not as active as they could be? Depends on the definition of "active"
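
The two quoted fractions can be reproduced from the tables above by dividing active users by total users for the quarters that appear in both tables. This is a minimal sketch with illustrative names.

```python
# Values transcribed from the slide.
total_users = {"05Q4": 1342, "06Q2": 1831, "06Q3": 2777}
active_users = {"05Q4": 83, "06Q1": 166, "06Q2": 201}


def active_fraction(total: dict, active: dict) -> dict:
    """Percentage of active users for quarters present in both tables."""
    common = sorted(total.keys() & active.keys())
    return {quarter: 100.0 * active[quarter] / total[quarter] for quarter in common}


fractions = active_fraction(total_users, active_users)
for quarter, value in fractions.items():
    print(f"{quarter}: {value:.1f}% active")
```

Only 05Q4 and 06Q2 appear in both tables, which is why the slide gives just two fractions.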

22 UK-enabled VOs
Know your users?
806 atlas, 763 dzero, 577 cms, 566 dteam, 150 lhcb, 131 alice, 75 bio, 65 dteamsgm, 41 esr, 31 ilc, 27 atlassgm, 27 alicesgm, 21 cmsprg, 18 atlasprg, 17 fusn, 15 zeus, 13 dteamprg, 13 cmssgm, 11 hone, 9 pheno, 9 geant, 7 babar, 6 aliceprg, 5 lhcbsgm, 5 biosgm, 3 babarsgm, 2 zeussgm, 2 t2k, 2 geantsgm, 2 cedar, 1 phenosgm, 1 minossgm, 1 lhcbprg, 1 ilcsgm, 1 honesgm, 1 cdf

23 Resource allocation
Assign quotas and priorities to VOs and measure delivery, but further work required on VOMS-roles/groups within each VO
VOMS provides group/role information in the proxy
Tools to control quotas and priorities in site services being developed
– So far only at whole-VO level
– Maui batch scheduler is flexible, easy to map to groups/roles
– Sites set the target shares
– Can publish VO/group-specific values in GLUE schema, hence the RB can use them for scheduling
Accounting tool (APEL) measures CPU use at global level (UK task)
– Storage accounting currently being added
– GridPP monitors storage across UK
– Privacy issues around user-level accounting, being solved by encryption
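
The idea of sites setting target shares and measuring delivery against them can be illustrated with a toy calculation. This is not Maui's fair-share algorithm or any GridPP tool. It only ranks VOs by how far their delivered CPU fraction falls short of a site-chosen target, and all VO shares and usage numbers are hypothetical.

```python
def share_deficits(target_shares: dict, delivered_cpu_hours: dict) -> dict:
    """Target share minus delivered fraction for each VO (positive = under-served)."""
    total = sum(delivered_cpu_hours.values())
    deficits = {}
    for vo, target in target_shares.items():
        delivered = delivered_cpu_hours.get(vo, 0.0) / total if total else 0.0
        deficits[vo] = target - delivered
    return deficits


def scheduling_order(target_shares: dict, delivered_cpu_hours: dict) -> list:
    """VOs ordered from most under-served to most over-served."""
    deficits = share_deficits(target_shares, delivered_cpu_hours)
    return sorted(deficits, key=lambda vo: deficits[vo], reverse=True)


# Hypothetical site policy and accounting data.
targets = {"atlas": 0.50, "cms": 0.30, "lhcb": 0.20}
delivered = {"atlas": 600.0, "cms": 350.0, "lhcb": 50.0}

order = scheduling_order(targets, delivered)
print("priority order:", order)
```

A scheduler using such a ranking would favour the VO furthest below its target. Extending the dictionary keys from whole VOs to VOMS groups or roles is the step the slide describes as still requiring work.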

24 User Support
Becoming vital as the number of users grows
– But modest effort available in the various projects
Global Grid User Support (GGUS) portal at Karlsruhe provides a central ticket interface
– Problems are categorised
Tickets are classified by an on-duty Ticket Process Manager, and assigned to an appropriate support unit
– UK (GridPP) contributes support effort
GGUS has a web-service interface to ticketing systems at each ROC
– Other support units are local mailing lists
– Mostly best-effort support, working hours only
Currently ~tens of tickets/week
– Manageable, but may not scale much further
– Some tickets slip through the net

25 Documentation & Training
Need documentation and training for both system managers and users
– Mostly expert users up to now, but user community is expanding
– Induction of new VOs is a particular problem: no peer support
– EGEE is running User Fora for users to share experience; next in Manchester in May 07 (with OGF)
– EGEE has a dedicated training activity run by NeSC/Edinburgh
Documentation is often a low priority, little dedicated effort
– The rapid pace of change means that material requires constant review
Effort on documentation is now increasing
– GridPP has appointed a documentation officer: GridPP web site, wiki
– Installation manual for admins is good; there is also a wiki for admins to share experience
– Focus is now on user documentation; new EGEE web site coming soon

26 Alternative view?
The number of users in the Grid School for the Gifted is ~manageable now
The system may be too complex, requiring too much work by the average user?
Or the (virtual) help desk may not be enough?
Or the documentation may be misleading?
Or..
Having smart users helps (the current ones are)

27 Future?
Timeline – 1 (Apr, May, Jun, Jul, Aug, Sep, Oct)
Year-long process to define future LHC exploitation
Phases: Proposal Writing, Proposal Defence
31st March – PPARC Call
16th June – GridPP16 at QMUL
13th July – Bid Submitted
6th September – 1st PPRP review
1st November – GridPP17
8th November – PPRP visiting panel
CB, OC, CB
http://www.gridpp.ac.uk/docs/gridpp3/

28 Scenario Planning – Resource Requirements [TB, kSI2k]
GridPP requested a fair share of global requirements, according to experiment requirements
Changes in the LHC schedule prompted a(nother) round of resource planning, presented to CRRB on Oct 24th
New UK resource requirements have been derived and incorporated in the scenario planning
e.g. Tier-1

29 Input to Scenario Planning – Hardware Costing
Empirical extrapolations with extrapolated (large) uncertainties
Hardware prices have been re-examined following recent Tier-1 purchase
CPU (Woodcrest) was cheaper than expected based on extrapolation of previous 4 years of data

30 Scenario Planning
An example 70% scenario based on Experiment Inputs [£m]

31 Back to the Future?
Timeline – 2 (Nov, Dec, Jan, Feb, Mar, Apr, May)
8th Nov – PPRP Visiting Panel
6th Dec – PPRP recommend to SC
PPARC Council, Science Committee, Grants etc.
GridPP2+ outcome (1/9/07-31/3/08) now known: emphasis on operations (modest middleware support)
GridPP3 outcome (1/4/08-31/3/11) anticipated to be known in the New Year

32 Conclusion
A. What is the deployment status? (snapshot) See e.g. "Performance of the UK Grid for Particle Physics", http://www.gridpp.ac.uk/papers/GridPP_IEEE06.pdf, for more info.
B. Is the system usable? Yes, but more work required from end-user perspective
C. What is the future of GridPP? Operations-led activity, working with EGEE/EGI (EU) and NGS (UK)

