The EGEE Production Grid. Ian Bird, EGEE Operations Manager. HEPiX, Jefferson Lab, 12th October. EGEE-II INFSO-RI-031688. EGEE and gLite are registered trademarks.

1 The EGEE Production Grid
Ian Bird, EGEE Operations Manager
HEPiX, Jefferson Lab, 12th October 2006
Enabling Grids for E-sciencE
EGEE-II INFSO-RI-031688. EGEE and gLite are registered trademarks.

2 Outline
EGEE-II INFSO-RI-031688; Ian.Bird@cern.ch; HEPiX; JLab; 9th-13th October 2006
Some history
–What led up to where we are now?
–The EGEE project
What is the EGEE grid infrastructure today?
–What has been achieved?
–How is it used?
–How does it compare and relate to other production grids?
Outlook

3 Some history …
LHC → EGEE Grid
1999 – Monarc Project
–Early discussions on how to organise distributed computing for LHC
2000 – growing interest in grid technology
–HEP community was the driver in launching the DataGrid project
2001-2004 – EU DataGrid project
–middleware & testbed for an operational grid
2002-2005 – LHC Computing Grid (LCG)
–deploying the results of DataGrid to provide a production facility for LHC experiments
2004-2006 – EU EGEE project phase 1
–starts from the LCG grid
–shared production infrastructure
–expanding to other communities and sciences
2006-2008 – EU EGEE-II
–Building on phase 1
–Expanding applications and communities
… and in the future – Worldwide grid infrastructure?
–Interoperating and co-operating infrastructures?

4 The EGEE project
EGEE: €32 M
–1 April 2004 to 31 March 2006
–71 partners in 27 countries, federated in regional Grids
EGEE-II: €35 M
–1 April 2006 to 31 March 2008
–91 partners in 32 countries
–13 Federations
Objectives
–Large-scale, production-quality infrastructure for e-Science
–Attracting new resources and users from industry as well as science
–Improving and maintaining “gLite” Grid middleware

5 The EGEE Infrastructure
Infrastructure: physical test-beds & services; support organisations & procedures; policy groups
Test-beds & Services
–Certification testbeds (SA3)
–Pre-production service
–Production service
Support Structures
–Operations Coordination Centre
–Regional Operations Centres
–Global Grid User Support
–EGEE Network Operations Centre (SA2)
–Operational Security Coordination Team
Security & Policy Groups
–Operations Advisory Group (+NA4)
–Joint Security Policy Group
–EuGridPMA (& IGTF)
–Grid Security Vulnerability Group

6 Certification & release preparation
The goal is to produce a middleware distribution that can be deployed widely
–Not the same as middleware releases from development projects
–More like a Linux distribution: bringing together many pieces from several sources
Extensive certification test-bed:
–Close to 100 machines involved, CERN + partners
–Emulate the main deployment environments
Certification testing:
–Installation and configuration
–Component (service) functionality
–System testing (trying to emulate real workloads and stress testing)
–Beginning to use virtualization to simplify the testing environment
Deployment into the pre-production system
–Final step of certification: validation by real sites
–Validation by applications: also allows applications to be prepared for new versions

7 Pre-production service
Pre-production service (P-PS) is now ~20 sites
Provides access to some 500 CPUs
–Some sites allow access to their full production batch systems for scale tests
Sites install and test different configurations and sets of services
–Try to get a good feeling for the quality of the release or updates before general release to production
–Feedback to: certification, integration, developers, etc.
P-PS is now used in the way it was intended
–For some time it was acting as a second certification test-bed for the gLite-1.x branch
–Some services may be demonstrated in this environment before going to production (or they may need more work)

8 Production service sites
Size of the infrastructure today:
–196 sites in 42 countries
–~32 000 CPU
–~3 PB disk, + tape MSS

9 Usage of the infrastructure
>50k jobs/day
~7000 CPU-months/month

10 Non-LHC VOs
Workloads of the “other VOs” are becoming significant: approaching 8-10K jobs per day and 1000 CPU-months/month
–One year ago this was the overall scale of work for all VOs

11 Use of the infrastructure
20k jobs running simultaneously

12 CPU Usage
[Charts: CPU usage by Virtual Organization; labels: Jan. ’06, Sep. ’06]

13 Use for massive data transfer
Large LHC experiments now transferring ~1 PB/month each
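To put the ~1 PB/month figure in more familiar units, the arithmetic below converts a monthly volume into an average sustained rate. It is only a rough sketch: the decimal petabyte (10^15 bytes) and the 30-day month are assumptions, not values given on the slide, and real transfers are bursty rather than constant.

```python
# Rough conversion of a monthly transfer volume into a sustained rate.
# Assumptions (not from the slides): decimal units and a 30-day month.

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(petabytes_per_month: float, days_per_month: int = 30) -> float:
    """Average rate in MB/s needed to move the given volume in one month."""
    total_bytes = petabytes_per_month * 1e15          # PB -> bytes
    seconds = days_per_month * SECONDS_PER_DAY        # length of the month
    return total_bytes / seconds / 1e6                # bytes/s -> MB/s


if __name__ == "__main__":
    rate = sustained_rate_mb_per_s(1.0)
    print(f"1 PB/month is roughly {rate:.0f} MB/s sustained")
```

Under these assumptions each large experiment averages a few hundred MB/s around the clock.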

14 Applications on EGEE
More than 25 applications from an increasing number of domains
–Astrophysics
–Computational Chemistry
–Earth Sciences
–Financial Simulation
–Fusion
–Geophysics
–High Energy Physics
–Life Sciences
–Multimedia
–Material Sciences
–…
Application types:
–Simulation
–Bulk Processing
–Responsive Apps.
–Workflow
–Parallel Jobs
–Legacy Applications

15 Simulation
Examples
–LHC Monte Carlo simulation
–Fusion
–WISDOM: malaria/avian flu
Characteristics
–Jobs are CPU-intensive
–Large number of independent jobs
–Run by few (expert) users
–Small input; large output
Needs
–Batch-system services
–Minimal data management for storage of results
[Image labels: ATLAS, ITER]

16 Drug Discovery
WISDOM focuses on in silico drug discovery for neglected and emerging diseases.
Malaria (Summer 2005)
–46 million ligands docked
–1 million selected
–1 TB data produced; 80 CPU-years used in 6 weeks
Avian Flu (Spring 2006)
–H5N1 neuraminidase
–Impact of selected point mutations on eff. of existing drugs
–Identification of new potential drugs acting on mutated N1
Fall 2006
–Extension to other neglected diseases
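The malaria figures imply a certain level of parallelism, which the following back-of-the-envelope sketch makes explicit. It assumes a 365-day year and treats each of the 46 million docked ligands as one unit of work; both are simplifications that the slide does not state.

```python
# Back-of-the-envelope figures for the WISDOM malaria run.
# Assumptions (not from the slides): 365-day year, one work unit per ligand.

DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def average_concurrent_cpus(cpu_years: float, wall_weeks: float) -> float:
    """CPUs that must be busy on average to burn cpu_years in wall_weeks."""
    cpu_days = cpu_years * DAYS_PER_YEAR
    wall_days = wall_weeks * 7
    return cpu_days / wall_days


def cpu_seconds_per_item(cpu_years: float, items: float) -> float:
    """Average CPU time spent per work item, in seconds."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / items


if __name__ == "__main__":
    print(f"average busy CPUs: {average_concurrent_cpus(80, 6):.0f}")
    print(f"CPU seconds per ligand: {cpu_seconds_per_item(80, 46e6):.0f}")
```

Under these assumptions the run kept on the order of 700 CPUs busy for the whole six weeks, at somewhat under a minute of CPU time per ligand.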

17 Bulk Processing
Examples
–HEP processing of raw data, analysis
–Earth observation data processing
Characteristics
–Widely-distributed input data
–Significant amount of input and output data
Needs
–Job management tools (workload management)
–Meta-data services
–More sophisticated data management

18 Responsive Apps. (I)
Examples
–Prototyping new applications
–Monitoring grid operations
–Direct interactivity
Characteristics
–Small amounts of input and output data
–Not CPU-intensive
–Short response time (few minutes)
Needs
–Configuration which allows “immediate” execution (QoS)
–Services must treat jobs with minimum latency

19 Responsive Apps. (II)
Grid as a backend infrastructure:
–gPTM3D: interactive analysis of medical images
–GPS@: bioinformatics via web portal
–GATE: radiotherapy planning
–DILIGENT: digital libraries
–Volcano sonification
Characteristics
–Rapid response: a human waiting for the result!
–Many small but CPU-intensive tasks
–User is not aware of “grid”!
Needs
–Interfacing (data & computing) with non-grid application or portal
–User and rights management between front-end and grid

20 Workflow
Examples
–“Bronze Standard”: image registration
–Flood prediction
Characteristics
–Use of grid and non-grid services
–Complex set of algorithms for the analysis
–Complex dependencies between individual tasks
Needs
–Tools for managing the workflow itself
–Standard interfaces for services (i.e. web services)
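The "complex dependencies between individual tasks" mentioned above are commonly modelled as a directed acyclic graph. The sketch below is not an EGEE or gLite tool; it uses Python's standard `graphlib` module (Python 3.9+), and the task names are hypothetical. It groups tasks into stages whose members could be submitted in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "fetch_images": set(),
    "register_method_a": {"fetch_images"},
    "register_method_b": {"fetch_images"},
    "compare_results": {"register_method_a", "register_method_b"},
}


def parallel_stages(dependencies):
    """Return tasks grouped into stages; tasks in one stage are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)                 # mark the stage finished, unlocking successors
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(WORKFLOW), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

A real workflow manager would also have to handle failures, retries and data staging between stages, which this sketch leaves out.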

21 Parallel Jobs
Examples
–Climate modeling
–Earthquake analysis
–Computational chemistry
Characteristics
–Many interdependent, communicating tasks
–Many CPUs needed simultaneously
–Use of MPI libraries
Needs
–Configuration of resources for flexible use of MPI
–Pre-installation of optimized MPI libraries

22 Legacy Applications
Examples
–Commercial or closed source binaries
–Geocluster: geophysical analysis software
–FlexX: molecular docking software
–Matlab, Mathematica, …
Characteristics
–Licenses: control access to software on the grid
–No recompilation → no direct use of grid APIs!
Needs
–License server and grid deployment model
–Transparent access to data on the grid

23 Grid management: structure
Operations Coordination Centre (OCC)
–management, oversight of all operational and support activities
Regional Operations Centres (ROC)
–providing the core of the support infrastructure, each supporting a number of resource centres within its region
–Grid Operator on Duty
Resource centres
–providing resources (computing, storage, network, etc.)
Grid User Support (GGUS)
–At FZK, coordination and management of user support, single point of contact for users

24 Grid Monitoring
Goal:
–Proactively monitor operational state & performance of the grid
–Trigger corrective actions at sites, ROCs, service managers
Many tools used:
–Distributed responsibility for tools maintenance and operation
–Operator portal, Info sys monitor, SFT/SAM, job monitors, etc.
Site Functional Tests (SFT) → Site Availability Monitor (SAM)
–Framework to sample/test services at sites and publish results
–Can include ad-hoc tests (e.g. VO-specific) in the framework or externally
–Allows dynamic look-up by VO of sites that are currently OK for them
–SAM: extends the concept to measure service availability
–Web service access to the data
–Intend to use this to generate trouble tickets and alarms
Primary tools of the operator on duty are
–Information system monitoring and SFT/SAM
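The "dynamic look-up by VO of sites that are currently OK for them" can be pictured as a filter over published test results. The sketch below is purely illustrative and does not reflect the actual SFT/SAM data model or API: the `ProbeResult` record, the `"generic"` label for tests that apply to every VO, and the site and test names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One published test result (hypothetical record layout)."""
    site: str
    test: str
    vo: str        # "generic" marks tests that apply to every VO (assumed convention)
    passed: bool


def sites_ok_for(vo, results, generic_vo="generic"):
    """Sites where all generic tests and all tests specific to `vo` passed."""
    status = {}
    for result in results:
        if result.vo in (generic_vo, vo):
            # A single failing relevant test marks the whole site as not OK.
            status[result.site] = status.get(result.site, True) and result.passed
    return sorted(site for site, ok in status.items() if ok)


# Made-up sample data for illustration only.
SAMPLE_RESULTS = [
    ProbeResult("SITE-A", "job-submit", "generic", True),
    ProbeResult("SITE-A", "sw-install", "atlas", True),
    ProbeResult("SITE-B", "job-submit", "generic", True),
    ProbeResult("SITE-B", "sw-install", "atlas", False),
    ProbeResult("SITE-C", "job-submit", "generic", False),
]

if __name__ == "__main__":
    for vo in ("atlas", "biomed"):
        print(vo, "->", sites_ok_for(vo, SAMPLE_RESULTS))
```

In this toy data SITE-B fails only a VO-specific test, so it stays usable for VOs that do not depend on that test, which is the point of letting each VO evaluate its own view of the results.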

25 Site metrics - availability

26 Support - GGUS

27 The EGEE Network Operations Centre
Creating a “Network Support unit” in the EGEE operational model
Tasks:
–Receive tickets from NRENs, and forward to GGUS if impact on grid
–Receive tickets from GGUS if a network issue
–Troubleshoot & follow up with sites or NRENs
[Diagram labels: GGUS, Users, Support Units, ENOC, NRENs, GÉANT2, EGEE, Network]
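The ENOC tasks listed above amount to a small routing rule, sketched below. This is an illustration, not the ENOC's actual procedure or tooling: the enum names and the two fallback actions (what happens to an NREN ticket with no grid impact, or a GGUS ticket that is not a network issue) are assumptions, since the slide does not cover those cases.

```python
from enum import Enum


class Origin(Enum):
    NREN = "NREN"
    GGUS = "GGUS"


class Action(Enum):
    FORWARD_TO_GGUS = "forward to GGUS"
    TROUBLESHOOT = "troubleshoot and follow up with sites or NRENs"
    NO_GRID_ACTION = "no grid action"        # assumed fallback, not on the slide
    LEAVE_WITH_GGUS = "leave with GGUS"      # assumed fallback, not on the slide


def route_ticket(origin: Origin, affects_grid: bool = False,
                 is_network_issue: bool = False) -> Action:
    """Decide what the network support unit does with an incoming ticket."""
    if origin is Origin.NREN:
        # NREN tickets reach grid user support only if the grid is affected.
        return Action.FORWARD_TO_GGUS if affects_grid else Action.NO_GRID_ACTION
    if origin is Origin.GGUS:
        # GGUS hands over a ticket only when it is a network problem.
        return Action.TROUBLESHOOT if is_network_issue else Action.LEAVE_WITH_GGUS
    raise ValueError(f"unknown ticket origin: {origin!r}")


if __name__ == "__main__":
    print(route_ticket(Origin.NREN, affects_grid=True).value)
    print(route_ticket(Origin.GGUS, is_network_issue=True).value)
```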

28 Interoperation
Interoperability and interoperation (or co-operation)
EGEE has interoperability activities with (enabling the middlewares to work together):
–Open Science Grid (U.S.): quite far advanced
–NorduGrid (ARC): task in EGEE-II, 4 workshops and ongoing activity
–UNICORE: task in EGEE-II
–NAREGI (Japan): 1 workshop, continued activity
–GIN (OGF): active in several areas
EGEE has interoperation activities with (enabling the infrastructures to co-operate):
–Open Science Grid: actually in use
–Anticipated with NorduGrid (NDGF) for WLCG

29 Interoperating information systems
[Diagram labels: EGEE, OSG, NAREGI, TeraGrid, PRAGMA, NorduGrid]

30 Related infrastructure projects
DEISA, TeraGrid
Coordination in SA1 for: EELA, BalticGrid, EUMedGrid, EUChinaGrid, SEE-GRID
Interoperation with OSG, NAREGI
SA3: DEISA, ARC, NAREGI

31 Sustainability: Beyond EGEE-II
Need to prepare for permanent Grid infrastructure
–Maintain Europe’s leading position in global science Grids
–Ensure a reliable and adaptive support for all sciences
–Independent of short project funding cycles
–Modelled on success of GÉANT
–Infrastructure managed in collaboration with national grid initiatives

32 Summary of status
Today we have an operating production infrastructure
–Probably the largest in the world, supporting many science domains
–Relied upon by several as their primary source of computing
We have a managed operations process addressing most areas
–Constantly evolving
Inter/Co-operation is a fact and is becoming more important very quickly
–Several applications need to work across grids, and they need support for that
A large fraction of the value of the operations activity is in the intangibles: processes, structures, expertise, etc.
We recognise that there are many outstanding problems with the current state of things: reliability and robustness are the focus for the next year

