EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Operations Automation Team Kickoff Meeting.

Presentation transcript:

Operations Automation Team Kickoff Meeting
James Casey, CERN. 6-7 May 2008 (Slides from WLCG Workshop)

Enabling Grids for E-sciencE EGEE-III INFSO-RI WLCG Collaboration workshop, CERN, 22 April ‘08
EGEE-III Introduction
What is EGEE III?
A continuation of the EGEE program, building on its achievements and preparing the transition towards a sustainable infrastructure
Goals:
– EGEE-III will work with EGI to transfer experience in operating Grid infrastructures, ensuring the development of a viable model. Based on the plans to be produced by EGI, EGEE-III will start implementing the changes for a transition to the EGI model
– To maintain, enhance and simplify the use of the production quality computing infrastructure for an increasing range of researchers in diverse scientific fields

EGEE III Operations Goals
The provision of a large-scale, production Grid infrastructure that interoperates at many levels, offering reliable services to a wide range of applications
– Continuation of the present service
Set the groundwork for the migration to a distributed model based on coordination at the European level of National Grid Infrastructures
– This is the challenge for the next 2 years: to do this without breaking the 1st goal (continuation of reliable service)
With the constraints:
– 2 years
– Significantly less effort

Centralized vs. distributed
What is our present (EGEE II) model?
Grid management
– Central coordination for all of the tasks, in many cases localised at CERN (the team is called the OCC: Operations Coordination Centre)
Grid operations and support
– In general, problem monitoring (SAM) and reporting done centrally by the COD is not well integrated with the daily operations and monitoring carried out at each site
– Best effort/informal coordination of operations tools and requirements gathering. Most tools are deployed centrally (main instance run in one region serving the whole infrastructure)
Support to VOs, Users, Applications
– Central access point for user support, connected to all the ROCs
Grid security
– Central coordination, with effort from all ROCs and a broad collaboration from sites

Centralized vs. distributed
What is our target model?
ROCs (NGIs) are responsible for day to day operations, without a central organization overseeing them
Set of operations tools supporting this
Central body (OCC) responsible for coordination of cross-regional tasks
Clear interfaces/targets between OCC and ROCs (NGIs), between ROCs (NGIs) and sites
Sites with well developed fabric tools that monitor local and grid services in a common way and trigger alarms directly, so most of the issues are solved at this level

Target Operations task distribution
OCC:
– SLA and metric definition
– Measure reliability and availability by aggregating data produced by the ROCs
– Application – Resource Provider Coordination (new VOs across regions)
– Coordination of operational issues that cross regional boundaries
– Operations tools coordination (gather requirements, avoid duplication)
ROCs/NGIs:
– Oversight and management of grid operations in the region
– Provision of operations tools
– SLA monitoring and follow up with sites
– First line support (helpdesk, roll out of new releases, user support)
Sites:
– Responsible for providing a reliable service, according to the SLA and the level of service required by the supported VOs

How to get there: general
Define work plans for all tasks
Questions we need to answer for all the tasks:
– Status of each task at the end of EGEE III
– How will it be run? Centrally, distributed to all sites/NGIs/ROCs?
– How will it be managed? Does it need central control in a central organisation, irrespective of how it will run? Can it be run by a group of peers coordinating without central control? Does it need coordination at all?
– How will it be funded? Does it need central funding? National funding? VO funding?
– Will it be stable? Or will it still have a plan for future development? For example one might plan a service that will still need central management at the end of EGEE III but could be completely distributed one year later.
When we have answered these questions there will be two benefits:
– We will see the constituent parts of SA1 that need the most change during EGEE III. This will let us write more detailed project plans for them.
– We will have a view of the state of the infrastructure of EGEE III that will transition into an EGI. The EGI-DS desperately needs this input. So do the NGIs as they need to plan their future and will need to have a view on what they are being asked to take
(from John Gordon’s proposal for Operations transition meeting next week)

How to get there: OAG
Operations Automation Group (chair: James Casey):
– Improve site reliability by wider deployment of fabric management tools at sites
– Devolve central monitoring systems, where possible, to regional systems
– Create architecture for new shared infrastructure required to support the operational tools
– Measure and improve the availability and reliability of the operational tools themselves
– Design SLA compliance tools (availability and reliability calculation)
– Collection of usage and accounting information for CPU/Disk/Network
– Provide visualization of the state of infrastructure for site administration, regional operators and project managers
– Provide reporting tools for the OCC and project management

How to get there: COD
New mandate for the Grid Operator on Duty, COD (chair: Helene Cordier), being defined, to:
– Move from central COD model to regional COD model; goal: central COD as we understand it today does not exist at the end of EGEE III; all ROCs being responsible for the COD tasks by the end of the project
– Define what a “thin central COD layer” would do, and if needed introduce it in the new COD model
– Coordination between regional CODs and central thin layer COD
– Sharing of expertise between regional CODs

How to get there: SLAs
Definition, monitoring and enforcement of Service Level Agreements (chair: John Shade)
Measure service level in view of improving it
– EC review comment: “The measures of robustness and reliability of the production infrastructure are still very rudimentary.”
Formalize the responsibilities of both parties
– Avoid misunderstandings
– Improve relationships between both parties
Understand what must be supplied
Understand what is the minimum acceptable
Identify service parameters
– Availability, Reliability, Ticket response times
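
The slides name availability and reliability as SLA service parameters but do not define how they are computed. The sketch below is illustrative only and assumes one common convention: availability is uptime over the time for which the state is known, and reliability additionally excludes scheduled downtime. The record type PeriodStatus, its field names and the thresholds passed to meets_sla are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodStatus:
    """Hours a service spent in each state over one reporting period (hypothetical record)."""
    up_hours: float
    scheduled_down_hours: float
    unscheduled_down_hours: float
    unknown_hours: float = 0.0

    @property
    def total_hours(self) -> float:
        return (self.up_hours + self.scheduled_down_hours
                + self.unscheduled_down_hours + self.unknown_hours)


def availability(p: PeriodStatus) -> Optional[float]:
    # Assumed definition: uptime divided by the time whose state is known.
    known = p.total_hours - p.unknown_hours
    return p.up_hours / known if known > 0 else None


def reliability(p: PeriodStatus) -> Optional[float]:
    # Assumed definition: as availability, but scheduled downtime is not penalised.
    expected = p.total_hours - p.unknown_hours - p.scheduled_down_hours
    return p.up_hours / expected if expected > 0 else None


def meets_sla(p: PeriodStatus, min_availability: float, min_reliability: float) -> bool:
    # Thresholds are supplied by the caller; the slides do not state target values.
    a, r = availability(p), reliability(p)
    if a is None or r is None:
        return False
    return a >= min_availability and r >= min_reliability


# Example: a 720-hour month with 36 h scheduled and 36 h unscheduled downtime.
example = PeriodStatus(up_hours=648, scheduled_down_hours=36, unscheduled_down_hours=36)
print(availability(example), reliability(example))
```

Keeping the two metrics separate means a site is not penalised in the reliability figure for maintenance it announced in advance, while the availability figure still reflects what users actually experienced.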

Conclusion
Moving to a fully distributed model; we have some experience with this, EGEE operations is partially distributed already
Challenge to do this with less effort and in 2 years; no place for duplication, loose initiatives
Collaboration is essential; we need an agreed vision as input to EGI, and we need to work together towards this vision
Site responsibility for daily operations is the best way of saving effort and simplifying operations at all levels!
– We need to provide the tools to facilitate this
– We need more site involvement in Working Groups
– Site and ROC/NGI partnership should be reinforced