1 Cloud Services Requirements and Challenges of Large International User Groups Laurence Field IT/SDC 2/12/2014.

2 Cloud Services  SaaS  PaaS  IaaS  VMs on demand

3 High Level View

4 Areas  Image Management  Capacity Management  Monitoring  Accounting  Pilot Job Framework  Data Access and Networking  Quota Management  Supporting Services

5 Image Management  Provides the job environment  Software  CVMFS  PilotJob  Configuration  Contextualization  Balance pre- and post-instantiation operations  Simplicity, Complexity, Data Transfer, Frequency of Updates  Transient  No updates of running machines  Destroy (gracefully) and create new instance
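The post-instantiation side of this balance is typically a small contextualization payload passed at boot. A minimal sketch, assuming a cloud-init-capable image: the repository names, file paths and pilot command below are illustrative placeholders, not the actual VO recipe.

```python
# Sketch: assembling cloud-init user-data that contextualizes a
# CernVM-style image after boot. All values passed in (VO name,
# CVMFS repositories, pilot command) are hypothetical examples.
def build_user_data(vo, cvmfs_repos, pilot_cmd):
    """Return a #cloud-config document as a string."""
    repos_csv = ",".join(cvmfs_repos)
    return f"""#cloud-config
# Contextualization for VO: {vo}
write_files:
  - path: /etc/cvmfs/default.local
    content: |
      CVMFS_REPOSITORIES={repos_csv}
      CVMFS_HTTP_PROXY=DIRECT
runcmd:
  - cvmfs_config setup
  - {pilot_cmd}
"""

user_data = build_user_data(
    vo="atlas",
    cvmfs_repos=["atlas.cern.ch", "atlas-condb.cern.ch"],
    pilot_cmd="/usr/bin/start-pilot --queue cloud",
)
```

Keeping the payload this small is what makes the transient model workable: instead of updating a running machine, the instance is destroyed and a fresh one is contextualized the same way.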

6 CernVM  The OS via CVMFS  Replicated via HTTP from a reference file system  Stratum 0  Why?  Because CVMFS is already a requirement  Reduces the overhead of distributed image management  Version control is managed centrally  CernVM as a common requirement  Availability becomes an infrastructure issue  Recipe to contextualize  Responsibility of the VO  The goal is to start a CernVM-based instance  Which needs minimal contextualization

7 Capacity Management  Managing the VM life cycle isn't the focus  It is about ensuring there are enough resources (capacity)  Requires a specific component with some intelligence  Do I need to start a VM, and if so where?  Do I need to stop a VM, and if so where?  Are the VMs that I started OK?  Existing solutions focus on deploying applications in the cloud  Different components, one cloud  May manage load balancing and failover  Is this a load balancing problem?  One configuration, many places, enough instances?  Developing our own solutions  Site centric  The VAC model  VO centric
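The "component with some intelligence" above boils down to a periodic decision step. A minimal sketch in the VAC/VCycle spirit, with illustrative state names: given a target capacity and the observed VM states, decide what to destroy, keep and start.

```python
# Sketch of one capacity-manager decision step: compare the target
# number of VMs against observed instance states and plan actions.
# State names ('running', 'booting', 'failed') are illustrative.
def plan_capacity(target, vms):
    """vms maps instance id -> state; returns a plan of actions."""
    failed = [vid for vid, state in vms.items() if state == "failed"]
    healthy = [vid for vid, state in vms.items()
               if state in ("running", "booting")]
    to_start = max(0, target - len(healthy))  # top up to the target
    return {"destroy": failed, "start": to_start, "keep": healthy}

plan = plan_capacity(
    target=4,
    vms={"vm-1": "running", "vm-2": "failed", "vm-3": "booting"},
)
```

Run in a loop against each cloud endpoint, this answers the three questions on the slide; whether the loop lives at the site (VAC model) or with the VO is exactly the design split described above.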

8 Monitoring  Fabric management  The responsibility of the VO  Basic monitoring is required  The objective is to triage the machines  Invoke a restart operation if a machine is not OK  Detection of the not-OK state may be non-trivial  Other metrics may be of interest  Spotting dark resources  Deployed but not usable  Can help to identify issues in other systems  Discovering inconsistent information through cross-checks  Common for all VOs  Pilot job monitoring is VO specific
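The triage logic can be sketched from basic heartbeat metrics alone; the threshold and field names below are assumptions, not a VO standard.

```python
# Sketch: triaging VMs from heartbeat timestamps. A stale heartbeat
# marks a machine "not OK" (restart it); a machine that never
# reported at all is a candidate dark resource.
import time

STALE_AFTER = 15 * 60  # seconds without a heartbeat (illustrative)

def triage(vms, heartbeats, now=None):
    """vms: iterable of ids; heartbeats: id -> last heartbeat epoch."""
    now = time.time() if now is None else now
    restart, dark = [], []
    for vid in vms:
        last = heartbeats.get(vid)
        if last is None:
            dark.append(vid)      # deployed but never usable
        elif now - last > STALE_AFTER:
            restart.append(vid)   # not OK: invoke a restart
    return restart, dark

restart, dark = triage(
    ["vm-1", "vm-2", "vm-3"],
    {"vm-1": 1000.0, "vm-2": 100.0},
    now=1100.0,
)
```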

9 Provider Accounting  Helix Nebula  Pathfinder project  Development and exploitation  Cloud Computing Infrastructure  Divided into supply and demand  Three flagship applications  CERN (ATLAS simulation)  EMBL  ESA  FW: New Invoice!  Can you please confirm that these are legit?  Need a method to record usage to cross-check invoices  Dark resources  Billed for x machines but not delivered (controllable)
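The invoice cross-check described above can be sketched as a simple comparison of billed against locally recorded usage; the field names and tolerance are illustrative assumptions.

```python
# Sketch: cross-checking a provider invoice against recorded usage.
# Instances billed but never observed running point at dark
# resources; large billed/observed gaps become disputes.
def check_invoice(invoice, observed_hours, tolerance=0.05):
    """invoice: {vm_id: billed_hours}; observed_hours: {vm_id: hours}."""
    dark, disputed = [], []
    for vm_id, billed in invoice.items():
        seen = observed_hours.get(vm_id, 0.0)
        if seen == 0.0:
            dark.append(vm_id)                      # billed, never delivered
        elif billed > seen * (1 + tolerance):
            disputed.append((vm_id, billed, seen))  # over-billed beyond tolerance
    return dark, disputed

dark, disputed = check_invoice(
    {"vm-a": 24.0, "vm-b": 24.0, "vm-c": 10.0},
    {"vm-a": 23.8, "vm-c": 8.0},
)
```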

10 Consumer-Side Accounting  Monitor resource usage  Coarse granularity acceptable  No need to measure accurately  What, where and when for resources  Basic infrastructure level  VM instances and whatever else is billed for  Report generation  Mirror invoices  Use the same metrics as charged for  Needs a uniform approach  Should work for all VOs  Deliver the same information to the budget holder
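Because coarse granularity is acceptable, the report generation can be as simple as aggregating wall-clock hours per flavour, mirroring the metric the provider charges for. A minimal sketch with illustrative record fields:

```python
# Sketch: coarse consumer-side accounting that mirrors an invoice by
# summing VM wall-clock hours per flavour. Record layout
# (flavour, start_epoch, stop_epoch) is an assumption.
from collections import defaultdict

def usage_report(records):
    """records: list of (flavour, start_epoch, stop_epoch) tuples."""
    hours = defaultdict(float)
    for flavour, start, stop in records:
        hours[flavour] += (stop - start) / 3600.0
    return dict(hours)

report = usage_report([
    ("m1.large", 0, 7200),
    ("m1.large", 0, 3600),
    ("m1.small", 0, 3600),
])
```

The same report format delivered for every VO is what gives the budget holder a uniform view.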

11 Cloud Accounting in WLCG  Sites are the suppliers  Site accounting generates invoices  For resources used  Need to monitor the resource usage  We trust sites and hence their invoices  Comparison can detect issues and inefficiencies  Job activities are in the domain of the VO  Measurement of work done  i.e. value for money  Information not included in cloud accounting  Need a common approach to provide this information  Dashboard?

12 Comparison to Grid  Grid accounting = supply-side accounting  No CE or batch system  Different information source  No per-job information available  Only concerned with resources used  Mainly time-based  For a flavour  A specific composition of CPU, memory and disk

13 Core Metrics  Time  Billed by time per flavour  Capacity  How many were used?  Power  Performance will differ by flavour  And potentially over time  Total computing done  power x capacity x time  Efficiency  How much of the computing did something useful?
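The metrics above combine multiplicatively. A minimal sketch, where "power" stands for whatever per-flavour benchmark score is eventually agreed (the units here are purely illustrative):

```python
# Sketch: total computing done = power x capacity x time, scaled by
# the fraction that did something useful (efficiency).
def total_computing(power, capacity, time_hours, efficiency=1.0):
    """Work delivered: benchmark-units x instances x hours x useful fraction."""
    return power * capacity * time_hours * efficiency

raw = total_computing(power=10.0, capacity=100, time_hours=24)
useful = total_computing(power=10.0, capacity=100, time_hours=24,
                         efficiency=0.85)
```

This separation is why counting cores (capacity x time) is easy while normalizing for power is the hard part.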

14 Measuring Computing  Resources provided by flavour  How can we compare?  What benchmarking metrics?  And how do we obtain them?  Flavour = SLA  SLA monitoring  How do we do this?  Rating sites  Against the SLA  Variance
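If flavour = SLA, then rating a site against the SLA can be sketched as comparing measured benchmark scores to the promised value: the mean ratio rates the site and the variance captures its (in)stability. The numbers below are illustrative, not a proposed procedure.

```python
# Sketch: rating a site against its flavour SLA. measured_scores are
# repeated benchmark runs on delivered instances; promised_score is
# the value the flavour/SLA advertises.
from statistics import mean, pvariance

def rate_site(measured_scores, promised_score):
    ratios = [m / promised_score for m in measured_scores]
    return {"rating": mean(ratios), "variance": pvariance(ratios)}

rating = rate_site([9.0, 10.0, 11.0], promised_score=10.0)
```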

15 Data Mining Approach  VOs already have metrics on completed jobs  Can they be used to define relative performance?  Avoids discussion on benchmarking  And how the benchmark compares to the job  May work within the VO  But what about WLCG?  Specification for procurement?  May not accept a VO-specific metric  Approach currently being investigated
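The data-mining idea can be sketched as deriving a relative performance factor per flavour from completed-job metrics, normalized to a reference flavour. The throughput metric (events per wall-clock second) and all numbers are illustrative, not a WLCG-agreed benchmark.

```python
# Sketch: relative performance from completed-job records, avoiding
# an explicit benchmark. jobs: list of (flavour, events, wall_seconds).
def relative_performance(jobs, reference):
    totals = {}
    for flavour, events, wall in jobs:
        ev, sec = totals.get(flavour, (0.0, 0.0))
        totals[flavour] = (ev + events, sec + wall)
    rates = {f: ev / sec for f, (ev, sec) in totals.items()}
    ref_rate = rates[reference]
    # Normalize so the reference flavour scores 1.0
    return {f: rate / ref_rate for f, rate in rates.items()}

perf = relative_performance(
    [("m1.large", 1000, 100), ("m1.large", 1000, 100),
     ("m1.small", 500, 100)],
    reference="m1.small",
)
```

The open question on the slide remains: such a factor is VO-relative by construction, which is exactly why it may not be accepted as a procurement specification.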

16 The Other Areas  Data access and networking  Have so far focused on non-data-intensive workloads  Quota Management  Currently have fixed limits  Leading to the partitioning of resources between VOs  How can the sharing of resources be implemented?  Supporting Services  What else is required?  E.g. Squid caches at the provider  How are these managed and by whom?

17 Summary  It is all about starting a CernVM image  And running a job agent  Similar to a pilot job  Different options are being explored  To manage the VM life cycle  To deliver elastic capacity  A great deal of commonality already exists between the VOs  It will be exploited and built upon to provide common solutions  How are resources going to be accounted?  Counting cores is easy  Normalizing for power is hard  Investigating the use of job metrics  To discover relative performance  Many open questions remain