Laurence Field IT/SDC Cloud Activity Coordination

Presentation on theme: "Laurence Field IT/SDC Cloud Activity Coordination"— Presentation transcript:

1 Clouds
  Laurence Field, IT/SDC
  Cloud Activity Coordination

2 Cloud
  [Diagram: cloud service layers SaaS, PaaS, IaaS; label "VMs on demand"]

3 High-level View
  [Diagram: components WM, CE, BS, WN, Cloud Interface, VM; arrow labels: Request Job, Schedules, Submits Pilot, Request Resource, Instantiates]

4 Functional Areas
  Image Management
  Capacity Management
  Monitoring
  Accounting
  Pilot Job Framework
  Supporting Services

5 Image Management
  Provides the job environment
    Software
      CVMFS
    Pilot Job
    Configuration
      Common
    Contextualization
      Instance or cloud specific
  Balance pre- and post-instantiation operations
    Simplicity, complexity, data transfer, frequency of updates
  Transient perspective
    No updates to the instance
    Destroy (gracefully) and create new instance
  Automated image building services

6 CernVM
  The OS files over CVMFS
    Replicates a reference file system using HTTP
      Stratum 0
    50 MB OS vs 460 MB for ATLAS
  Why?
    Because CVMFS is the solution
      Removes the overhead of distributed version management
      Single place for version control
  CernVM is therefore a common requirement
    Availability becomes an infrastructure issue
    Recipe is the responsibility of the VO
      Each needs to define an internal function/role
  The goal is to start a CernVM-based instance
    And contextualize
      A small delta

7 Resource Management
  Start, Stop, Status
    CernVM image
    Pass contextualization information
    Uploading images is a separate issue
  Need to work with different systems and interfaces
    A connection-based approach
  What is going on?
    Monitoring requirement
    Fabric monitoring now in the hands of the VO
  Accounting requirement
    Need to know how many resources were used
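
As an illustration only, the start/stop/status operations could sit behind one connection-style interface, with each cloud system supplying its own backend. The names below (CloudConnection, InMemoryConnection) are hypothetical and not taken from any tool named in the talk; the in-memory backend merely stands in for a real cloud API.

```python
from abc import ABC, abstractmethod
import itertools


class CloudConnection(ABC):
    """Hypothetical uniform interface; one subclass per cloud system."""

    @abstractmethod
    def start(self, image: str, user_data: str) -> str:
        """Start an instance from an image, passing contextualization data."""

    @abstractmethod
    def stop(self, instance_id: str) -> None:
        """Stop (destroy) an instance."""

    @abstractmethod
    def status(self, instance_id: str) -> str:
        """Return the instance state, e.g. 'running' or 'stopped'."""


class InMemoryConnection(CloudConnection):
    """Stand-in backend for illustration; a real one would call a cloud API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._instances: dict[str, dict[str, str]] = {}

    def start(self, image: str, user_data: str) -> str:
        instance_id = f"vm-{next(self._ids)}"
        self._instances[instance_id] = {
            "image": image, "user_data": user_data, "state": "running",
        }
        return instance_id

    def stop(self, instance_id: str) -> None:
        self._instances[instance_id]["state"] = "stopped"

    def status(self, instance_id: str) -> str:
        return self._instances[instance_id]["state"]


conn = InMemoryConnection()
vm = conn.start(image="cernvm", user_data="vo=example")
print(conn.status(vm))  # running
conn.stop(vm)
print(conn.status(vm))  # stopped
```

The capacity-management logic then only depends on the abstract interface, so supporting another cloud means adding one more subclass.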

8 Capacity Management
  Managing a VM isn't the focus
    It is about ensuring there are enough resources (capacity)
  Requires a specific component with some intelligence
    Does a VM need to be started and if so where?
    Does a VM need stopping and if so which one?
    Are the VMs OK or is some action necessary?
  Existing solutions focus on deploying applications in the cloud
    Different components, one cloud
    May also manage load balancing and failover
  Is this a load balancing problem?
    One configuration, many places, enough instances
  Develop our own solutions?
    Site centric
      The VAC model
    VO centric
      Under the control of the VO
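
The three questions on this slide (start? stop? are the VMs OK?) can be read as a simple reconciliation step. The following is a minimal sketch, assuming the required capacity can be expressed as a target number of healthy VMs; the name plan_actions, the healthy flag and the policy of stopping the newest VMs first are illustrative assumptions, not the behaviour of any of the tools discussed.

```python
from dataclasses import dataclass


@dataclass
class VM:
    vm_id: str
    healthy: bool
    started_at: float  # seconds since epoch


def plan_actions(vms: list[VM], target: int) -> dict[str, object]:
    """Decide what to destroy, start and stop to reach `target` healthy VMs."""
    # Unhealthy VMs are always destroyed; they do not count towards capacity.
    broken = [vm.vm_id for vm in vms if not vm.healthy]
    healthy = sorted((vm for vm in vms if vm.healthy),
                     key=lambda vm: vm.started_at)
    to_start = max(0, target - len(healthy))
    surplus = max(0, len(healthy) - target)
    # Assumed policy: stop the most recently started VMs first.
    to_stop = [vm.vm_id for vm in healthy[len(healthy) - surplus:]]
    return {"destroy": broken, "start": to_start, "stop": to_stop}


vms = [VM("a", True, 100.0), VM("b", False, 200.0), VM("c", True, 300.0)]
print(plan_actions(vms, target=3))
# {'destroy': ['b'], 'start': 1, 'stop': []}
print(plan_actions(vms, target=1))
# {'destroy': ['b'], 'start': 0, 'stop': ['c']}
```

Where to start a VM (the "and if so where?" part) is deliberately left out; that choice depends on the site-centric or VO-centric model in use.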

9 Tooling
  VAC
    Site-controlled bare metal provisioning
  V-Cycle
    VAC for OpenStack tenants
  Cloud Scheduler
    Condor-based VM control
  glideinWMS
    Glidein factory for clouds
  AliEC2
    Lifecycle management using EC2
  BOINC
    Extra capacity for certain classes of job
  …

10 Monitoring
  What is going on?
    Fabric Monitoring
  Health check
    Is the VM OK?
    If not, terminate it and start a new one
  Require just enough information to triage
    What's up?
  Ganglia
    Why? Because: simple, scalable, customizable, works
    An "O'Reilly" solution
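
A health check that needs "just enough information to triage" might look like the sketch below. It assumes only a last-report timestamp, a one-minute load average and a core count are available (the parameter names load_one and cpu_num are modelled on Ganglia's default metrics); the thresholds are made-up values for illustration.

```python
def triage(last_report: float, now: float, load_one: float, cpu_num: int,
           max_silence: float = 300.0, max_load_per_core: float = 2.0) -> str:
    """Return 'ok' or 'replace' from a handful of metrics."""
    if now - last_report > max_silence:
        return "replace"  # nothing reported recently: assume gone or hung
    if load_one / max(cpu_num, 1) > max_load_per_core:
        return "replace"  # load far above core count: treat as unhealthy
    return "ok"


print(triage(last_report=0.0, now=100.0, load_one=1.0, cpu_num=4))   # ok
print(triage(last_report=0.0, now=1000.0, load_one=1.0, cpu_num=4))  # replace
```

The function deliberately returns only two outcomes, matching the slide's policy of terminating a bad VM and starting a new one rather than repairing it.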

11 Accounting and Commercial Providers
  Helix Nebula
    Pathfinder project
    Exploitation of cloud computing infrastructure
    Divided into supply and demand
    CERN (ATLAS simulation) as a flagship application
  A great learning experience
  FW: New Invoice!
    Is it legitimate?
    Cloud computing just got real!
    Need to cross-check invoices with usage
  Possible workshop in September

12 Consumer-Side Accounting
  Monitor resource usage
    Coarse granularity acceptable
      No need to accurately measure
    What, where, when
  Basic infrastructure level
    VM instances and whatever else is billed for
  Report generation
    Mirror invoices
    Use same metrics as charged for
  Needs a uniform approach
    Should work for all VOs
    Deliver same information to the budget holder
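
Mirroring an invoice with the same metrics makes the cross-check mechanical. The sketch below assumes both the invoice and the consumer-side report can be reduced to a quantity per billed item; the item names, the 5% relative tolerance and the function name cross_check are hypothetical.

```python
def cross_check(invoiced: dict[str, float], measured: dict[str, float],
                tolerance: float = 0.05) -> dict[str, tuple[float, float]]:
    """Return items whose invoiced quantity differs from measured usage
    by more than `tolerance` (relative), as {item: (invoiced, measured)}."""
    flagged = {}
    for item in sorted(set(invoiced) | set(measured)):
        billed = invoiced.get(item, 0.0)
        used = measured.get(item, 0.0)
        reference = max(abs(billed), abs(used))
        if reference > 0 and abs(billed - used) / reference > tolerance:
            flagged[item] = (billed, used)
    return flagged


invoice = {"vm_hours": 1000.0, "storage_gb_months": 50.0}
usage = {"vm_hours": 990.0, "storage_gb_months": 30.0}
print(cross_check(invoice, usage))
# {'storage_gb_months': (50.0, 30.0)}
```

A relative tolerance reflects the point that coarse granularity is acceptable: small measurement differences pass, while large gaps are flagged for the budget holder.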

13 Ganglia-based Accounting
  Ganglia already adopted to monitor clouds
  Default Metrics
    Total CPUs (cores)
    Hosts (VM instances)
    Load (CPU efficiency)
    Network input/output
    Local disk size
  Resource Accounting Record
    Take the integral
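
"Take the integral" can be illustrated with a trapezoidal sum over a sampled metric. This is a minimal sketch, assuming samples arrive as (timestamp in seconds, value) pairs; the sample values are made up and the function name integrate is hypothetical.

```python
def integrate(samples: list[tuple[float, float]]) -> float:
    """Trapezoidal integral of (timestamp_seconds, value) samples."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (v0 + v1) / 2.0
    return total


# Total cores in use, sampled hourly (made-up values).
cores = [(0.0, 8.0), (3600.0, 8.0), (7200.0, 16.0)]
core_hours = integrate(cores) / 3600.0
print(core_hours)  # 20.0
```

Integrating the core count over time yields core-hours, a time-based quantity of the kind an accounting record could hold; the same function applies to the other default metrics.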

14 Accounting Comparison

15 Cloud Accounting in WLCG
  Sites are the suppliers
    We trust sites and hence their invoices
    Comparison can detect issues and inefficiencies
  Grid accounting = supply-side accounting
  No CE or batch system
    No per-job information available
  Only concerned about resources
    Mainly time-based for a flavour (specific composition)
  Job information in the domain of the VO
    Measurement of work done (usage)
    A common approach to extract job information?
    Dashboard?
  Topic for future discussion

16 Framework and Supporting Services
  Pilot Job framework
    In the VO's domain
    Streamline how the real job arrives in the VM
      Wrappers for wrappers
  Supporting Services are needed in the cloud
    e.g. Squid, Ganglia head node, …
    Who should manage these?
      VO sysadmin team?
      WLCG sysadmin team?
    How should we manage them?
      Standard cloud problem

17 EGI Federated Cloud
  A community of providers
  Defines a set of required functions
    Common Interface (OCCI)
      Including X.509 support
    Pre-uploaded images (CernVM via AppDB)
    Integrated Accounting (APEL and the Portal)
  Support network for sites
  Infrastructure operations

18 State of Adoption
  [Table: adoption per experiment; labels: ALICE, ATLAS, CMS, LHCb; HLT, CernVM, Ganglia, VAC, BOINC]

19 State of Adoption Document
  Initial available draft
    August 2014
  More details on this topic
    Overview
    Status of main areas
    Status of adoption
  Authors
    Laurence Field (main author)
    Alessandro Di Girolamo (ATLAS view)
    Maarten Litmaath (ALICE view)
    Cristovao Cordeiro (CMS view)
    Stefan Roiser (LHCb view)
  Contact me if you wish to review it

20 Prototype Dashboard

21 Cloudscape

24 Next Steps
  Image management
    Identify the source of CernVM recipes within each VO
  Capacity Management
    Continue prototyping the different approaches
  Monitoring
    Integrate into production operations
  Accounting
    Verify the published values are correct
    Identify what metrics are actually required
  Streamline pilot frameworks
  Investigate the area of distributed system administration

25 Summary
  Clouds, the adoption of off-the-shelf technology
    Reducing the maintenance and operational effort
  The objective is to run a CernVM image
    It's what's inside that matters, not how it is started
  Capacity Management
    Exploring different options for the tooling
  Ganglia used for monitoring instances
    Repurposed for consumer-side accounting
  How resources are going to be accounted in WLCG
    Next big discussion topic
  Good progress so far but still much to do
    Moving from R&D towards production operations
    Scaling up

