MyOps: An Operational Framework for PlanetLab Deployments


1 MyOps: An Operational Framework for PlanetLab Deployments

2 Outline
o Objective of MyOps
o Current status
o Future ideas
o Questions at any time

3 Example of Feedback

4 Objective: Close Operational Cycle
System - Provides service (slice)
Monitoring - Feedback from running system
Operator - Interpret feedback into tasks
Management - Control running system

5 Challenges: Break-down
System may not deliver service
Monitoring may not observe useful metrics
Operator may not know
o how to interpret observations
o how to control the system
o what the service goals are
Management may not control system

6 Requirements for Operational Systems
Satisfy Minimal Conditions
1. Physical Integrity
2. Interconnectivity
3. Controllable
4. Provide a Service
Two requirements
o Reliably reach the final condition
o When failures occur, repair or report automatically
Two approaches in MyOps
o Precise bootstrap stages (not discussed)
o Operational monitoring & management in platform

7 System: PlanetLab Slices

8 Monitoring Types
Open-loop monitoring
o Identify the unknown
o More information, fine-grained
Operational monitoring (closed-loop)
o Correctness
o Less information, coarse-grained
o Actionable

9 Management Types
Open-loop management
o Bootstrap/Deploy from the ground up
o Inefficient, coarse-grained
o No feedback
Operational management (closed-loop)
o Tweak the system to correct behavior
o More efficient, fine-grained

10 Example
Observe: Node is Off-line
Control: Attempt to Power-on
Observe: Node is On-line but Failed to boot
Observe: Failed to boot Error
Control: Create ticket & Send email to local contact
Time passes
Control: Disable slice creation
Observe: Local contact responds
Observe: Node is Powered-on and Running
Control: Re-enable slice creation
Control: Close ticket
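The observe/control sequence above can be read as a small decision function: each observation, together with what has already been done, selects the next control actions. The following is a minimal sketch under stated assumptions, not MyOps code. The names NodeState and next_actions and the action strings are hypothetical, and the "Time passes" step is reduced to "the ticket is already open on the next observation".

```python
from enum import Enum


class NodeState(Enum):
    """Observed node states (hypothetical names for the slide's observations)."""
    OFFLINE = "offline"
    BOOT_FAILED = "boot_failed"
    RUNNING = "running"


def next_actions(state, ticket_open, slices_enabled):
    """Return the control actions for one observation.

    Assumption: escalation to disabling slice creation happens as soon as a
    ticket is already open and the node is still failing to boot.
    """
    if state is NodeState.OFFLINE:
        return ["power_on"]
    if state is NodeState.BOOT_FAILED:
        if not ticket_open:
            return ["create_ticket", "email_local_contact"]
        # Ticket already open and the node is still broken: escalate once.
        return ["disable_slice_creation"] if slices_enabled else []
    # Node is running: undo any escalation, then close the ticket.
    actions = []
    if not slices_enabled:
        actions.append("enable_slice_creation")
    if ticket_open:
        actions.append("close_ticket")
    return actions


# Replay the sequence from the slide: (state, ticket_open, slices_enabled).
trace = [
    (NodeState.OFFLINE, False, True),
    (NodeState.BOOT_FAILED, False, True),
    (NodeState.BOOT_FAILED, True, True),
    (NodeState.RUNNING, True, False),
]
for observation in trace:
    print(observation[0].value, "->", next_actions(*observation))
```

Keeping the decision a pure function of the observation and the recorded state makes each step of the loop easy to test in isolation.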

11 History of PlanetLab Operations
Open-loop Monitoring with Open-loop Management
o Collect fine-grained statistics using CoMon
o Act with coarse-grained operations (e.g. Reinstall)
o Manual bridge between the two
Moving towards Closed-loop Operations
o Collect targeted metrics
o Take directed, problem-specific actions
o Automate actions based on policy

12 PlanetLab Operations
Close the monitor/management cycle
o Direct automation of common operations
o Indirect through remote contacts and incentives

13 MyOps Architecture
Collection from Node
Translated by policy to Automated action

14 MyOps Architecture
Collection from Node
Send notice to Local contact to take action

15 MyOps Architecture
When there is no response
Indirect influence with incentives

16 Collection
Operational monitoring of specific targets, such as:
o Boot status, Filesystem status
o DNS - internal and external
o RPMs
o System services, etc.
Periodic collection
o Coarse-grained collection at a human timescale
o Time-series of events and status
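One way to picture the "time-series of events and status" is as a per-node list of periodic observations that later stages can query. This is only an illustrative sketch: the Observation and NodeHistory types, their fields, and the hostname are assumptions chosen to mirror the targets listed above, not the MyOps data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Observation:
    """One coarse-grained status sample for a node (hypothetical fields)."""
    timestamp: datetime
    boot_ok: bool
    filesystem_ok: bool
    dns_ok: bool


@dataclass
class NodeHistory:
    """Time-series of observations for a single node, oldest first."""
    hostname: str
    observations: list = field(default_factory=list)

    def record(self, obs):
        self.observations.append(obs)

    def down_since(self):
        """Return the start of the current run of failed boots, or None
        if the most recent observation booted (or nothing was recorded)."""
        start = None
        for obs in reversed(self.observations):
            if obs.boot_ok:
                break
            start = obs.timestamp
        return start


# Hourly samples: one good boot followed by two failures.
t0 = datetime(2010, 1, 1)
history = NodeHistory("node1.example.org")
history.record(Observation(t0, True, True, True))
history.record(Observation(t0 + timedelta(hours=1), False, True, True))
history.record(Observation(t0 + timedelta(hours=2), False, True, True))
print(history.down_since())
```

A query such as down_since turns raw samples into the kind of duration that a policy can place a constraint on.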

17 Policy
Constraints over a time-series of events
To satisfy a constraint
o Automated action
o Send notice
o Apply incentive
Policy defines
o Preferred status of system
o Frequency of actions
o Magnitude of incentives
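A policy of this shape can be sketched as an ordered table of rules, each pairing a constraint on how long a node has been down with one of the three response types. The sketch below is an assumption-laden illustration: the Rule type, the select_response function, the thresholds (0 hours, 1 day, 7 days), and the detail strings are all hypothetical and are not taken from MyOps.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Rule:
    """Respond with `response` once downtime reaches `min_downtime`."""
    min_downtime: timedelta
    response: str  # "automated_action", "notice", or "incentive"
    detail: str


# Hypothetical thresholds; a real deployment would tune these.
POLICY = [
    Rule(timedelta(hours=0), "automated_action", "reboot_via_pcu"),
    Rule(timedelta(days=1), "notice", "email_local_contact"),
    Rule(timedelta(days=7), "incentive", "disable_slice_creation"),
]


def select_response(downtime, policy=POLICY):
    """Return the rule with the highest threshold that `downtime` has reached.

    `downtime` is None when the node is in its preferred (up) state, in
    which case no response is needed and None is returned.
    """
    if downtime is None:
        return None
    matching = [rule for rule in policy if downtime >= rule.min_downtime]
    return max(matching, key=lambda rule: rule.min_downtime, default=None)


for downtime in (timedelta(hours=2), timedelta(days=3), timedelta(days=10)):
    rule = select_response(downtime)
    print(downtime, "->", rule.response, rule.detail)
```

Expressing the policy as data rather than code is one way to keep the frequency of actions and the magnitude of incentives adjustable without touching the evaluation logic.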

18 Automation
Automatic correction of common bootstrap problems
o Communication errors with MyPLC
o Corrupt filesystem repair
o Retry when state is unknown
o PCU Reboot
o Reinstall
Automation Notices
o Bad disk
o Minimal hardware
o Bad DNS
o Bad node configuration
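The split between automatic corrections and notices suggests a simple pattern: try the automated repairs in turn, and fall back to a notice only if none succeeds. The sketch below assumes the repairs are tried from least to most disruptive, which the slide does not state. The function attempt_repairs and the stub steps are hypothetical stand-ins, not MyOps or MyPLC calls.

```python
def attempt_repairs(node, steps):
    """Run repair steps in order until one reports success.

    `steps` is a list of (name, callable) pairs; each callable takes the
    node and returns True if the node recovered. Returns the name of the
    step that succeeded, or None if every step failed.
    """
    for name, step in steps:
        if step(node):
            return name
    return None


# Stub steps standing in for real repairs (assumed ordering).
steps = [
    ("retry", lambda node: False),
    ("pcu_reboot", lambda node: False),
    ("reinstall", lambda node: False),
]

fixed_by = attempt_repairs("node1.example.org", steps)
if fixed_by is None:
    # No automated repair worked, so hand the problem to a person.
    print("send notice to local contact")
else:
    print("repaired by", fixed_by)
```

Stopping at the first successful step avoids running an expensive operation such as a reinstall when a cheaper one already restored the node.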

19 Notices & Incentives
Notices are indirect paths to node management
o Node down / online / specific problem (e.g. DNS, disk)
o Site down / online
o Privilege reduced / restored
o PCU errors
The incentives on MyPLC
o Sites 10 slices
o Disable slice creation
o Disable running slices

20 Validation of Notices & Incentives
[Figure; labels as extracted: A B C D E, Notice, BugFixKernel, BugFix, Fix2]

21 Time to Restore Down Node (all issues)

22 Future Ideas
Generalize Configuration
o Collect from multiple sources
o Expose policy
o Act on multiple targets
Self-monitoring
Positive Incentives
o Special access to services
o Additional resources (Slices, Bandwidth, CPU, etc.)

23 Time to Reply (when there is a reply)

