1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC.



2 Doctor Overview
One of the OPNFV requirement projects (requirement identification, gap analysis, implementation study)
Goal
– Build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure
– Make the framework valuable and acceptable for other industries
Status
– Initial requirement study, architecture design, gap analysis: Done (see document [link])
– Collaborative development: Started (blueprints are proposed to Nova and Ceilometer)
– Standardization sync: Ongoing (through NFV member efforts and joint meetings)

3 Use Case 1: Fault management

4 Use Case 2: Maintenance

5 High Level Architecture
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator]

6 Fault Management Sequence
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator; Detection; Reaction; Doctor Initial Focus]

7 Key Requirements as VIM
– Immediate notification to VIM user and administrator
– Fault notification of affected virtual resources (Correlation)
– Configurable notification by VIM admin and user (Pub/Sub)
– Catch all faults in NFVI (pluggability for various technologies and future extensions)

8 Key Requirements as VIM
– Immediate Notification
– Consistent Resource State Awareness
– Extensible Monitoring
– Fault Correlation

9 TO-BE: Functional Blocks
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); VIM; VIM User and Administrator; functional blocks: Monitor, Inspector, Controller, Notifier]

10 Fault Management Scenarios (1/2)
[Diagram labels: Monitor, Inspector, Controller, Notifier, User-side Manager, Admin-side Manager, Applications, Virtualized Infrastructure; data: Alarm Conf., Resource Map, Failure Policy]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action

11 Fault Management Scenarios (2/2)
[Diagram labels: Monitor, Inspector, Controller, Notifier, User-side Manager, Admin-side Manager, Applications, Virtualized Infrastructure; data: Alarm Conf., Resource Map, Failure Policy]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
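
The numbered steps on the two scenario slides can be read as a small event flow. The following is a minimal Python sketch of steps 0 to 4 under stated assumptions: the class name `FaultFlow`, its method names, and the dictionary event shape are hypothetical and are not part of Doctor or OpenStack. The Resource Map and Alarm Conf. are plain in-memory dictionaries, and the Monitor, Inspector, Controller and Notifier roles are collapsed into one object for brevity.

```python
class FaultFlow:
    """Hypothetical, in-memory illustration of scenario steps 0 to 4."""

    def __init__(self, resource_map):
        # Resource Map (assumed shape): host name -> list of (vm_id, owner)
        self.resource_map = resource_map
        # Alarm Conf. (assumed shape): owner -> list of callbacks
        self.alarm_conf = {}
        # Resource state as the VIM would report it: vm_id -> state string
        self.state = {}

    def set_alarm(self, owner, callback):
        """Step 0: a user-side manager registers interest in its resources."""
        self.alarm_conf.setdefault(owner, []).append(callback)

    def on_raw_failure(self, host):
        """Steps 1 to 4: take a raw host failure and notify affected owners."""
        # Step 2: find the virtual resources affected by the physical fault.
        affected = self.resource_map.get(host, [])
        for vm_id, owner in affected:
            # Step 3: update the resource state before notifying, so that a
            # later state query agrees with the notification.
            self.state[vm_id] = "error"
            # Step 4: notify only the owners who set an alarm.
            for callback in self.alarm_conf.get(owner, []):
                callback({"vm": vm_id, "host": host, "state": "error"})
        return [vm_id for vm_id, _ in affected]
```

In this sketch an owner without an alarm receives nothing, while the state of that owner's VM is still updated, which mirrors the separation between "Update State" and "Notify" on the slides.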

12 AS-IS: OpenStack Kilo (1/3)
How can you find faults as a tenant user?
– Run a keep-alive check against each VM
– Poll VM state through the Nova API
– Set an alarm on the metering service (e.g. CPU runtime)

13 AS-IS: OpenStack Kilo (2/3)
How does the metering service work?
1. A resource controller such as Nova monitors resource usage [periodically]
2. Samples are fetched from the resource controller and registered in the DB [periodically]
3. Alarm definitions are evaluated against the samples [periodically]
4. An alarm is raised depending on the result of the evaluation
[Diagram labels: Machine, Hypervisor, VM, Nova, Ceilometer, (Heat), Samples; step markers 1 to 4]
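
Because steps 1 to 3 each run periodically, a fault can wait for up to one full interval at each stage before the alarm is raised. The sketch below illustrates that reasoning in Python. The `Sample` type, the function names, and the interval values in the usage note are assumptions for illustration only; they are not taken from the slides or from Ceilometer.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float       # seconds, when the sample was recorded
    cpu_time_delta: float  # CPU time consumed since the previous sample


def evaluate_alarm(samples, threshold=0.0):
    """Step 3 (simplified): fire when the newest sample is at or below threshold."""
    if not samples:
        return False
    return samples[-1].cpu_time_delta <= threshold


def worst_case_delay(monitor_interval, collect_interval, evaluate_interval):
    """Upper bound on fault-to-alarm delay for three chained periodic stages.

    A fault that occurs just after one stage has run must wait almost a
    full interval at that stage and, in the worst case, at each later one.
    """
    return monitor_interval + collect_interval + evaluate_interval
```

For example, with assumed intervals of 10, 60 and 60 seconds, `worst_case_delay(10, 60, 60)` gives 130 seconds, which is why a polling pipeline struggles to provide immediate notification.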

14 AS-IS: OpenStack Kilo (3/3)
Notification
– OpenStack components post events to a messaging queue
– Ceilometer collects, transforms and publishes those events, which can be used for billing
[Diagram labels: NFVI, Nova, Neutron, Cinder, Queue, Ceilometer, (Billing), Samples]

15 Implementation Plan in OpenStack
[Diagram labels: Applications; Virtualized Infrastructure; VIM User and Administrator; Zabbix; Inspector; Nova; Ceilometer; Queue; Event Alarm; Immediate Notification; Error Injection; Plugin; ?]

16 Demo (1/3) User Scenario
[Diagram labels: HTTP Clients, Public Net, Load Balancer, Private Net, Web Server, Launch New VM]

17 Demo (2/3)
Demo 1
1. Collect CPU time samples
2. Alarm Heat if CPU runtime = 0
3. Create New Web Server
[Diagram labels: Machine, Hypervisor, VM, Nova, Ceilometer, (Heat), Samples]
Demo 2
1. Hook
2. Notify as Event
3. Alarm Heat
[Diagram labels: Machine, Hypervisor, VM, Nova, Ceilometer, (Heat), Agent, Alarm]

18 Demo (3/3) Results
Demo 1: 90 sec
Demo 2: 26 sec

19 Doctor Southbound API
[Diagram labels: User; Admin; NFVI; Monitor; Inspector; Controller; Notifier; Conf.; Policy; Configuration; Fault Messaging; Unified Event API; Monitor Threshold; Enable]

20 Case 1: Obvious Fault
[Diagram labels: User; Admin; NFVI; BMC; Monitor (Zabbix); Inspector; Controller (Nova); Notifier (Ceilometer); Conf.; Policy; Configuration; Fault Messaging; Enable]
Messages shown:
– SNMP Trap (Power-off)
– HTTP POST (Host A down)
– HTTP POST (Host A down, VM A1-A3 down)
– HTTP POST (VM A1 down)
– HTTP POST (Alert: VM A1 down)
– HTTP POST (Create Alarm)

21 Case 2: Threshold Exceeded Fault (Admin Config)
[Diagram labels: User; Admin; NFVI; vSwitch; collectd; Monitor Agent; Monitor (Zabbix); Inspector; Controller (Nova); Notifier (Ceilometer); Conf.; Policy; Configuration; Fault Messaging; Threshold; Enable]
Messages shown:
– HTTP POST (Switch down)
– HTTP POST (Host A down, VM A1-A3 down)
– HTTP POST (VM A1 down)
– HTTP POST (Alert: VM A1 down)
– HTTP POST (Create Alarm)

22 Backup

23 Fault Management Sequence (Optional)
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator; Detection; Reaction; Auto Reaction]

24 Fault Management Scenarios (Optional)
[Diagram labels: Monitor, Inspector, Controller, Notifier, User-side Manager, Admin-side Manager, Applications, Virtualized Infrastructure; data: Alarm Conf., Resource Map, Failure Policy; Auto Reaction]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action

25 Configuration / Policy Enforcement
Option 1: Policy Service Integration
Option 2: Using Metadata in Controller
[Diagram labels: User; Admin; NFVI; Monitor; Inspector; Controller; Notifier; Policy Service; Conf.; Policy; Metadata; Configuration; Fault Messaging; Threshold; Enable]

26 Case 3: Threshold Exceeded Fault (User Config)
[Diagram labels: User; Admin; NFVI; vSwitch; collectd; Monitor Agent; Monitor (Zabbix); Inspector; Controller (Nova); Notifier (Ceilometer); Policy Service (Congress); Conf.; Policy; Metadata; Configuration; Fault Messaging; Threshold; Enable]
Messages shown:
– HTTP POST (Switch down)
– HTTP POST (Host A down, VM A1-A3 down)
– HTTP POST (VM A1 down)
– HTTP POST (Alert: VM A1 down)
– HTTP POST (Create Resource with Policy Label)
– HTTP POST (Set Policy)
– HTTP POST (Data)

