Congress in NFV-based Mobile Cellular Network Fault Recovery
Ashiq Khan (NTT DOCOMO), Ryota Mibu (NEC), Masahito Muroi (NTT), Tomi Juvonen (Nokia)
28 April 2016, OpenStack Summit Austin

Presentation transcript:

1 Congress in NFV-based Mobile Cellular Network Fault Recovery
Ashiq Khan (NTT DOCOMO), Ryota Mibu (NEC), Masahito Muroi (NTT), Tomi Juvonen (Nokia)
28 April 2016, OpenStack Summit Austin

2 Agenda
NFV Requirement [Ashiq] 8 min
Implementation in OpenStack
– High-level architecture for fault management [Ryota] 8 min
– Resource state awareness [Tomi] 8 min
– Congress-based Inspection [Masahito] 10 min
– Demo [Ryota] 6 min

3 NFV Requirement

4 Telco Requirements
Mobile networks require high service availability.
[Figure: BTS, Mobility Management Entity (MME), Global Datapath Gateway (P-GW), and Local Datapath Gateways (S-GW), with control and data paths]
Each of these nodes hosts a few thousand subscriber sessions. If one goes down:
– all of its mobile phones are disconnected
– they then try to reconnect simultaneously, creating an 'Attach' storm
– which leads to further congestion/failure
Failure recovery therefore needs to be performed on a sub-second timescale.

5 Functional requirements
Speedy failure detection and notification to the users
– A user could be a VNF Manager (VNFM)
A VIM (OpenStack) shall detect a failure event, find out which users are affected by the failure, and then notify those users.
[Figure: Hardware and Hypervisor under a Virtualized Infrastructure Manager (VIM), hosting active (ACT) and standby (SBY) VNFs managed by a VNF Manager; the VIM asks "Who should I inform?"]
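The "detect, find affected users, notify" requirement can be sketched in a few lines. This is a minimal illustration under assumptions: the resource map is a plain list of server records, each record carries a hypothetical `owner` field (for example, the VNFM responsible for that VNF), and `notify` is any callable. None of these names come from OpenStack.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Server:
    server_id: str
    host: str
    owner: str  # hypothetical: the user (e.g. a VNFM) to inform about this server


def find_affected_users(failed_host: str, servers: List[Server]) -> Dict[str, List[str]]:
    """Group the servers on the failed host by the user that owns them."""
    affected: Dict[str, List[str]] = defaultdict(list)
    for server in servers:
        if server.host == failed_host:
            affected[server.owner].append(server.server_id)
    return dict(affected)


def handle_failure(failed_host: str, servers: List[Server],
                   notify: Callable[[str, List[str]], None]) -> None:
    """Notify each affected user once, with the list of its affected servers."""
    for owner, server_ids in find_affected_users(failed_host, servers).items():
        notify(owner, server_ids)


servers = [
    Server("vm-act", "compute1", "vnfm-a"),
    Server("vm-sby", "compute2", "vnfm-a"),
    Server("vm-x", "compute1", "vnfm-b"),
]
sent = []
handle_failure("compute1", servers, lambda owner, ids: sent.append((owner, ids)))
print(sent)  # [('vnfm-a', ['vm-act']), ('vnfm-b', ['vm-x'])]
```

Grouping by owner means a VNFM with several VMs on the failed host gets one notification instead of one per VM.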

6 What is "failure"?
It depends on:
– Applications (VNFs)
– Back-end technologies used in the deployment
– Redundancy of the equipment/components
– Operator policy
– Regulation
So, "failure" has to be configurable.
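One way to make "failure" configurable is to keep the decision in data, not in code, so an operator can change it without touching the inspection logic. The sketch below assumes a hypothetical policy that states how many healthy redundant NIC ports a host must keep. The field names are illustrative and are not part of any OpenStack API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FailurePolicy:
    # Hypothetical operator setting: minimum number of healthy NIC ports per host.
    min_healthy_nic_ports: int


def is_host_failed(nic_port_up: Dict[str, bool], policy: FailurePolicy) -> bool:
    """Return True if the host no longer satisfies the configured policy."""
    healthy = sum(1 for up in nic_port_up.values() if up)
    return healthy < policy.min_healthy_nic_ports


ports = {"nic1": False, "nic2": True}  # one of two redundant ports is down

tolerant = FailurePolicy(min_healthy_nic_ports=1)  # redundancy absorbs one loss
strict = FailurePolicy(min_healthy_nic_ports=2)    # any port loss counts as failure

print(is_host_failed(ports, tolerant))  # False
print(is_host_failed(ports, strict))    # True
```

The same raw event ("nic1 down") produces different verdicts under different policies, which is the point of making the definition configurable.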

7 High-level architecture for Fault Management

8 High Level Architecture of NFV
[Figure: Applications and their Application Manager (VIM User) on top of the Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources), managed by the Virtualized Infrastructure Manager (VIM)]

9 Fault Management Flow
[Figure: the NFV architecture of slide 8 annotated with a Detection phase and a Reaction phase; the region labelled "Our Focus" is highlighted]

10 Fault Management Functional Blocks and Sequence
[Figure: functional blocks Monitor, Inspector (with Resource Map and Failure Policy), Controller, Notifier (with Alarm Conf.), and Manager, acting on the Virtualized Infrastructure]
Sequence:
0. Set Alarm
1. Raw Fault
2. Find Affected Applications
3. Update State
4. Notify all (alternatively: 4. Notify)
5. Notify Error
6 onwards. Action
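The numbered sequence can be traced with a toy pipeline in which each class stands in for one functional block. The method names, the in-memory resource map, and the one-subscriber-per-VM alarm configuration are assumptions made for illustration; they are not the interfaces of any OpenStack project. The Failure Policy is left out, so every raw fault is treated as a failure.

```python
from typing import Dict, List, Set


class Manager:
    def __init__(self) -> None:
        self.errors: List[str] = []

    def on_error(self, vm: str) -> None:                     # 5. Notify Error received
        self.errors.append(vm)                               # 6 onwards: Action would start here


class Notifier:
    def __init__(self) -> None:
        self.alarms: Dict[str, Manager] = {}                 # Alarm Conf.: VM -> subscriber

    def set_alarm(self, vm: str, manager: Manager) -> None:  # 0. Set Alarm
        self.alarms[vm] = manager

    def on_state_change(self, vm: str, state: str) -> None:  # 4. Notify
        if state == "error" and vm in self.alarms:
            self.alarms[vm].on_error(vm)                     # 5. Notify Error


class Controller:
    def __init__(self, notifier: Notifier) -> None:
        self.states: Dict[str, str] = {}
        self.notifier = notifier

    def update_state(self, vm: str, state: str) -> None:     # 3. Update State
        self.states[vm] = state
        self.notifier.on_state_change(vm, state)


class Inspector:
    def __init__(self, resource_map: Dict[str, Set[str]], controller: Controller) -> None:
        self.resource_map = resource_map                     # Resource Map: host -> VMs
        self.controller = controller

    def on_raw_fault(self, host: str) -> None:               # 1. Raw Fault from the Monitor
        for vm in sorted(self.resource_map.get(host, set())):  # 2. Find Affected Applications
            self.controller.update_state(vm, "error")


notifier = Notifier()
controller = Controller(notifier)
inspector = Inspector({"compute1": {"vm-a", "vm-b"}}, controller)
manager = Manager()
notifier.set_alarm("vm-a", manager)
inspector.on_raw_fault("compute1")  # the Monitor reports a fault on compute1
print(controller.states, manager.errors)
```

Both VMs on the faulty host are marked as errors, but only the Manager that set an alarm on `vm-a` hears about it.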

11 Fault Management Functional Block Mapping
[Figure: the functional blocks and sequence of slide 10 mapped to OpenStack projects: Controller → Nova, Neutron, Cinder; Notifier → Ceilometer+Aodh; Inspector → Vitrage, Congress]

12 Challenges in OpenStack
– Introducing the fault management sequence across multiple OpenStack projects
– Letting OpenStack users know the corresponding resource state properly and immediately
– Building a correlation mechanism that supports various OpenStack deployment flavors and operator policies

13 Blueprints in Liberty/Mitaka Cycles
Project | Blueprint | Spec Drafter / Developer | Status
Ceilometer/Aodh | Event Alarm Evaluator | Ryota Mibu (NEC) | Completed (Liberty)
Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) / Roman Dobosz (Intel) | Completed (Liberty)
Nova | Support forcing service down | Tomi Juvonen (Nokia) / Carlos Goncalves (NEC) | Completed (Liberty)
Nova | Get valid server state | Tomi Juvonen (Nokia) | Completed (Mitaka)
Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | Completed (Mitaka)
Congress | Push Type DataSource Driver | Masahito Muroi (NTT) | Completed (Mitaka)

14 Development in OpenStack
[Figure: the functional block mapping of slide 11 (Monitor, Inspector, Controller, Notifier, Manager mapped to Nova, Neutron, Cinder, Ceilometer+Aodh, Vitrage, Congress), highlighting two areas of development: Resource State Awareness and Congress-based Inspection]

15 Resource State Awareness

16 Nova – Force-Down and Exposing host_state
Allows Nova to integrate with External Monitoring Services, and makes sure Nova handles requests for the host properly.
– EXISTING: nova-compute updates its service state periodically
– Force-down API [Mark Nova-Compute Down]: an External Monitoring Service notifies Nova that the nova-compute service is no longer available
– Client-facing APIs: Service disable API, Evacuation API, Reset Server State API
[Figure: Host/Machine (Hypervisor, VMs, vSwitch, BMC, nova-compute) connected through the queue to nova-api, nova-conductor, nova-scheduler, and the nova DB; an External Monitoring Service calls the Force-down API]
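For concreteness, the two state-correction calls on this slide can be written as plain HTTP requests. The sketch below only builds the requests (no network I/O). It assumes the `os-services/force-down` call available from compute API microversion 2.11 and the `os-resetState` server action, both of which are admin-only under default policy. The endpoint URL, host name, and server ID are placeholders, and a real call would also need an `X-Auth-Token` header. Check the Compute API reference for the microversion your deployment supports.

```python
import json
from typing import Dict, NamedTuple


class HttpRequest(NamedTuple):
    method: str
    url: str
    headers: Dict[str, str]
    body: str


NOVA = "http://controller:8774/v2.1"  # placeholder endpoint; auth token omitted


def force_down_request(host: str) -> HttpRequest:
    """Mark nova-compute on `host` as forced down (compute microversion 2.11+)."""
    body = {"host": host, "binary": "nova-compute", "forced_down": True}
    return HttpRequest("PUT", f"{NOVA}/os-services/force-down",
                       {"Content-Type": "application/json",
                        "X-OpenStack-Nova-API-Version": "2.11"},
                       json.dumps(body))


def reset_state_request(server_id: str, state: str = "error") -> HttpRequest:
    """Reset a server's state, e.g. to 'error', via the os-resetState action."""
    body = {"os-resetState": {"state": state}}
    return HttpRequest("POST", f"{NOVA}/servers/{server_id}/action",
                       {"Content-Type": "application/json"},
                       json.dumps(body))


req = force_down_request("compute1")
print(req.method, req.url)  # PUT http://controller:8774/v2.1/os-services/force-down
```

An external monitor would typically send the force-down request first and then reset the state of each VM on that host, so users see the failure without waiting for the periodic service update.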

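17 Ceilometer – Event Alarm
Notification-driven alarm evaluator
– EXISTING (polling-based): alarms are evaluated against statistics computed from polled samples
– NEW shortcut (notification-based): alarms are evaluated directly on notification events from Nova, Neutron, and Cinder
[Figure: notifications from Cinder, Neutron, and Nova reaching the Manager either via the existing polling path (sample, stats) or via the new event path; an Audit Service is also shown]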
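A notification-driven evaluator checks each event as soon as it arrives, instead of waiting for the next polling cycle. The sketch below is a simplified model, not Aodh's implementation. It assumes an alarm whose rule has an `event_type` and a list of equality conditions on event traits, loosely following the shape of Aodh event alarm rules (`event_rule` with `event_type` and `query`). The trait names and the sample event are illustrative.

```python
from typing import Any, Dict, List


def matches(rule: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """True if the event has the rule's type and satisfies every 'eq' condition."""
    if event.get("event_type") != rule["event_type"]:
        return False
    traits = event.get("traits", {})
    for cond in rule.get("query", []):
        # Only 'eq' is modelled; other operators never match. Fields look like 'traits.<name>'.
        name = cond["field"].split(".", 1)[1]
        if cond["op"] != "eq" or traits.get(name) != cond["value"]:
            return False
    return True


def on_notification(event: Dict[str, Any], alarms: List[Dict[str, Any]]) -> List[str]:
    """Evaluate every event alarm immediately; return the names of those that fire."""
    return [a["name"] for a in alarms if matches(a["event_rule"], event)]


alarm = {
    "name": "vm-a-error",
    "type": "event",
    "event_rule": {
        "event_type": "compute.instance.update",
        "query": [
            {"field": "traits.instance_id", "op": "eq", "value": "vm-a"},
            {"field": "traits.state", "op": "eq", "value": "error"},
        ],
    },
}
event = {"event_type": "compute.instance.update",
         "traits": {"instance_id": "vm-a", "state": "error"}}
print(on_notification(event, [alarm]))  # ['vm-a-error']
```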

18 Doctor BP Detail: Nova – Get valid server state
'host_state' can have the following values:
– UP if nova-compute is up
– UNKNOWN if nova-compute is not reported by the service group driver
– DOWN if nova-compute is forced down
– MAINTENANCE if nova-compute is disabled
– An empty string indicates there is no host for the server
This attribute appears in the response only if the policy permits. By default only admin is permitted, but in the NFV case the owner should also be permitted.
Server APIs that include 'host_state':
GET /v2.1/{tenant_id}/servers/detail
GET /v2.1/{tenant_id}/servers/{server_id}
[Figure: Host/Machine with nova-compute sending periodic updates through the queue; the Force-down API and Service disable API update the service state in the nova DB, which nova-api reads to serve the Server API]
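The value rules listed above fit in a small function. This is a sketch of the slide's table, not Nova's code. The `ServiceInfo` record and the `owner_allowed` flag are hypothetical. When several conditions hold at once, the sketch checks forced down first, then disabled, then liveness; that precedence is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceInfo:
    # Hypothetical view of the nova-compute service on the server's host.
    forced_down: bool
    disabled: bool
    is_up: bool  # as reported by the service group driver


def host_state(service: Optional[ServiceInfo]) -> str:
    if service is None:        # the server has no host
        return ""
    if service.forced_down:
        return "DOWN"
    if service.disabled:
        return "MAINTENANCE"
    if service.is_up:
        return "UP"
    return "UNKNOWN"


def visible_host_state(service: Optional[ServiceInfo], is_admin: bool,
                       is_owner: bool, owner_allowed: bool = True) -> Optional[str]:
    """Return host_state only when policy permits; None means the attribute is omitted."""
    permitted = is_admin or (owner_allowed and is_owner)
    return host_state(service) if permitted else None


svc = ServiceInfo(forced_down=True, disabled=False, is_up=True)
print(host_state(svc))  # DOWN
```

Setting `owner_allowed=False` models the default admin-only policy. The default of `True` here reflects the NFV case, where the VNF owner also needs to see the host state.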

19 Congress-based Inspection

20 What is Congress?
Governance as a Service
– Define and enforce policy for Cloud Services
Policy
– No single definition: Law/Regulations, Business Rules, Security Requirements, Application Requirements
– Any Service, any Policy

21 Congress Architecture
[Figure: Congress (API, Policy Engine) connected through Nova, Neutron, Keystone, and Security System DataSourceDrivers to the corresponding services; policy data flows into Congress and policy enforcement flows out]

22 Requirements and gaps for Congress as Inspector
Requirement | Congress Features | Gaps
Fast failure notification | Periodical polling and policy enforcement | Real-time policy enforcement
Mapping of a physical failure to a logical failure | Write a rule for mapping | None
Adaptability | Change policy rules | None
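Congress expresses such mappings as Datalog rules over the tables its data sources provide. To show the idea without a Congress deployment, the sketch below evaluates two rules by hand. The first derives `host_down` from raw monitor events, and the second joins it with a server table to find the affected logical resources. The table layouts and rule shapes (shown in comments in a Datalog-like notation) are assumptions for illustration; in Congress the rules would be written in its own policy language.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]           # (hostname, status), as reported by a monitor
ServerRow = Tuple[str, str, str]  # (vm_id, host, status)


def host_down(events: List[Event]) -> Set[str]:
    # host_down(h) :- events(hostname=h, status="down")
    return {host for host, status in events if status == "down"}


def affected_vms(events: List[Event], servers: List[ServerRow]) -> Set[str]:
    # affected_vm(v) :- host_down(h), servers(id=v, host=h, status="ACTIVE")
    down = host_down(events)
    return {vm for vm, host, status in servers if host in down and status == "ACTIVE"}


events = [("demo-compute0", "down")]
servers = [("vm-1", "demo-compute0", "ACTIVE"),
           ("vm-2", "demo-compute1", "ACTIVE"),
           ("vm-3", "demo-compute0", "SHUTOFF")]
print(sorted(affected_vms(events, servers)))  # ['vm-1']
```

Adaptability then amounts to editing the rules (for example, dropping the `ACTIVE` condition) instead of changing the inspector's code.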

23 Congress PushType DataSource Driver
Enables services outside Congress to push data, and improves reaction time for policy enforcement.
[Figure: the Congress architecture of slide 21 with a new data flow: Another Service pushes data into Congress through a PushType DataSourceDriver]
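The difference between a polled data source and a push-type one can be shown with a toy model. Instead of the policy engine re-reading a table on a timer, the external service pushes rows and every subscriber reacts immediately. The class and method names below are hypothetical and do not reproduce Congress's driver interface.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, ...]


class PushTypeDataSource:
    """Hypothetical push-type source: callers push rows, subscribers react at once."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: Dict[str, List[Row]] = {}
        self._subscribers: List[Callable[[str, List[Row]], None]] = []

    def subscribe(self, callback: Callable[[str, List[Row]], None]) -> None:
        self._subscribers.append(callback)

    def push(self, table: str, rows: List[Row]) -> None:
        # In this sketch a push replaces the table contents; no polling interval is involved.
        self.tables[table] = list(rows)
        # Policy evaluation is triggered right away for every subscriber.
        for callback in self._subscribers:
            callback(table, self.tables[table])


evaluations = []
source = PushTypeDataSource("doctor")
source.subscribe(lambda table, rows: evaluations.append((table, len(rows))))
source.push("events", [("0123-4567-89ab", "demo-compute0", "down")])
print(evaluations)  # [('events', 1)]
```

With polling, the worst-case reaction time is one full polling interval. With pushing, the engine sees the data as soon as it arrives.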

24 Congress Doctor Driver
1. The Monitor notifies Congress of a hardware failure event.
2. The Doctor Driver receives the failure event and inserts it into the event list of Doctor Data.
3. The Policy Engine receives the failure event, then evaluates the registered policy and enforces state correction.
4. The Policy Engine instructs the Nova Driver to force the host's nova-compute service down and to reset the state of the affected VM(s).
[Figure: the Congress architecture of slide 21 with a new data flow from the Monitor through a Doctor DataSourceDriver]
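Steps 2 to 4 can be mimicked end to end in a few lines. This is a model built on assumptions, not Congress code. The event fields follow the Doctor event columns (`hostname`, `status`). The server-to-host map stands in for data the Nova driver would supply. The actions are returned as tuples rather than executed, and `force_down` and `reset_state` are hypothetical action names.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]


class DoctorFlow:
    def __init__(self, servers_by_host: Dict[str, List[str]]) -> None:
        self.events: List[Dict[str, str]] = []   # event list of Doctor Data
        self.servers_by_host = servers_by_host   # stand-in for Nova driver data

    def receive(self, event: Dict[str, str]) -> List[Action]:
        self.events.append(event)                # 2. insert into the event list
        return self.evaluate(event)              # 3. evaluate the policy right away

    def evaluate(self, event: Dict[str, str]) -> List[Action]:
        if event.get("status") != "down":
            return []
        host = event["hostname"]
        # 4. state correction requested from the Nova driver
        actions: List[Action] = [("force_down", host, "nova-compute")]
        actions += [("reset_state", vm, "error") for vm in self.servers_by_host.get(host, [])]
        return actions


flow = DoctorFlow({"demo-compute0": ["vm-1", "vm-2"]})
actions = flow.receive({"id": "0123-4567-89ab", "hostname": "demo-compute0",
                        "type": "host.nic1.down", "status": "down"})
print(actions)
# [('force_down', 'demo-compute0', 'nova-compute'),
#  ('reset_state', 'vm-1', 'error'), ('reset_state', 'vm-2', 'error')]
```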

25 Congress Doctor Driver (Detail)
Driver Schema (HW failure example)
+--------+-----------------------------------------------------+
| table  | columns                                             |
+--------+-----------------------------------------------------+
| events | {'name': 'id', 'description': 'None'},              |
|        | {'name': 'time', 'description': 'None'},            |
|        | {'name': 'type', 'description': 'None'},            |
|        | {'name': 'hostname', 'description': 'None'},        |
|        | {'name': 'status', 'description': 'None'},          |
|        | {'name': 'monitor', 'description': 'None'},         |
|        | {'name': 'monitor_event_id', 'description': 'None'} |
+--------+-----------------------------------------------------+
Event List of Doctor Data
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
| id             | time                          | type           | hostname      | status | monitor      | monitor_event_id |
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
| 0123-4567-89ab | 2016-03-09T07:39:27.230277464 | host.nic1.down | demo-compute0 | down   | demo_monitor | 111              |
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
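Because the `events` table has a fixed column order, a driver has to flatten each incoming event into a tuple in that order. A minimal sketch, assuming the incoming event is a flat dict keyed by the column names in the schema above (the real Doctor driver's input format may differ, for example by nesting some fields). `to_row` is a hypothetical helper name.

```python
from typing import Any, Dict, Tuple

# Column order of the `events` table in the driver schema above.
EVENT_COLUMNS = ("id", "time", "type", "hostname", "status",
                 "monitor", "monitor_event_id")


def to_row(event: Dict[str, Any]) -> Tuple[Any, ...]:
    """Flatten an event into a row; missing columns become None, unknown keys are rejected."""
    unknown = set(event) - set(EVENT_COLUMNS)
    if unknown:
        raise ValueError(f"unknown event fields: {sorted(unknown)}")
    return tuple(event.get(col) for col in EVENT_COLUMNS)


row = to_row({
    "id": "0123-4567-89ab",
    "time": "2016-03-09T07:39:27.230277464",
    "type": "host.nic1.down",
    "hostname": "demo-compute0",
    "status": "down",
    "monitor": "demo_monitor",
    "monitor_event_id": "111",
})
print(row)
```

Rejecting unknown keys makes a monitor/driver schema mismatch fail loudly instead of silently dropping data.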

26 Demo

27 Demo Scenarios
Scenario 1. One of the redundant NIC ports down  failure
Scenario 2. The set of redundant NIC ports down  failure
Sequence
1. NIC port(s) down
2. Detect and notify the failure
   – The Monitor detects the HW failure event and notifies Congress of it
   – Congress updates the state of the affected VMs to error
   – Ceilometer and Aodh notify the app manager of the VM's failure
3. Healing
   – The app manager switches active and standby, so that the service can continue
[Figure: Controller (Nova, Congress, etc.); Compute1 hosting Video Server (ACT); Compute2 hosting Video Server (SBY); switches (including a management switch), a router, and the end user]
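The healing step on the application side is easy to model. The sketch assumes a hypothetical app manager that tracks one active and one standby VM and receives failure notifications carrying a VM ID. It is an illustration only, not the demo's actual app manager.

```python
from dataclasses import dataclass


@dataclass
class ActiveStandbyPair:
    active: str
    standby: str

    def on_vm_error(self, vm_id: str) -> bool:
        """Swap roles if the active VM failed; return True when a switchover happened."""
        if vm_id != self.active:
            return False  # standby or unrelated VM failed: the service keeps running
        # The failed VM becomes the (unhealthy) standby until it is repaired.
        self.active, self.standby = self.standby, self.active
        return True


pair = ActiveStandbyPair(active="video-server-act", standby="video-server-sby")
print(pair.on_vm_error("video-server-act"), pair.active)  # True video-server-sby
```

The switchover can only start once the app manager learns about the failure, so end-user impact depends on how quickly the VIM detects the failure and delivers the notification.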

28 Conclusions
– Resource state awareness has been improved by the state correction API enhancement, immediate notification to users, and exposing the host state
– Flexible inspection is available with Congress
– The fault event API opens up the way to support various backend technologies

