Congress in NFV-based Mobile Cellular Network Fault Recovery. Ashiq Khan (NTT DOCOMO), Ryota Mibu (NEC), Masahito Muroi (NTT), Tomi Juvonen (Nokia). 28 April 2016, OpenStack.


1 Congress in NFV-based Mobile Cellular Network Fault Recovery
Ashiq Khan (NTT DOCOMO), Ryota Mibu (NEC), Masahito Muroi (NTT), Tomi Juvonen (Nokia)
28 April 2016, OpenStack Summit Austin

2 Agenda
NFV Requirements [Ashiq]
Challenges in OpenStack
–High-level architecture for fault management [Ryota]
–Resource state awareness [Tomi]
–Congress-based Inspection [Masahito]
–Demo [Ryota]

3 NFV Requirements

4 Telco Requirements
Mobile networks require high service availability.
[Diagram: BTS, Mobility Management Entity (MME), two Local Datapath Gateways (S-GW), Global Datapath Gateway (P-GW)]
Each of these nodes hosts a few thousand subscriber sessions. If a node goes down:
–all mobile phones will be disconnected
–they will consequently try to reconnect simultaneously
–creating an ‘Attach’ storm
–leading to further congestion/failure
Failure recovery needs to be performed on the order of milliseconds.

5 Functional requirements
Speedy failure notification to the users
–User could be a VNF Manager (VNFM)
A VIM (OpenStack) shall detect a failure event, find the users affected by the failure, and then notify those users.
[Diagram: Hardware, Hypervisor, VNF (ACT), VNF (SBY), VNF Manager, Virtualized Infrastructure Manager (VIM); callout: "Who should I inform?"]

6 What is “failure”?
Depends on
–The application (VNF) running on the infrastructure
–Backend technologies used in the deployment
–Redundancy of the equipment/components
–Operator policy
–Regulation
So, “failure” has to be configurable.

7 High-level architecture for Fault Management

8 High Level Architecture for NFV
[Diagram labels: Applications (App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM); Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources]

9 Fault Management Paths
[Diagram labels: Applications (App); VIM User and Administrator; Virtualized Infrastructure Manager (VIM); Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources. Two paths are marked, Detection and Reaction, highlighted as "Our Focus"]

10 Fault Management Sequence
[Diagram components: Applications, Manager, Notifier, Alarm Conf., Controller, Resource Map, Inspector, Failure Policy, Monitor (two instances), Virtualized Infrastructure]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action

11 Fault Management Sequence
[Same diagram and steps as slide 10, annotated with a Nova example: Host A down; nova-compute down; Server 1 down]
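The numbered steps of the sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`Controller`, `Notifier`, `inspect`) are hypothetical stand-ins for the diagram boxes, and the choice to let the Inspector call the Notifier directly corresponds to the "4. (alt) Notify" path.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical stand-in: owns the resource map (host -> servers)."""
    resource_map: dict
    states: dict = field(default_factory=dict)

    def find_affected(self, host):
        # Step 2: look up which virtual resources sit on the failed host.
        return list(self.resource_map.get(host, []))

    def update_state(self, server, state):
        # Step 3: record the corrected state of the resource.
        self.states[server] = state


@dataclass
class Notifier:
    """Hypothetical stand-in: keeps alarm configuration per resource."""
    alarms: dict = field(default_factory=dict)

    def set_alarm(self, server, callback):
        # Step 0: a manager registers interest in one of its resources.
        self.alarms.setdefault(server, []).append(callback)

    def notify_error(self, server, state):
        # Step 5: deliver the error to every manager that set an alarm.
        for callback in self.alarms.get(server, []):
            callback(server, state)


def inspect(raw_failure, controller, notifier):
    """Steps 1-4: map a raw host failure to affected servers and notify."""
    affected = controller.find_affected(raw_failure["host"])
    for server in affected:
        controller.update_state(server, "error")
        notifier.notify_error(server, "error")  # step 4 (alt)
    return affected


# Example mirroring the slide: Host A down -> Server 1 down.
actions = []  # step 6: whatever the manager decides to do
controller = Controller(resource_map={"host-a": ["server-1"]})
notifier = Notifier()
notifier.set_alarm("server-1", lambda s, st: actions.append(f"failover for {s} ({st})"))
affected = inspect({"host": "host-a"}, controller, notifier)
print(affected, controller.states, actions)
```

The manager's reaction (step 6) is left as a callback because the slides treat it as user-side behavior outside the VIM.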

12 Challenges in OpenStack
Introducing the Fault Management sequence across multiple OpenStack projects
Letting OpenStack users know the state of their resources properly and immediately
Building a correlation mechanism to support various OpenStack deployment flavors and operator policies

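13 Blueprints in Liberty/Mitaka Cycles
+-----------------+---------------------------------------------+-----------------------------------------------+---------------------+
| Project         | Blueprint                                   | Spec Drafter / Developer                      | Status              |
+-----------------+---------------------------------------------+-----------------------------------------------+---------------------+
| Ceilometer/Aodh | Event Alarm Evaluator                       | Ryota Mibu (NEC)                              | Completed (Liberty) |
| Nova            | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) / Roman Dobosz (Intel)   | Completed (Liberty) |
| Nova            | Support forcing service down                | Tomi Juvonen (Nokia) / Carlos Goncalves (NEC) | Completed (Liberty) |
| Nova            | Get valid server state                      | Tomi Juvonen (Nokia)                          | Completed (Mitaka)  |
| Nova            | Add notification for service status change  | Balazs Gibizer (Ericsson)                     | Completed (Mitaka)  |
| Congress        | Push Type DataSource Driver                 | Masahito Muroi (NTT)                          | Completed (Mitaka)  |
+-----------------+---------------------------------------------+-----------------------------------------------+---------------------+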

14 Development in OpenStack
[Same sequence diagram and steps as slide 10, annotated with OpenStack components: Ceilometer (Event Alarm); Cinder, Neutron, Nova (State Correction); Vitrage; Congress (Push Type Driver)]

15 Resource State Awareness

16 Ceilometer - Event Alarm
Notification-driven alarm evaluator
[Diagram: a NEW shortcut (notification-based) path shown next to the EXISTING (polling-based) path. Labels: Manager, Audit Service, sample, stats, notification, event, Cinder, Neutron, Nova]
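To make the difference from polling concrete, the following sketch evaluates an alarm rule at the moment a notification event arrives. It is loosely modelled on the idea of an event alarm rule (an event type plus a list of trait conditions); the exact field names, the equality-only matching, and the helper names `rule_matches` and `on_notification` are assumptions for illustration, not the Aodh implementation.

```python
def rule_matches(rule, event):
    """Return True if a single notification event satisfies the alarm rule."""
    if event["event_type"] != rule["event_type"]:
        return False
    traits = event.get("traits", {})
    # Every condition in the rule must hold for the event's traits.
    return all(traits.get(q["field"]) == q["value"] for q in rule["query"])


def on_notification(rule, event, notify):
    """Evaluate the alarm as soon as the event arrives (no polling interval)."""
    if rule_matches(rule, event):
        notify(event)
        return True
    return False


# Hypothetical rule: alarm when server "vm-1" enters the error state.
rule = {
    "event_type": "compute.instance.update",
    "query": [
        {"field": "instance_id", "value": "vm-1"},
        {"field": "state", "value": "error"},
    ],
}

fired = []
on_notification(rule, {"event_type": "compute.instance.update",
                       "traits": {"instance_id": "vm-1", "state": "active"}}, fired.append)
on_notification(rule, {"event_type": "compute.instance.update",
                       "traits": {"instance_id": "vm-1", "state": "error"}}, fired.append)
print(len(fired))
```

Only the second event triggers the alarm, and it does so immediately on arrival rather than at the next polling cycle.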

17 Nova – Force-Down and Exposing host_state
[Diagram labels: Host / Machine, Hypervisor, VM, vSwitch, BMC, nova compute, nova api, nova conductor, nova scheduler, nova DB, queue, External Monitoring Service, Client; EXISTING (periodic update) of service state]
Force-down API [Mark Nova-Compute Down]: telling Nova that the nova service is no longer available
Other APIs shown: Service disable API, Evacuation API, Reset Server State API
Server API [Get Valid Server State]: server APIs to have host_state:
GET /v2.1/{tenant_id}/servers/detail
GET /v2.1/{tenant_id}/servers/{server_id}

18 Nova – Force-Down and Exposing host_state
[Diagram labels: Host / Machine, Hypervisor, VM, vSwitch, BMC, nova compute, nova api, nova conductor, nova scheduler, nova DB, queue, External Monitoring Service, Client; EXISTING (periodic update)]
Force-down API [Mark Nova-Compute Down]: telling Nova that the nova service is no longer available
Other APIs shown: Service disable API, Evacuation API, Reset Server State API
Server API [Get Valid Server State]: server APIs to have host_state:
GET /v2.1/{tenant_id}/servers/detail
GET /v2.1/{tenant_id}/servers/{server_id}
Allows Nova to integrate with an External Monitoring Service, and makes sure Nova handles requests for the host service state

19 Nova – Force-Down and Exposing ‘host_state’
[Diagram labels: Host / Machine, Hypervisor, VM, vSwitch, BMC, nova compute, nova api, nova conductor, nova scheduler, nova DB, queue, External Monitoring Service, Client; EXISTING (periodic update) of service state]
Force-down API [Mark Nova-Compute Down]: telling Nova that the nova service is no longer available
Other APIs shown: Service disable API, Evacuation API, Reset Server State API
Server API [Get Valid Server State]: server APIs to have ‘host_state’:
GET /v2.1/{tenant_id}/servers/detail
GET /v2.1/{tenant_id}/servers/{server_id}
‘host_state’ can have values:
–UP if nova-compute is up
–UNKNOWN if nova-compute is not reported by the service group driver
–DOWN if nova-compute is forced down
–MAINTENANCE if nova-compute is disabled
–Empty string indicates there is no host for the server
NOTE: This attribute appears in the response only if the policy permits. The default is admin only, but in the NFV case the owner should also be enabled.
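The value list above can be read as a small decision function. The sketch below is a hypothetical helper, not Nova code: the parameter names are invented, and the order in which the conditions are checked (forced down before disabled, for instance) is an assumption, since the slide does not state a precedence.

```python
def host_state(host, forced_down=False, disabled=False, reported_up=True):
    """Derive the host_state value for a server from its compute service.

    host         -- name of the host the server is on, or None/"" if none
    forced_down  -- nova-compute was marked down via the force-down API
    disabled     -- nova-compute is disabled
    reported_up  -- the service group driver currently reports the service
    """
    if not host:
        return ""  # no host for this server
    if forced_down:
        return "DOWN"
    if disabled:
        return "MAINTENANCE"
    return "UP" if reported_up else "UNKNOWN"


print(host_state("compute-1"))                      # healthy service
print(host_state("compute-1", forced_down=True))    # after force-down
print(host_state("compute-1", reported_up=False))   # heartbeat missing
```

The distinction between DOWN and UNKNOWN is what the force-down API adds: an external monitor can assert the failure explicitly instead of waiting for the periodic service update to time out.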

20 Congress-based Inspection

21 What is Congress?
Governance as a Service
–Define and enforce policy for Cloud Services
Policy
–No single definition: Law/Regulations, Business Rules, Security Requirements, Application Requirements
–Any Policy, any Service

22 Congress Architecture
[Diagram: Congress consists of an API, a Policy Engine, and one DataSourceDriver each for Nova, Neutron, Keystone and a Security System. Labels: Policy, Data, Policy Enforcement]

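23 Requirements and gaps for Congress as Inspector
+----------------------------------------------------+-----------------------------------------+------------------------------+
| Requirements                                       | Congress Features                       | Gaps                         |
+----------------------------------------------------+-----------------------------------------+------------------------------+
| Fast Failure Notification                          | Periodic polling and policy enforcement | Real-time policy enforcement |
| Mapping of a physical failure to a logical failure | Writing a rule for mapping              | None                         |
| Adaptability                                       | The policy rules themselves             | None                         |
+----------------------------------------------------+-----------------------------------------+------------------------------+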

24 Congress PushType DataSource Driver
[Diagram: the Congress architecture of slide 22 (API, Policy Engine, DataSourceDrivers for Nova, Neutron, Keystone and a Security System; labels Policy, Data, Policy Enforcement) with a new data flow: Another Service pushes data to a PushType DataSourceDriver]
Enables services outside Congress to push data, and improves reaction time for policy enforcement

25 Congress Doctor Driver
[Diagram: the Congress architecture of slide 22 with a new data flow from the Monitor to a Doctor DataSourceDriver; the arrows are numbered 1 to 4]
1. The Monitor notifies Congress of a HW failure event.
2. The Doctor Driver receives the failure event and inserts it into the row list of Doctor Data.
3. The Policy Engine receives the failure event, then evaluates the registered policy and enforces state correction.
4. The Policy Engine instructs the Nova Driver to force down the host service and reset the state of the VM(s).
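The four steps can be sketched as a push into a table followed by a rule evaluation that yields actions. This is a plain-Python illustration under stated assumptions: Congress policies are not written in Python, and `push_events`, `evaluate_policy`, the action tuples and the server records are all hypothetical names chosen for the example.

```python
doctor_events = []  # stands in for the Doctor driver's row list


def push_events(rows):
    """Steps 1-2: the monitor pushes failure events; the driver stores them."""
    doctor_events.extend(rows)


def evaluate_policy(events, servers):
    """Steps 3-4: derive state-correction actions from the stored events.

    Rule sketched here: every host with a 'down' event gets its compute
    service forced down, and every VM on that host has its state reset.
    """
    actions = []
    down_hosts = sorted({e["hostname"] for e in events if e["status"] == "down"})
    for host in down_hosts:
        actions.append(("force_down", host))
        for server in servers:
            if server["host"] == host:
                actions.append(("reset_state", server["id"], "error"))
    return actions


# Hypothetical inventory: two VMs on two compute hosts.
servers = [
    {"id": "vm-1", "host": "demo-compute0"},
    {"id": "vm-2", "host": "demo-compute1"},
]

push_events([{"hostname": "demo-compute0", "status": "down"}])
actions = evaluate_policy(doctor_events, servers)
print(actions)
```

Because the event is pushed rather than polled, the policy can be evaluated as soon as the row arrives, which is the gap identified on slide 23.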

26 Congress Doctor Driver (Detail)
Driver Schema (HW failure example)
+--------+-----------------------------------------------------+
| table  | columns                                             |
+--------+-----------------------------------------------------+
| events | {'name': 'id', 'description': 'None'},              |
|        | {'name': 'time', 'description': 'None'},            |
|        | {'name': 'type', 'description': 'None'},            |
|        | {'name': 'hostname', 'description': 'None'},        |
|        | {'name': 'status', 'description': 'None'},          |
|        | {'name': 'monitor', 'description': 'None'},         |
|        | {'name': 'monitor_event_id', 'description': 'None'} |
+--------+-----------------------------------------------------+
Row List of Doctor Data
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
| id             | time                          | type           | hostname      | status | monitor      | monitor_event_id |
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
| 0123-4567-89ab | 2016-03-09T07:39:27.230277464 | host.nic1.down | demo-compute0 | down   | demo_monitor | 111              |
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
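A monitor that pushes events has to produce rows matching this schema. The sketch below checks an event against the seven column names from the slide and turns it into a row in column order; the helper name `to_row` and the decision to store every value as a string are assumptions made for illustration.

```python
# Column names taken from the driver schema on the slide.
EVENT_COLUMNS = (
    "id", "time", "type", "hostname", "status", "monitor", "monitor_event_id",
)


def to_row(event):
    """Validate an event dict and return its values in schema column order."""
    missing = [c for c in EVENT_COLUMNS if c not in event]
    if missing:
        raise ValueError(f"event is missing columns: {missing}")
    return tuple(str(event[c]) for c in EVENT_COLUMNS)


# The sample row from the slide, expressed as an event dict.
sample = {
    "id": "0123-4567-89ab",
    "time": "2016-03-09T07:39:27.230277464",
    "type": "host.nic1.down",
    "hostname": "demo-compute0",
    "status": "down",
    "monitor": "demo_monitor",
    "monitor_event_id": 111,
}

row = to_row(sample)
print(row)
```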

27 Demo

28 Demo Scenarios
1. VM down occurs due to NIC failure(s)
–Scenario 1. Take a single NIC down as a failure
–Scenario 2. Take a set of redundant NICs down as a failure
2. Detect and notify the failure
–Monitor detects the HW failure event and notifies Congress
–Update the state of VMs affected by the failure to error in Nova
–Ceilometer notifies the VNFM of the VM’s failure immediately
3. Healing process
–VNFM promptly starts the healing process (switch active-standby)
[Diagram: Controller (Nova, Neutron, etc.), Compute1, Compute2, Switch; End User View: ERR, ACT, SBY]
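The two scenarios differ only in what counts as a failure, which is the configurability asked for on slide 6. A minimal sketch, assuming a hypothetical `host_failed` helper and two made-up policy names ("any" for Scenario 1, "all" for Scenario 2):

```python
def host_failed(nic_up, policy):
    """Decide whether NIC status amounts to a host failure.

    nic_up -- dict mapping NIC name to True (link up) or False (link down)
    policy -- "any": one NIC down is a failure (Scenario 1)
              "all": only the whole redundant set down is a failure (Scenario 2)
    """
    down = [name for name, up in nic_up.items() if not up]
    if policy == "any":
        return len(down) > 0
    if policy == "all":
        return len(nic_up) > 0 and len(down) == len(nic_up)
    raise ValueError(f"unknown policy: {policy}")


one_down = {"nic1": False, "nic2": True}
both_down = {"nic1": False, "nic2": False}

print(host_failed(one_down, "any"), host_failed(one_down, "all"))
print(host_failed(both_down, "any"), host_failed(both_down, "all"))
```

With redundant NICs, the same raw event (nic1 down) is a failure under one operator policy and a non-event under the other, so the decision belongs in the Inspector's policy rather than in the Monitor.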

29 Conclusions
The Doctor project has already realized NFVI failure detection, mapping of failures to affected users, and fast notification to users.
Congress realizes the Inspector module, opening up the failure detection interface to the NFVI.

