High Availability for OPNFV

1 High Availability for OPNFV
May 2015

2 Agenda
- Project introduction
- Scenarios discussion
- Requirement doc
- Gap analysis
- Blueprints
- Open questions: Storage HA

3 Project Introduction
Fu Qiao

4 Project Detail
Project Page:
Weekly meeting: Wednesday, 13:00-14:00 UTC
Mailing list: Opnfv-tech-discussion [availability]
Participants: Hui Deng, Ian Jolliffe, Maria Toeroe, Qiao Fu, Xue Yifei, Yuan Yue, Yao Cheng LIANG, Sean Winn, Joe Huang, Georg Kunz, Basavaprabhu Badami, …

5 Project progress
The ongoing work on the first OPNFV release already includes some HA schemes, e.g. OpenStack HA and the active/active or active/passive operation of RabbitMQ and MySQL, as described in section 5 of the requirement doc. In this project we go further and discuss the scenarios, framework, detailed requirements, and API definition of HA in the OPNFV platform.
Project outputs:
- Service HA scenario analysis
- Requirement document
- Gap analysis of the OpenStack HA scheme
- Blueprints: HA API description

6 Scenarios Discussion
Fu Qiao

7 Service Availability Levels for Carrier Grade VNFs
SAL 1
  Recovery time: e.g. 5 – 6 seconds
  Customer type: Network Operator Control Traffic; Government/Regulatory Emergency Services
  Recommendations: Redundant resources to be made available on-site to ensure fast recovery.
SAL 2
  Recovery time: e.g. 10 – 15 seconds
  Customer type: Enterprise and/or large scale Customers; Network Operators service traffic
  Recommendations: Redundant resources to be available as a mix of on-site and off-site as appropriate. On-site resources to be utilized for recovery of real-time services. Off-site resources to be utilized for recovery of data services.
SAL 3
  Recovery time: e.g. 20 – 25 seconds
  Customer type: General Consumer Public and ISP Traffic
  Recommendations: Redundant resources to be mostly available off-site. Real-time services should be recovered before data services.
Source: ETSI GS NFV-REL 001 V1.1.1
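
As a rough illustration of how the recovery-time bands above could be checked against a measured failover, here is a minimal Python sketch. The upper bounds are simply the top of each "e.g." range on the slide, not normative limits, and the names `SAL_LEVELS`, `meets_sal`, and `strictest_sal_met` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAvailabilityLevel:
    name: str
    max_recovery_seconds: float  # top of the slide's example range (assumption)


# Example bounds taken from the "e.g." ranges on the slide; illustrative only.
SAL_LEVELS = {
    1: ServiceAvailabilityLevel("SAL 1", 6.0),
    2: ServiceAvailabilityLevel("SAL 2", 15.0),
    3: ServiceAvailabilityLevel("SAL 3", 25.0),
}


def meets_sal(level: int, measured_recovery_seconds: float) -> bool:
    """Return True if a measured recovery time fits the example bound of a level."""
    return measured_recovery_seconds <= SAL_LEVELS[level].max_recovery_seconds


def strictest_sal_met(measured_recovery_seconds: float):
    """Return the lowest-numbered (strictest) level satisfied, or None."""
    for level in sorted(SAL_LEVELS):
        if meets_sal(level, measured_recovery_seconds):
            return level
    return None
```

For instance, a failover measured at 12 seconds would fit the SAL 2 example range but not SAL 1.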

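8 Scenarios
State      Redundancy in VNF  Failure detection  Use Case
Stateful   yes                VNF only           UC1
Stateful   yes                VNF & NFVI         UC2
Stateful   no                 VNF only           UC3
Stateful   no                 VNF & NFVI         UC4
Stateless  yes                VNF only           UC5
Stateless  yes                VNF & NFVI         UC6
Stateless  no                 VNF only           UC7
Stateless  no                 VNF & NFVI         UC8
UC9: Repeated failure in VNF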
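
The scenario matrix can be read as a lookup from three yes/no properties to a use case. A small Python sketch of that reading follows; the function name `classify_use_case` and its boolean parameters are illustrative choices, not part of the project material, and UC9 (repeated failure) is left out because it is not one of the eight combinations.

```python
# Keys: (stateful, redundant VNF, NFVI also detects the failure)
USE_CASES = {
    (True, True, False): "UC1",
    (True, True, True): "UC2",
    (True, False, False): "UC3",
    (True, False, True): "UC4",
    (False, True, False): "UC5",
    (False, True, True): "UC6",
    (False, False, False): "UC7",
    (False, False, True): "UC8",
}


def classify_use_case(stateful: bool, redundant: bool, nfvi_detects: bool) -> str:
    """Map a VNF's properties to the use case number from the scenario table."""
    return USE_CASES[(stateful, redundant, nfvi_detects)]
```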

9 UC1: Stateful VNF with Redundancy
Failure detection: VNF only
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF fails over
6. NF recovers
7. VNF repairs VNFC
*Steps 1 and 2 are simultaneous; they are separated for clarity.
Nothing new in this scenario.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (STB, ACT); VMs (NFVI’s Services); NFVI; recovery time]

10 UC2: Stateful VNF with Redundancy
Failure detection: VNF & NFVI
1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
6a. VNF fails over
7a. NF recovers
5b. NFVI detects the failure
6b. NFVI reports to VIM
7b. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VNF repairs VNFC
*Steps 1-4 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (STB, ACT); VMs (NFVI’s Services); NFVI; recovery time]

11 UC3: Stateful VNF with No Redundancy
Failure detection: VNF only
VNFC checkpoints its state to VD, which is HA.
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF repairs VNFC
6. VNFC gets state
7. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT) and their state; VMs and VD (NFVI’s Services); NFVI; recovery time]
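
The checkpoint-and-restore idea behind UC3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CheckpointStore` is an in-memory stand-in for the HA store the slide labels VD, and `Vnfc`, `handle`, and `repair` are hypothetical names; a real VNFC would write to an actual highly available volume or service.

```python
import json


class CheckpointStore:
    """In-memory stand-in for the HA store the slide calls VD (assumption)."""

    def __init__(self):
        self._blobs = {}

    def save(self, key, state):
        # Serialize so the stored copy is independent of the live object.
        self._blobs[key] = json.dumps(state)

    def load(self, key):
        blob = self._blobs.get(key)
        return json.loads(blob) if blob is not None else {}


class Vnfc:
    """Hypothetical stateful VNFC that checkpoints after every state change."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.state = {}

    def handle(self, session_id, value):
        self.state[session_id] = value
        self.store.save(self.name, self.state)  # checkpoint to VD

    @classmethod
    def repair(cls, name, store):
        """Steps 5 and 6: start a fresh VNFC and let it fetch its state."""
        fresh = cls(name, store)
        fresh.state = store.load(name)
        return fresh
```

Because there is no standby instance, the recovery time here includes both the repair of the VNFC and the time needed to read the state back.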

12 UC4: Stateful VNF with No Redundancy
Failure detection: VNF & NFVI
VNFC checkpoints its state to VD, which is HA.
1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
6a. VNF reports to VNFM
5b. NFVI detects the failure
6b. NFVI reports to VIM
7. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VIM informs VNFM
12. VNFM repairs VNFC
13. VNFC gets state
14. NF recovers
*Steps 1-4 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT) and their state; VMs and VD (NFVI’s Services); NFVI; recovery time]

13 High Availability Flow Chart
Service failure happens (may be caused by a failure of the VNF or the NFVI).
Step 1 - Service recovery (time constrained: a carrier-grade VNF should be recovered within seconds):
- Failure detection (by loss of the service heartbeat or by an NFVI failure report)
- Service is unavailable during the recovery time
- Service failover
Step 2 - NFVI recovery or repair (less time constrained):
- VNF failure only: VNFC repair/restart
- NFVI failure: VM recovery and VNFC recovery
- Repeated failure (labelled on the chart; see UC9)
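
One way to read the flow chart is as a two-stage plan: a time-critical service recovery followed by a slower repair whose shape depends on where the fault sits. The Python sketch below encodes that reading. Treating a repeated failure like an NFVI failure is an assumption drawn from UC9, not something the chart states explicitly, and `FailureSource` and `plan_recovery` are illustrative names.

```python
from enum import Enum


class FailureSource(Enum):
    VNF = "vnf"
    NFVI = "nfvi"


def plan_recovery(source: FailureSource, repeated: bool = False) -> list:
    """Return the ordered recovery actions for one service failure."""
    # Step 1: service recovery, expected within seconds.
    steps = ["failure detection", "service failover"]
    # Step 2: repair of the failed layer, less time constrained.
    if source is FailureSource.NFVI or repeated:
        # Assumption: repeated VNF failures are handled like an NFVI fault.
        steps += ["VM recovery", "VNFC recovery"]
    else:
        steps.append("VNFC repair/restart")
    return steps
```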

14 Requirement doc & Gap Analysis doc.
Ian Jolliffe

15 Requirement Doc. Details
Framework
- The ultimate goal is to provide high availability for the upper layer service.
- Service high availability is provided by recovery of the service (including service restart and failover) within seconds, following the SAL.
- Repair or recovery of the failed layer should happen afterwards.
- Ensure that no failure in one layer causes a cascading failure at other layers.
- A single layer can detect failures in other layers and help recover failed components.
Layers (top to bottom):
- Service layer: Service
- Application/VM layer: VNF/VNFC, VNFM
- NFVI/VIM layer: NFVI, VIM
- Hardware layer: Hardware

16 Requirement doc. outline
1 Overall Principle for High Availability in NFV
  1.1 Framework for High Availability in NFV
  1.2 Definitions
  1.3 Overall requirements
  1.4 Time requirement
2 Hardware HA
3 Virtualization Facilities (Host OS, Hypervisor)
4 Virtual Infrastructure HA – Requirements:
  4.1 Virtual Compute
  4.2 Virtual Storage
  4.3 Virtual Network
5 VIM High availability
  5.1 Architecture requirement of VIM HA
  5.2 Fault detection and alarm requirement of VIM
  5.3 HA mechanism of VIM provided for NFV
  5.4 SDN controller
6 VNF HA
  6.1 Service Availability
  6.2 Service Continuity
7 Storage

17 Gap Analysis
14+ HA related gaps have been discovered:
- Nova: 6 gaps covering the scheduler, consoleauth, and health status of the compute node.
- Neutron: 2 gaps covering the L3 agent and DHCP agent.
- Cinder: 2 gaps covering HA configuration and multi-attachment.
- VIM NBI: 1 gap for error reporting.
- QoS: 1 gap for QoS management.
References:

18 Blueprints
Joe Huang

19 Escape from site level Keystone failure
Today: only one Keystone server can be configured for token validation or for retrieving the revoke list.
- Flow: API request (Nova/Cinder/Neutron…) -> Keystone middleware -> OpenStack service (Nova/Cinder/Neutron…)
- The middleware validates the token (Fernet, UUID) or retrieves the revoke list (PKI) from that single Keystone server.
Proposal: allow a secondary Keystone server to be configured in case of site level Keystone failure.
- Flow: API request (Nova/Cinder/Neutron…) -> Keystone middleware -> OpenStack service (Nova/Cinder/Neutron…)
- The middleware validates the token (Fernet, UUID) or retrieves the revoke list (PKI) from Site1 Keystone or Site2 Keystone.
(Drawbacks of DNS-based load balancing: failover is delayed by caching, and routing is unpredictable.)
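
The blueprint's idea, trying a secondary Keystone site only when the primary one is unreachable, can be sketched generically in Python. Nothing here is the Keystone middleware API: `KeystoneUnavailable`, `validate_with_fallback`, and the two site functions are hypothetical stand-ins, and each validator is assumed to return a boolean or raise when its site is down.

```python
from typing import Callable, Sequence


class KeystoneUnavailable(Exception):
    """Raised by a validator when its Keystone site cannot be reached."""


def validate_with_fallback(token: str,
                           validators: Sequence[Callable[[str], bool]]) -> bool:
    """Ask each configured site in order; fall back only on unavailability."""
    last_error = None
    for validate in validators:
        try:
            # A definite answer (valid or invalid) ends the search.
            return validate(token)
        except KeystoneUnavailable as exc:
            last_error = exc
    raise KeystoneUnavailable("no Keystone site reachable") from last_error


# Hypothetical stand-ins for the two sites on the slide.
def site1_down(token: str) -> bool:
    raise KeystoneUnavailable("site1 unreachable")


def site2_up(token: str) -> bool:
    return token == "valid-token"
```

Note the design choice: a token rejected by the first reachable site is not retried elsewhere, so the fallback covers site failure only and does not weaken validation.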

20 Open Questions: Storage HA
Georg Kunz 

21 Storage Architecture
[Diagram, by layer]
- VNF/VNFC: storage service component, offering file (and object) storage
- NFVI / VIM: distributed storage and storage array, each offering block, file, and object storage
- Hardware: hosts, storage array with controllers Ctrl1 and Ctrl2, switches

22 Storage HA – Network Failure
- Storage network link fails
- Storage network detects failure
- Storage network switches to standby link(s)
  - iSCSI multi-pathing
  - bonding
- Report failure to O&M
[Diagram labels: VIM; distributed storage (hosts) and storage array (Ctrl1, Ctrl2), each offering block, file, object; switches]
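
The sequence above (detect the failed link, switch to a standby, report to O&M) can be sketched as a tiny state machine in Python. This does not model iSCSI multi-pathing or bonding themselves; `MultipathLink`, the path names, and the `reports` list that stands in for O&M notifications are all assumptions made for illustration.

```python
class MultipathLink:
    """Toy model of a storage connection with one active and several standby paths."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.order = list(paths)
        self.healthy = {p: True for p in self.order}
        self.active = self.order[0]
        self.reports = []  # stand-in for messages sent to O&M

    def mark_failed(self, path):
        """Record a detected path failure and fail over if it was the active path."""
        self.healthy[path] = False
        self.reports.append(f"path {path} failed")
        if path == self.active:
            # Pick the first remaining healthy path, or None if all are down.
            self.active = next((p for p in self.order if self.healthy[p]), None)
```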

23 Storage HA – Failure in Storage Array
- Component within storage array fails
- Array-internal fail-over kicks in
  - RAID
  - Redundant controllers, NICs, …
- Report failure to O&M
[Diagram labels: VIM; distributed storage (hosts) and storage array (Ctrl1, Ctrl2), each offering block, file, object; switches]

24 Storage HA – Host Failure
- Storage host fails
- Distributed storage layer detects failure
- Distributed storage layer rebalances data
[Diagram labels: VIM; distributed storage (hosts) and storage array (Ctrl1, Ctrl2), each offering block, file, object; switches]

25 Non-HA Block Storage (legacy)
- Mirroring of block devices on VNF level
[Diagram labels: VNF with VNFC (active) and VNFC (passive), mirroring; NFVI]

26 HA Block Storage
Active/passive configuration
- Failover supervised by clustering software in VNF
- Requires multi-attach capability of Cinder
[Diagram labels: VNF, VNFM; VNFC (active), VNFC (standby), VNFC (active); NFVI, VIM]

27 HA Block Storage
Active/active configuration
- Clustered file system enables concurrent access
- Requires multi-attach capability of Cinder
[Diagram labels: VNF, VNFM; VNFC (active), VNFC (active); NFVI, VIM]

28 VNF level HA for Multiple Backends
- Block devices provided by multiple backends
- Mirroring of block devices on VNF level
- Pro-active failover possible
[Diagram labels: VNF with VNFC (active) and VNFC (passive), mirroring; VNFM; NFVI with backend 1 and backend 2; VIM]

29 Open Questions
- Can the NFVI storage system provide a sufficient level of HA to meet the SALs?
- Failover/recovery times heavily depend on the deployed solution.
- How much does a rebuild of data impact performance?

30 File Storage
Legacy deployments:
- File storage service provided by VNFC
- Layered on top of block storage services
NFVI:
- File storage service provided by NFVI / hardware
- OpenStack Manila

31 Ephemeral Storage
Main use: file systems of VMs booted from image
Location:
- On local disks of compute host
  - Isolation of failover domains
  - VM unaffected by failure of storage system
  - Disk failure corresponds to host failure
  - Limits live migration capabilities
- On distributed or external storage
  - Correlated failures possible
  - Failure of storage backend impacts VMs
  - Properties of respective storage backend apply

32 Appendix

33 UC3: Stateful VNF with No Redundancy
Failure detection: VNF only
VNFC checkpoints its state to VD, which is HA.
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF repairs VNFC
6. VNFC gets state
7. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT) and their state; VMs and VD (NFVI’s Services); NFVI; recovery time]

34 UC3-b: Stateful VNF with No Redundancy
Failure detection: VNF only
VNFC checkpoints its state to VD, which is HA.
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF reports to VNFM
5. VNFM isolates VNFC
6. VNFM repairs VNFC
7. VNFC gets state
8. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT) and their state; VMs and VD (NFVI’s Services); NFVI; recovery time]

35 UC4: Stateful VNF with No Redundancy
Failure detection: VNF & NFVI
VNFC checkpoints its state to VD, which is HA.
1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
6a. VNF reports to VNFM
5b. NFVI detects the failure
6b. NFVI reports to VIM
7. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VIM informs VNFM
12. VNFM repairs VNFC
13. VNFC gets state
14. NF recovers
*Steps 1-4 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT) and their state; VMs and VD (NFVI’s Services); NFVI; recovery time]

36 UC5: Stateless VNF with Redundancy
Failure detection: VNF only
Spare VNFC may or may not be instantiated.
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF fails over
6. NF recovers
7. VNF restores redundancy
*Steps 1 and 2 are simultaneous; they are separated for clarity.
Nothing new in this scenario.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (Spare, ACT); VMs (NFVI’s Services); NFVI; recovery time]

37 UC6: Stateless VNF with Redundancy
Failure detection: VNF & NFVI
Spare VNFC may or may not be instantiated.
1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
6a. VNF fails over
7a. NF recovers
5b. NFVI detects the failure
6b. NFVI reports to VIM
7b. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VNF restores redundancy
*Steps 1-4 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (Spare, ACT); VMs (NFVI’s Services); NFVI; recovery time]

38 UC7: Stateless VNF with No Redundancy
Failure detection: VNF only
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF reports to VNFM
5. VNF isolates VNFC
6. VNF repairs VNFC
7. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT); VMs and VD (NFVI’s Services); NFVI; recovery time]

39 UC8: Stateless VNF with No Redundancy
Failure detection: VNF & NFVI
1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
6a. VNF reports to VNFM
5b. NFVI detects the failure
6b. NFVI reports to VIM
7. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VIM informs VNFM
12. VNF repairs VNFC
13. NF recovers
*Steps 1-4 are simultaneous; they are separated for clarity.
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT); VMs and VD (NFVI’s Services); NFVI; recovery time]

40 UC9: Stateless VNF with No Redundancy
Failure detection: VNF only – BUT Repeatedly
1. VNFC fails
2. NF fails
3. VNF detects the failure and counts
4. VNF isolates VNFC
5. VNF repairs VNFC
6. NF recovers
…. VNFC fails…. 2
…. VNFC fails…. 3
…. VNFC fails…. 4
Fault is not in the VNFC!
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFCs (ACT) and failure counts 1 to 4; UC7; VMs and VD (NFVI’s Services); NFVI]

41 UC9: Stateless VNF with No Redundancy
Failure detection: VNF only – BUT Repeatedly
1. VNFC fails
2. NF fails
3. VNF detects the failure and counts
4. VNF isolates VNFC
5. VNF repairs VNFC
6. NF recovers
…. VNFC fails…. 2
…. VNFC fails…. 3
…. VNFC fails…. 4
N. VNF reports to VNFM
N+1. VNFM reports to VIM
N+2. VIM isolates VM
N+3. VIM repairs VM
N+4. VM Service recovers
N+5. VNF repairs VNFC
N+6. NF recovers
[Diagram labels: NFVO, VNFM, VIM; NF (VNF’s Services); VNF with VNFC (ACT) and failure count 4; VMs and VD (NFVI’s Services); NFVI]
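
The "detects the failure and counts" step in UC9 amounts to a failure counter with an escalation threshold. A minimal Python sketch follows; the threshold of 4 mirrors the count shown on the slide but is otherwise an assumption, and `RepeatedFailureTracker` and the returned action strings are illustrative names.

```python
class RepeatedFailureTracker:
    """Counts VNFC failures and escalates once repairs keep failing."""

    def __init__(self, threshold: int = 4):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.count = 0

    def record_failure(self) -> str:
        """Return the action to take for the failure that was just detected."""
        self.count += 1
        if self.count >= self.threshold:
            # Fault is probably not in the VNFC: report to VNFM so the VIM
            # can isolate and repair the VM (steps N to N+4).
            self.count = 0
            return "escalate_to_vnfm"
        # Ordinary UC7-style handling: isolate and repair the VNFC locally.
        return "repair_vnfc"
```

A production version would likely also age out old failures with a time window, which the slide does not specify.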

42 Scenario chart
- Scenarios 1, 2, 5, 6
- Add all the scenarios as an appendix
- Should the NFVI provide an HA API to the VNF?
- OpenSAF is a PaaS, actually serving as HA middleware
- Stateful and stateless VNFs may require different schemes in the NFVI. If the VNF is not redundant, we may need VM redundancy; in this case the VNF problem may not be solved.

