High Availability Project
Qiao Fu
2015.2.24



Project Progress

Project details:
– Weekly meeting:
– Mailing list:
– Participants: Hui Deng, Ian Jolliffe, Maria Toeroe, Qiao Fu, Xue Yifei, Yuan Yue, …

Project Page:

Project Outputs:
– Requirement Document
  – Overall Principle for High Availability in NFV
  – Hardware High Availability
  – Virtualization Facilities High Availability (Host OS & Hypervisor)
  – Virtual Infrastructure High Availability
  – VIM High Availability
  – VNF High Availability
– Gap Analysis of OpenStack HA scheme

Details

Overall Principle for High Availability in NFV
– Service availability shall be considered with respect to the delivery of E2E services
– There should be no single point of failure in the NFV framework
– …

Hardware High Availability
– Hardware HA can be addressed by several legacy HA schemes
– Fault detection and reporting of HW failures from the hardware to the VIM, VNFM and Orchestrator
– Direct notification from the hardware to some specific VNFs should be possible
– Periodic updates of hardware running conditions to the NFVI and VIM are required for further operation, which may include fault prediction, failure analysis, etc.
– The hardware should provide a redundant architecture for the network plane
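The hardware fault-reporting path above (hardware to VIM, VNFM and Orchestrator, plus an optional direct notice to specific VNFs) can be sketched as a simple publish/subscribe fan-out. This is a minimal illustration, not an OPNFV API; the event fields and consumer names (`FaultEvent`, `"vnf-epc"`, etc.) are assumptions.

```python
from dataclasses import dataclass, field
import time

@dataclass
class FaultEvent:
    """Hypothetical HW fault report (fields are illustrative only)."""
    host: str
    component: str          # e.g. "nic0", "disk2", "psu1"
    severity: str           # e.g. "critical", "warning"
    timestamp: float = field(default_factory=time.time)

class FaultReporter:
    """Fans a hardware fault out to all registered consumers
    (VIM, VNFM, Orchestrator, and optionally specific VNFs)."""
    def __init__(self):
        self._consumers = []

    def register(self, name, callback):
        self._consumers.append((name, callback))

    def report(self, event):
        delivered = []
        for name, callback in self._consumers:
            callback(event)
            delivered.append(name)
        return delivered

# Usage: the VIM and a latency-sensitive VNF both subscribe.
reporter = FaultReporter()
log = []
reporter.register("VIM", lambda e: log.append(("VIM", e.component)))
reporter.register("vnf-epc", lambda e: log.append(("vnf-epc", e.component)))
delivered = reporter.report(
    FaultEvent(host="compute-3", component="nic0", severity="critical"))
```

In a real deployment the fan-out would go over a message bus rather than in-process callbacks; the point is only that one hardware event reaches every registered manager.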

Details

Virtualization Facilities High Availability (Host OS & Hypervisor)
– The NFVI should support VM migration, VM isolation, and VM restoration
– Fault reporting capability: the NFVI is responsible for reporting/updating the VIM about HW and NFVI failures
– The hypervisor should maintain and regularly update an information record – the "last words" – of each VM, which can subsequently be used for analysis and diagnostics
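The hypervisor "last words" record above could be kept as a small, regularly overwritten set of per-VM snapshots, so the most recent state survives a VM failure for later diagnosis. A minimal sketch; the recorded fields (`cpu`, `mem`, `state`) are assumptions, not a defined OPNFV schema.

```python
from collections import deque
import time

class LastWords:
    """Keeps the N most recent state snapshots per VM, so the latest
    one is available for post-mortem analysis after a VM failure."""
    def __init__(self, depth=3):
        self._records = {}          # vm_id -> deque of snapshots
        self._depth = depth

    def update(self, vm_id, cpu_pct, mem_pct, state):
        snap = {"t": time.time(), "cpu": cpu_pct,
                "mem": mem_pct, "state": state}
        self._records.setdefault(
            vm_id, deque(maxlen=self._depth)).append(snap)

    def last(self, vm_id):
        """Latest snapshot for a VM, or None if never seen."""
        recs = self._records.get(vm_id)
        return recs[-1] if recs else None

# Usage: regular updates; the last snapshot hints at why the VM died.
lw = LastWords(depth=2)
lw.update("vm-7", cpu_pct=12.0, mem_pct=40.0, state="running")
lw.update("vm-7", cpu_pct=99.0, mem_pct=95.0, state="paging")
```

The bounded `deque` matches the requirement's "regularly update" wording: old snapshots are overwritten, only the recent ones are retained.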

Details

VNF HA – Service Availability

Requirements:
– We need to be able to define the E2E (V)NF chain, based on which the E2E availability requirements can be decomposed into requirements applicable to individual VNFs and their interconnections
– The interconnection of the VNFs should be logical and maintained by the NFVI with guaranteed characteristics; e.g. in case of failure the connection should be restored within the acceptable tolerance time
– These characteristics should be maintained in VM migration, failover and switchover, scale in/out, etc. scenarios
– It should be possible to prioritize the different network services and their VNFs. These priorities should be used when pre-emption policies are applied, for example due to resource shortage.

Service Availability Classification Levels
– Define different service availability levels
– Classify the virtual resources so that they can be matched to the VNF service availability levels
– Specify for each class of virtual resources the associated HA handling (e.g. recovery mechanisms, recovery priorities, migration options) so that the service availability class level can be guaranteed

Metrics for Service Availability
– Requirements:
  » It shall be possible to specify for each service availability class the associated availability metrics and their thresholds
  » It shall be possible to collect data for the defined metrics
  » It shall be possible to delegate the enforcement of some thresholds to the NFVI
  » Accordingly, it shall be possible to request virtual resources with guaranteed characteristics, such as guaranteed latency between VMs (i.e. VNFCs), between a VM and storage, and between VNFs
– Failure Recovery Time
  » It should be irrelevant whether the abnormal event is due to a scheduled or unscheduled operation or is caused by a fault
  » The failure detection mechanisms should be available in the NFVI and configurable so that the target recovery times can be met
  » Abnormal events should be logged and communicated (i.e. notifications and alarms as appropriate)
– Failure Impact Fraction
  » It should be possible to define the failure impact zone for all the elements of the system
  » At the detection of a failure of an element, its failure impact zone must be isolated before the associated recovery mechanism is triggered
  » If the isolation of the failure impact zone is unsuccessful, isolation should be attempted at the next higher level as soon as possible to prevent fault propagation
  » It should be possible to define different levels of failure impact zones with associated isolation and alarm generation policies
  » It should be possible to limit the collocation of VMs to reduce the failure impact zone, as well as to provide sufficient resources
– Failure Frequency
  » There should be a probation period for each failure impact zone within which failures are correlated
  » The threshold and the probation period for the failure impact zones should be configurable
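The escalation rule above (isolate the failure impact zone; if isolation fails, escalate to the next higher level to prevent fault propagation) can be sketched as a loop over nested zones. The zone hierarchy and the `try_isolate` behavior here are hypothetical, chosen only to illustrate the escalation order.

```python
# Nested failure impact zones, innermost first; the names and the
# hierarchy are illustrative, not an OPNFV definition.
ZONE_HIERARCHY = ["vm", "host", "rack", "zone"]

def isolate_with_escalation(failed_level, try_isolate, alarm):
    """Attempt isolation at the failed element's zone level; on each
    failure, escalate to the next higher zone in the hierarchy."""
    start = ZONE_HIERARCHY.index(failed_level)
    for level in ZONE_HIERARCHY[start:]:
        if try_isolate(level):
            alarm("isolated at " + level)
            return level
    alarm("isolation failed at all levels")
    return None

# Usage: host-level isolation fails, rack-level succeeds.
alarms = []
result = isolate_with_escalation(
    "host",
    try_isolate=lambda level: level == "rack",
    alarm=alarms.append)
```

Note that recovery is only triggered after isolation succeeds, matching the requirement that the impact zone "must be isolated before the associated recovery mechanism is triggered".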

Details

VNF HA – Service Continuity
– The NFVI should maintain the number of VMs provided to the VNF in the face of failures, i.e. failed VM instances should be replaced by new VM instances
– It should be possible to specify whether the NFVI or the VNF/VNFM handles the service recovery and continuity
– If the VNF/VNFM handles the service recovery, it should be able to receive error reports in a timely manner
– The VNF (i.e. between VNFCs) may have its own fault detection mechanism, which might be triggered prior to receiving the error report from the underlying NFVI; therefore the NFVI/VIM should not attempt to preserve the state of a failing VM unless configured to do so
– The VNF/VNFM should be able to initiate the repair/reboot of resources of the NFVI (e.g. to recover from a fault persisting at the VNF level => failure impact zone escalation)
– It should be possible to disallow the live migration of VMs; when it is allowed, it should be possible to specify the tolerated interruption time
– It should be possible to restrict the simultaneous migration of VMs hosting a given VNF
– It should be possible to define under which circumstances the NFV-MANO, in collaboration with the NFVI, should provide error handling (e.g. the VNF handles local recoveries while NFV-MANO handles geo-redundancy)
– The NFVI/VIM should provide virtual resources such as storage according to the needs of the VNF, with the required guarantees (see virtual resource classification); the VNF shall be able to define the information to be stored on its associated virtual storage
– It should be possible to define HA requirements for the storage (its availability, accessibility, resilience options), i.e. the NFVI shall handle failover for the storage
– The NFVI shall handle network/connectivity failures transparently to the VNFs
– VNFs with different requirements should be able to coexist in the NFV Framework
– The VNF should define whether scale in/out is triggered by the VNF (VNFM) or the NFVI
– If the scale in/out is triggered by the NFVI, it should be possible to define the metrics to monitor and the related thresholds that trigger the scale in/out operation
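The first continuity requirement above (keep the VNF's VM count constant by replacing failed instances) can be sketched as a small reconciliation loop. The `VmPool` class and its spawn model are assumptions for illustration; real replacement would go through the VIM (e.g. an OpenStack compute API).

```python
import itertools

class VmPool:
    """Keeps a VNF's VM count at the desired level by replacing
    failed instances with freshly spawned ones (reconciliation)."""
    def __init__(self, desired_count):
        self.desired = desired_count
        self._ids = itertools.count(1)
        self.vms = {self._spawn() for _ in range(desired_count)}

    def _spawn(self):
        # Stand-in for a VIM "boot instance" call.
        return "vm-%d" % next(self._ids)

    def mark_failed(self, vm_id):
        self.vms.discard(vm_id)

    def reconcile(self):
        """Replace failed VMs; returns the newly spawned instance ids."""
        new = [self._spawn() for _ in range(self.desired - len(self.vms))]
        self.vms.update(new)
        return new

# Usage: one of three VMs fails; reconciliation restores the count.
pool = VmPool(desired_count=3)
pool.mark_failed("vm-2")
replacements = pool.reconcile()
```

The same declared-versus-actual reconciliation shape also fits the NFVI-triggered scale in/out requirement: scaling just changes `desired` and lets the loop converge.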

Open Questions

Requirement doc
– General template (documentation project)

Cooperation with other projects
– High Availability: HA schemes for HW, platform and VNF, including redundancy, restoration, and migration requirements
– Output requirements for fault detection and management to Doctor

Discussion about High Availability, Doctor and Failure Prediction

Reaction time for the three projects is different:
– HA requires a reaction time of less than seconds, within which a fault must be detected and reported, and failover should take place to avoid failure of services
– Doctor provides online fault detection and management; HA may output requirements for fault detection based on the requirements of services and platforms
– Failure Prediction provides offline failure analysis and prediction