Dipartimento di Informatica e Sistemistica, University of Napoli Federico II – Comics Group 1 An introduction to NETWORK RESILIENCY Giorgio Ventre & Stefano.


1 An introduction to NETWORK RESILIENCY
Giorgio Ventre & Stefano Avallone
COMICS Group, Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II

2 References
• Jean-Philippe Vasseur, Mario Pickavet, Piet Demeester. “Network Recovery: Protection and Restoration of Optical, SONET-SDH, IP, and MPLS”. Morgan Kaufmann.
• AA. VV. “Building Survivable Networks”, Feature Issue of IEEE Network Magazine, March/April 2004.

3 Communication Networks Relevance
• Communication networks are becoming fundamental infrastructures:
  - the amount of data carried by communication networks has grown considerably in recent years;
  - many social and economic activities depend on communication networks;
  - many safety-critical activities depend on communication networks.
• Reliability is an essential feature of today's communication networks!

4 Network Reliability: definition [1]
• (a) The ability of a network to maintain or restore an acceptable level of performance during network failures by applying various restoration techniques, and (b) the mitigation or prevention of service outages from network failures by applying preventive techniques.
• Synonym: Network Survivability.
[1] Alliance for Telecommunications Industry Solutions (ATIS), http://www.atis.org/tg2k/_network_reliability.html

5 Network Reliability: related concepts
• Many concepts are related to network reliability, for example:
  - network element reliability: the probability that a network element is fully operational during a certain period of time;
  - network element availability: the probability that a network element is in an up state at a given instant of time t;
  - network element fault: the inability of a network element to perform a required action;
  - ...

6 Which failures may occur?
• The ability of a network to provide the required services may be compromised by different kinds of failures:
  - planned or unplanned failures;
  - internal or external failures;
  - software or hardware failures;
  - malicious or accidental failures;
  - ...

7 Accounted Failures
• Providing actions to address every failure that may occur in a communication network is infeasible.
• Network providers and ISPs normally plan actions to address the most frequent failures.
• These failures are called accounted failures.
• The most common types of accounted failure are:
  - single link failure;
  - single node failure.

8 Failures' Impact
• In today's communication networks a single failure may produce a major disruption in network availability.
• A single cut in an optical cable may drop thousands of logical network connections.
  - On July 5, 2002 a submarine cable break affected the Asia Pacific Cable Network 2 (APCN 2), causing a considerable slowdown in all the network connections among Japan, China, South Korea, etc.

9 Failures' Impact: ATC systems
• Press release (http://www.natca.org/mediacenter/press-release-detail.aspx?id=394)
• MASSIVE POWER, COMMUNICATIONS FAILURE AT MAJOR AIR TRAFFIC CONTROL CENTER PUTS CONTROLLERS IN DARK, FLIGHTS IN JEOPARDY
• 07/19/2006, Bob Marks. PALMDALE, Calif. – A massive power and communications failure late Tuesday at the Los Angeles Air Route Traffic Control Center left scrambling air traffic controllers to deal with a nightmare scenario – how to keep dozens of flights away from each other above a large swath of the Southwestern United States despite the inability to see them, talk to them or relay crucial instructions for 15 excruciatingly long minutes.
• Every ounce of skill, heart and determination that controllers bring into the control room every day was put to the test during one of the worst outages to ever hit the facility. It was so bad, controllers say, that the only thing they had of use to aid the situation that actually worked was their cell phones – devices which the Federal Aviation Administration, inexplicably, has barred from control rooms, further impeding the safety of the system.
• More details in http://themainbang.typepad.com/blog/2006/07/complete_failur.html

10 Network Reliability Parameters
• Some parameters that may be used to characterize the reliability of a network can be found in ITU-T Recommendation G.911: “Parameters and Calculation Methodologies for Reliability and Availability of Fibre Optic Systems”.
• The following slides introduce some of the parameters defined in G.911.

11 Failures in Time (FITs) and Maintenance Time
• Failures in Time (FIT): the number of failures of a device in a specific time interval; normally expressed as failures per billion (10^9) device-hours.
• Maintenance Time: the time interval during which a maintenance action is performed on an item, either manually or automatically, ...

12 Mean Time Between Failures (MTBF)
• The Mean Time Between Failures (MTBF) is the steady-state expectation of the time between failures.
• Mathematically, the MTBF (in years per failure) is related to the failure rate F (in FITs, i.e. failures per 10^9 hours) as follows:
  $\mathrm{MTBF} = \dfrac{10^{9}}{8760 \cdot F}$, where 8760 is the number of hours in a year.

13 Mean Time To Repair (MTTR)
• The Mean Time To Repair (MTTR) is defined as the total corrective maintenance time divided by the total number of corrective maintenance actions during a period of time.
• Given the definitions of MTBF and MTTR, the availability A of an item may be derived as:
  $A = \dfrac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$
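As a rough illustration of the two relations mentioned on slides 12 and 13, the following Python sketch converts a FIT rate into an MTBF and then into an availability figure. It assumes the standard relations MTBF [hours] = 10^9 / F and A = MTBF / (MTBF + MTTR). The function names and the sample values (5,000 FITs, 4 hours MTTR) are hypothetical and not taken from the slides.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def mtbf_years_from_fits(fits: float) -> float:
    """MTBF in years for a failure rate in FITs (failures per 1e9 device-hours)."""
    return 1e9 / (fits * HOURS_PER_YEAR)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR), both in the same unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical device: 5,000 FITs and a 4-hour mean time to repair.
fits = 5000.0
mtbf_years = mtbf_years_from_fits(fits)
mtbf_hours = mtbf_years * HOURS_PER_YEAR
a = availability(mtbf_hours, mttr_hours=4.0)

print(f"MTBF = {mtbf_years:.2f} years, availability = {a:.6%}")
```

Note that MTBF and MTTR must be expressed in the same unit before the ratio is taken; here both are converted to hours.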

14 Users, services and reliability requirements
• Network reliability is a “relative concept”.
• The reliability requirements of a communication network depend on:
  - the user type;
  - the service type.
• Different user-service combinations lead to diverse requirements in terms of MTBF and MTTR.

15 User classification
• According to their reliability requirements, network users may be classified into the following categories:
  - Safety critical users. Users for which service interruptions are unacceptable.
  - Business critical users. Users for which any service interruption leads to a high financial loss.
  - Low cost users. Users for which service interruptions cause only discomfort.
  - Basic level users. Users for which service reliability is only a side effect.

16 Availability: Impact of Outages
Ref: “Service Applications for SONET DCS Distributed Restoration”, IEEE J. Selected Areas in Comm., Jan 94
[Figure: service outage impact versus restoration time after failure detection. Time axis marks: 0, 50 ms, 200 ms, 2 s, 10 s, 5 min, 15 min, 30 min. Ranges along the axis: protection switching range; 1st, 2nd, 3rd and 4th restoration target ranges; undesirable; unacceptable. Impacts annotated on the chart: service “hit” (reframes); potential voiceband disconnects (<5%); trigger changeover of CCS7 STP signaling links; effect cell rerouting process; may drop voiceband calls depending on channel bank vintage; drop all circuit switched connections; PL disconnects; potential packet (X.25) disconnects; potential data session time-outs; packet (X.25) disconnects; data session time-outs; network congestion; minor social/business impacts; potentially FCC reportable; major social/business impacts.]

17 Market Drivers for Survivability
• Customer Relations
• Competitive Advantage
• Revenue
  - Negative: Tariff Rebates
  - Positive: Premium Services
    - Business Customers
    - Medical Institutions
    - Government Agencies
• Impact on Operations
• Minimize Liability

18 Network Survivability
• Availability: 99.999% (5 nines) => roughly 5 min downtime per year
• Since a network is made up of several components, the ONLY way to reach 5 nines is to add survivability in the face of failures...
  - Survivability = continued services in the presence of failures
  - Protection switching or restoration: mechanisms used to ensure survivability
  - Add redundant capacity, detect faults and automatically re-route traffic around the failure
• Restoration: related term, but slower time-scale
• Protection: fast time-scale: 10s-100s of ms... implemented in a distributed manner to ensure fast restoration
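The claim that a chain of components cannot reach 5 nines without redundancy can be checked with a small calculation. The sketch below assumes independent failures, a path that works only if every element works (series), and a protected element that works if either of two copies works (parallel). The ten-element path and the 99.99% per-element figure are hypothetical values chosen for illustration.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain that needs every element to be up."""
    return prod(availabilities)


def parallel_availability(a, b):
    """Availability of a pair where one working copy is enough."""
    return 1 - (1 - a) * (1 - b)


# Hypothetical path of ten elements, each with 4-nines availability.
element = 0.9999
unprotected = series_availability([element] * 10)

# Same path, but every element is duplicated by an independent protect copy.
protected_element = parallel_availability(element, element)
protected = series_availability([protected_element] * 10)

print(f"unprotected path: {unprotected:.6%}")
print(f"protected path:   {protected:.8%}")
```

Under these assumptions the unprotected path falls to roughly 3 nines, while the duplicated path exceeds 5 nines. Real networks rarely have fully independent failures, so the protected figure is an optimistic bound.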

19 Failure Types & Other Motivations
• Types of failure:
  - Components: links, nodes, channels in WDM, active components, software...
  - Human error: backhoe fiber cut
    - Fiber inside oil/gas pipelines is less likely to be cut
  - Systems: entire COs can fail due to catastrophic events
• Protection allows easy maintenance and upgrades:
  - E.g. switch traffic over when servicing a link...
• Single failure vs multiple concurrent failures...
  - Goal: mean repair time << mean time between failures...
• Protection also depends on the kind of application.
• Survivability may hence be provided at several layers.

20 Network Survivability Architectures
• Mesh Restoration Architectures
• Linear Protection Architectures
• Ring Protection Architectures

21 Network Availability & Survivability
• Availability is the probability that an item will be able to perform its designed functions at the stated performance level, within the stated conditions and in the stated environment when called upon to do so.
  $\text{Availability} = \dfrac{\text{Reliability}}{\text{Reliability} + \text{Recovery}}$

22 Quantification of Availability
Percent Availability   N-Nines   Downtime (Minutes/Year)
99%                    2-Nines   5,000 Min/Yr
99.9%                  3-Nines   500 Min/Yr
99.99%                 4-Nines   50 Min/Yr
99.999%                5-Nines   5 Min/Yr
99.9999%               6-Nines   0.5 Min/Yr
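The downtime figures in this table are rounded. A minimal Python sketch of the underlying arithmetic, assuming a 365-day year (525,600 minutes), is given below; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for nines, pct in enumerate([99.0, 99.9, 99.99, 99.999, 99.9999], start=2):
    minutes = downtime_minutes_per_year(pct)
    print(f"{nines}-nines ({pct}%): {minutes:.2f} min/yr")
```

With this definition 99% corresponds to about 5,256 minutes per year and 99.999% to about 5.26 minutes per year, so the slide values of 5,000 and 5 are order-of-magnitude approximations.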

23 PSTN
• Individual elements have an availability of 99.99%
• One cut-off call in 8000 calls (3 min for average call). Five ineffective calls in every 10,000 calls.
[Figure: PSTN end-to-end path (facility entrance, NI, AN, LE, LD, LE, AN, NI) annotated with unavailability figures of 0.01 %, 0.005 %, 0.02 %, 0.005 % and 0.01 %. PSTN end-to-end availability: 99.94%]
NI: Network Interface, LE: Local Exchange, LD: Long Distance, AN: Access Network
Source: http://www.packetcable.com/downloads/specs/pkt-tr-voipar-v01-001128.pdf

24 IP Network Expectations
Service                                        Delay  Jitter  Loss  Availability
Real Time Interactive (VOIP, Cell Relay...)    L      L       L     H
Layer 2 & Layer 3 VPNs (FR/Ethernet/AAL5)      M      (remaining cells garbled in the source: H, L, L)
Internet Service                               H      H       M     L
Video Services                                 L      M       M     H
L: Low, M: Medium, H: High

25 Measuring Availability: The Port Method
• Based on the port count in the network
• Does not take into account the bandwidth of ports
  - e.g. OC-192 and 64k are both ports
• Good for dedicated access service because ports are tied to customers.
  $\text{Availability (\%)} = \dfrac{(\text{total number of ports} \times \text{sample period}) - (\text{number of impacted ports} \times \text{outage duration})}{\text{total number of ports} \times \text{sample period}} \times 100$

26 The Port Method Example
• Network with 10,000 active access ports
• An access router with 100 access ports fails for 30 minutes.
  - Total available port-hours = 10,000 * 24 = 240,000
  - Total down port-hours = 100 * 0.5 = 50
  - Availability for a single day = (240,000 - 50) / 240,000 * 100 = 99.979166 %
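The port-method calculation can be reproduced with a few lines of Python. This is a minimal sketch of the formula from the previous slide applied to the numbers of this example; the function and parameter names are illustrative and not part of the original material.

```python
def port_availability(total_ports, sample_hours, impacted_ports, outage_hours):
    """Port-method availability in percent over one sample period."""
    total_port_hours = total_ports * sample_hours
    down_port_hours = impacted_ports * outage_hours
    return (total_port_hours - down_port_hours) / total_port_hours * 100


# Slide example: 10,000 ports observed for 24 h, 100 ports down for 0.5 h.
a = port_availability(10_000, 24, 100, 0.5)
print(f"{a:.6f} %")
```

The result agrees with the slide value up to rounding of the last digit.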

27 The Bandwidth Method
• Based on the amount of bandwidth available in the network
• Takes into account the bandwidth of ports
• Good for core routers
  $\text{Availability (\%)} = \dfrac{(\text{total amount of BW} \times \text{sample period}) - (\text{amount of BW impacted} \times \text{outage duration})}{\text{total amount of BW} \times \text{sample period}} \times 100$

28 The Bandwidth Method Example
• Total capacity of the network: 100 Gigabits/sec
• An access router with 1 Gigabit/sec of BW fails for 30 minutes.
  - Total BW available in the network for a day = 100 * 24 = 2,400 (Gb/s x hours)
  - Total BW lost in the outage = 1 * 0.5 = 0.5 (Gb/s x hours)
  - Availability for a single day = ((2,400 - 0.5) / 2,400) * 100 = 99.979166 %
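A short Python sketch of the bandwidth method follows. It first reproduces the slide example and then, as an added illustration, contrasts the loss of a fast port with the loss of a slow port in a small hypothetical network (one OC-192 port at 9.95328 Gb/s plus 99 ports at 64 kb/s). The port method would rate both outages identically, while the bandwidth method does not. All names and the hypothetical network are assumptions made for this sketch.

```python
def bandwidth_availability(total_bw, sample_hours, outages):
    """Bandwidth-method availability in percent.

    outages is an iterable of (impacted_bw, outage_hours) pairs,
    with impacted_bw in the same unit as total_bw.
    """
    total = total_bw * sample_hours
    lost = sum(bw * hours for bw, hours in outages)
    return (total - lost) / total * 100


# Slide example: 100 Gb/s network, 1 Gb/s lost for 0.5 h during one day.
slide_example = bandwidth_availability(100.0, 24, [(1.0, 0.5)])

# Hypothetical mixed network: one OC-192 port and 99 ports of 64 kb/s.
OC192_GBPS = 9.95328
DS0_GBPS = 64e-6
total_bw = OC192_GBPS + 99 * DS0_GBPS

fast_port_down = bandwidth_availability(total_bw, 24, [(OC192_GBPS, 1.0)])
slow_port_down = bandwidth_availability(total_bw, 24, [(DS0_GBPS, 1.0)])

print(f"slide example:       {slide_example:.6f} %")
print(f"OC-192 down for 1 h: {fast_port_down:.4f} %")
print(f"64k port down 1 h:   {slow_port_down:.6f} %")
```

This is why the slides recommend the bandwidth method for core routers, where port speeds differ by orders of magnitude.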

29 Basic Ideas: Working and Protect Fibers

30 Service classification (1/2)
• Communication networks are used to carry many different services.
• Different services may have diverse reliability requirements.
• The reliability requirements of such services are related to QoS parameters:
  - Bit rate;
  - Delay;
  - Jitter;
  - ...

31 Service classification (2/2)
[2] A. Lason et al., “Network Scenarios and Requirements”, European IST project Layers Internetworking in Optical Network (LION), deliverable D6, September 1999.

32 How to increase network reliability?
• Prevent network failures:
  - put network cables deeper in the ground;
  - more testing for hardware and software;
  - ...
• Duplicate vulnerable network elements:
  - dual homing.
• Regardless of these measures, network failures still occur.
• There is a need for network recovery or resilience schemes!

33 Network recovery basic idea
• Build networks to have alternate paths
• Design systems to have alternate entities
• Monitor for possible failures
• Manage networks proactively

34 Network recovery requirements
• Network recovery imposes several requirements. For example:
  - there should be backup capacity to create a recovery path;
  - the backup capacity must be enough to satisfy the QoS constraints;
  - single points of failure must be avoided;
  - ...

35 Recovery and reversion cycles
[Figure: recovery cycle and reversion cycle]

36 Recovery mechanisms
• A wide variety of recovery mechanisms exists.
• Every mechanism has advantages and drawbacks.
• The following slides report some criteria that may be used to evaluate and classify recovery mechanisms [3, 4].
[3] V. Sharma et al., “Framework for MPLS-based recovery”, RFC 3469, IETF web site, Feb 2003
[4] K. Owens, V. Sharma, M. Oommen, and F. Hellstrand, “Network Survivability Considerations for Traffic Engineered IP Networks”, Internet draft: draft-owens-te-network-survivability-03, May 2002. Available at: www.ietf.org. Accessed July 2005.

37 Backup Capacity
• Dedicated
  - one-to-one relationship between the backup resources and the working path;
  - the simplest solution;
  - an inefficient solution.
• Shared
  - the backup resources are shared among different working paths;
  - a more complex solution;
  - a more efficient solution.

38 Recovery Path
• Preplanned
  - the recovery paths for all accounted failure scenarios are calculated in advance;
  - allows fast recovery from failures;
  - lacks flexibility for unaccounted failure scenarios.
• Dynamic
  - the recovery path is calculated “on the fly” when the failure is detected;
  - may also be used to search for recovery paths in unaccounted failure scenarios.
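The difference between preplanned and dynamic recovery paths can be sketched in Python on a toy topology. The sketch assumes hop-count shortest paths found by breadth-first search, treats every single-link failure on the working path as an accounted failure, and uses a hypothetical five-node network; none of these choices comes from the slides.

```python
from collections import deque


def shortest_path(adj, src, dst, failed=frozenset()):
    """Hop-count shortest path by BFS, skipping links listed in failed.

    failed is a set of frozenset({u, v}) link identifiers.
    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in failed:
                prev[v] = u
                queue.append(v)
    return None


# Hypothetical topology: three parallel two-hop routes from A to D.
adj = {
    "A": ["B", "C", "E"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "E": ["A", "D"],
    "D": ["B", "C", "E"],
}

working = shortest_path(adj, "A", "D")
working_links = [frozenset(pair) for pair in zip(working, working[1:])]

# Preplanned: one recovery path per accounted (single-link) failure,
# computed before any failure occurs and simply looked up afterwards.
preplanned = {link: shortest_path(adj, "A", "D", {link}) for link in working_links}

# Dynamic: the path is computed only once the actual failure set is known,
# so it also covers a double failure that was never accounted for.
double_failure = {frozenset(("A", "B")), frozenset(("A", "C"))}
dynamic = shortest_path(adj, "A", "D", double_failure)

print("working path:        ", working)
print("preplanned for A-B:  ", preplanned[frozenset(("A", "B"))])
print("dynamic, double fail:", dynamic)
```

The preplanned table answers with a dictionary lookup but has no entry for the double failure, whereas the dynamic search pays the computation cost at failure time and still finds a route.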

39 Recovery Approaches
• Protection
  - the recovery paths are preplanned and fully signaled before a failure occurs;
  - when a failure occurs no additional signaling is needed to establish the recovery path;
  - is the faster solution.
• Restoration
  - the recovery paths may be preplanned or dynamically allocated, but are not signaled in advance;
  - when a failure occurs additional signaling is needed to establish the recovery path;
  - is a more flexible solution.

40 Protection Variants (1/2)
• 1+1 Protection (Dedicated Protection)
  - there is exactly one dedicated recovery path for each working segment;
  - the traffic is permanently duplicated on both the working path and the recovery path;
  - is a quite expensive solution.
• 1:1 Protection (Dedicated Protection with extra traffic)
  - there is exactly one dedicated recovery path for each working segment;
  - the traffic is transmitted over only one path at a time;
  - it is possible to transport extra traffic along the recovery path in failure-free conditions.

41 Protection Variants (2/2)
• 1:N (Shared Recovery With Extra Traffic)
  - each recovery entity is used to protect N working entities;
  - it is possible to use the recovery entities to transport extra traffic in failure-free conditions.
• M:N (M ≤ N)
  - a set of M recovery entities is used to protect a set of N working entities;
  - it is possible to use the recovery entities to transport extra traffic in failure-free conditions.
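One way to compare the variants on slides 40 and 41 is by the ratio of recovery entities to working entities and by the number of simultaneous working failures they can absorb. The Python sketch below is a simplified model, assuming that recovery entities never fail themselves and that any recovery entity can replace any working entity in its group; the function names are illustrative.

```python
from fractions import Fraction


def protection_overhead(m: int, n: int) -> Fraction:
    """Recovery entities per working entity for an M:N scheme (M <= N)."""
    if not 1 <= m <= n:
        raise ValueError("an M:N scheme requires 1 <= M <= N")
    return Fraction(m, n)


def survives(m: int, failed_working: int) -> bool:
    """True if M recovery entities can replace all failed working entities."""
    return failed_working <= m


# 1+1 and 1:1 both dedicate one recovery entity per working entity.
print("1+1 / 1:1 overhead:", protection_overhead(1, 1))
# 1:4 shares one recovery entity among four working entities.
print("1:4 overhead:      ", protection_overhead(1, 4))
# 2:4 doubles the shared pool and tolerates two simultaneous failures.
print("2:4 overhead:      ", protection_overhead(2, 4))
print("1:4 survives 2 failures:", survives(1, 2))
print("2:4 survives 2 failures:", survives(2, 2))
```

The model ignores extra traffic, which in 1:1, 1:N and M:N schemes is pre-empted when the recovery entity is needed.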

42 Recovery Extent (1/2)
• Local Recovery
  - in a failure condition only the affected network elements are bypassed using the recovery path;
  - the RHE (recovery head-end) and RTE (recovery tail-end) are closer to the failure, so they may detect the failure quickly, leading to a smaller recovery time;
  - in case of failure the route followed by the traffic may not be optimal (e.g. the same traffic may cross a link twice!);
  - fails when two successive nodes fail.

43 Recovery Extent (2/2)
• Global Recovery
  - in a failure condition the complete working path between source and destination is bypassed;
  - the recovery time is greater than that of local recovery;
  - an optimal recovery path is used in case of failure;
  - can still resolve the problem when two successive nodes fail;
  - may generate more “state overhead” than the local approach.
• An intermediate solution between the local and global approaches may be adopted!
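The trade-off between local and global recovery can be made concrete on a small graph. The following Python sketch assumes hop-count shortest paths (BFS), a single failed link, and a hypothetical six-node topology chosen so that the local detour sends traffic back over a link it has already crossed. Here the node just upstream of the failure acts as the recovery head-end and the node just downstream as the recovery tail-end.

```python
from collections import deque


def bfs_path(adj, src, dst, failed):
    """Hop-count shortest path avoiding the links in failed, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in failed:
                prev[v] = u
                queue.append(v)
    return None


def local_recovery(adj, working, head, tail, failed):
    """Bypass only the failed link head-tail, keeping the rest of the path."""
    i = working.index(head)
    detour = bfs_path(adj, head, tail, failed)
    return working[:i] + detour + working[i + 2:]


def global_recovery(adj, working, failed):
    """Replace the whole path between source and destination."""
    return bfs_path(adj, working[0], working[-1], failed)


# Hypothetical topology: chain S-A-B-C-D plus a bypass A-X-C.
adj = {
    "S": ["A"],
    "A": ["S", "B", "X"],
    "B": ["A", "C"],
    "C": ["B", "X", "D"],
    "X": ["A", "C"],
    "D": ["C"],
}
working = ["S", "A", "B", "C", "D"]
failed = {frozenset(("B", "C"))}

local = local_recovery(adj, working, "B", "C", failed)
glob = global_recovery(adj, working, failed)

print("local: ", "-".join(local))
print("global:", "-".join(glob))
```

In this example the local path crosses link A-B twice and is two hops longer than the global one, but only node B has to react, which is why local recovery is usually faster.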

44 Control of Recovery Mechanisms (1/2)
• Centralized
  - a central controller determines the action to take in case of failure;
  - the central controller also determines when and where a fault has occurred;
  - the central controller is a single point of failure;
  - is generally an efficient approach;
  - in principle is a simpler approach, but the central controller may become a very complex system.

45 Control of Recovery Mechanisms (2/2)
• Distributed
  - there is no centralized controller; all the network elements are capable of reacting autonomously to failures;
  - with this approach there is no global view of the network condition;
  - the network elements may have to exchange information to keep a consistent view of the network;
  - is a more scalable approach.

46 Protection Topologies - Ring
• Two or more nodes connected to each other with a ring of links
[Figure: ring topology with node ports labeled E and W; legend: Working, Protect]

47 Protection Topologies - Mesh
• Three or more nodes connected to each other
  - Can be sparse or complete meshes
  - Spans may be individually protected with linear protection
  - Overall edge-to-edge connectivity is protected through multiple paths
[Figure: mesh topology; legend: Working, Protect]

48 Protection Switching Terminology
• 1+1 architectures: permanent bridge at the source, select at the sink
• m:n architectures: m entities provide protection for n working entities, where m is less than or equal to n
  - allows unprotected extra traffic
  - most common: SONET linear 1:1 and 1:n

49 1+1 vs 1:n
[Figure: working and protect links in a 1+1 arrangement and in a 1:n arrangement]

50 SONET Linear 1+1 APS
[Figure: SONET linear 1+1 APS between two nodes, with working and protection lines]
TX = Transmitter, RX = Receiver, BR = Bridge, SW = Switch

51 SONET 1:1 Linear APS
[Figure: SONET 1:1 linear APS between two nodes, with working and protection lines and an APS channel]
TX = Transmitter, RX = Receiver, BR = Bridge, SW = Switch

52 Protection Switching: Terminology
• Dedicated vs shared: the working connection is assigned dedicated or shared protection bandwidth
  - 1+1 is dedicated, 1:n is shared
• Revertive vs non-revertive: in revertive schemes, after the failure is fixed, traffic is switched back (automatically or manually) to the working path
  - Shared protection schemes are usually revertive
• Uni-directional or bi-directional protection:
  - Uni-directional: each direction of traffic is handled independently of the other. Fiber cut => only one direction is switched over to protection. Usually done with dedicated protection; no signaling required.
  - Bi-directional transmission on fiber (full duplex) => requires bi-directional switching, and signaling is required

53 Mesh Restoration
[Figure: mesh of DCS nodes showing the working path, line (link) restoration and path restoration]
• Control: Centralized or Distributed
• Route Calculation: Preplanned or Dynamic
• Type of Alternate Routing: Line or Path

54 Link vs. Path restoration
• Link restoration
  - Requires the ability to identify the failed link at both ends.
  - Cannot protect against node failure.
  - Link-based mesh (generalized loopback):
    - insensitive to additions to the network (scalable);
    - backup path can be pre-computed (fast recovery);
    - dynamic rerouting
• Path restoration
  - More resilient than link restoration.
  - Reroutes the traffic from the primary path to a Shared Risk Group (SRG)-disjoint backup path.
  - Protects both end-to-end paths and single links.
• Preferred: path based

55 Link vs. Path restoration
[Figure: six-node network (A, B, C, D, E, F) carrying Flow 1: A-C-D and Flow 2: E-C-D-F, shown after a link cut under link (generalized loopback) restoration and under path restoration]

56 Pre-compute vs. Real-time
• Pre-computed: calculates restoration paths before a failure happens.
  - Allows prior availability of reroute information at the nodes where actions need to be taken after a failure is detected.
  - Enables fast restoration.
• Real-time: calculates restoration paths after a failure happens.
  - Restoration is slower.
  - Enables more efficient capacity utilization.
• Preferred: pre-computed

57 Centralized vs. Distributed
• Centralized restoration
  - Computes restoration and primary paths for all demands with up-to-date information
  - Routes may then be downloaded into nodal databases.
  - Effectiveness?
    - More capacity efficiency
    - Possibly slow (but may be executed in the background)
    - Scalability in question.
• Distributed restoration
  - Source and destination nodes dynamically search for the protection wavelengths required to reestablish the disrupted lightpath
  - Lacking knowledge of the sharing databases of other OXCs, a node may not be able to determine backup sharability for any given primary path
• Preferred:
  - Central path determination
  - Distributed restoration

58 Protection Topologies - Linear
• Two nodes connected to each other with two or more sets of links
[Figure: working and protect links in a 1+1 arrangement and in a 1:n arrangement]

59 Mesh Restoration vs Ring/Linear Protection
Extracted from: T-H. Wu, Emerging Technologies for Fiber Network Survivability, See References

60 IP layer restoration
• IP layer restoration (real-time)
  - Achieved by exchanging control messages between adjacent routers
    - Re-determine the affected route
    - Update routing tables
    - Propagate changes (OSPF, BGP-4)
  - Capable of recovery from multiple faults
  - Slow (10s of seconds to minutes, per Fumagalli)
    - requires online processing upon failure
  - Fault discovery:
    - Explicitly: ICMP messaging
    - Implicitly: expiring of timers
  - Guarantees network-wide survivability
  - Independent of the underlying physical network
[Figure: protocol stack (Physical, Data Link, Network (IP), Transport, Session, Presentation, Application)]

61 MPLS layer restoration
• MPLS layer protection
  - Real-time or pre-computed
  - Line or path level protection
  - The protection path is node and link disjoint from the primary path.
  - The protection path may be allocated to low-priority traffic in the absence of network failure.
  - Faster than dynamic IP rerouting
    - Working LSPs have pre-established node/link disjoint protection paths
[Figure: protocol stack (Physical, Data Link, Network, Transport, Session, Presentation, Application) with MPLS added]

62 Optical layer restoration
• Optical layer restoration
  - Real-time or pre-computed
  - Ring protection or mesh restoration
  - No visibility into higher layer operations.
  - May be a wasteful use of resources.
    - For ring protection, there is over 100% capacity redundancy
    - For mesh restoration, a 60-80% physical redundancy level is typical.
  - Not recommended for node (or software) failures
  - Faster than higher layer restorations (??)
[Figure: protocol stack (Physical, DWDM (Optical), Network (IP), Transport, Session, Presentation, Application)]

63 Multilayer Recovery (1/2)
• In a multilayer network it is possible to imagine a situation in which each layer has its own recovery mechanisms.
• Not every failure in a particular layer can be resolved in the same layer.
• If a failure can be resolved in several layers, uncoordinated actions may produce inefficient results.
• Coordination among the layers is needed!

64 Multilayer Recovery (2/2)
• Sequential Approach [1]
  - using a hold-off time, a chronological order is imposed among the recovery mechanisms adopted in different layers;
  - alternatively, a “token” may be used to impose a sequential order among the different layers.
• Integrated Approach [1]
  - there is a recovery scheme that has a full overview of all the layers;
  - the recovery scheme may decide when and in which layer (or layers) the recovery actions must be taken.
[1] D. Colle et al., “Data-centric optical networks and their survivability”, IEEE Journal on Selected Areas in Communications, Volume 20, Issue 1, Jan. 2002, pp. 6-20
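The hold-off variant of the sequential approach can be sketched as a small timing model in Python. The sketch assumes two layers, a lower layer (for example optical) that starts recovering immediately, and an upper layer (for example IP/MPLS) that starts only if the failure is still present when its hold-off timer expires. The function name and all timing values are hypothetical.

```python
def sequential_recovery(lower_ms, hold_off_ms, upper_ms):
    """Model a two-layer recovery coordinated by a hold-off timer.

    lower_ms: time the lower layer needs to restore service,
              or None if it cannot resolve this failure.
    hold_off_ms: time the upper layer waits before acting.
    upper_ms: time the upper layer needs once it starts.

    Returns (restoring_layer, outage_ms, upper_layer_triggered).
    """
    upper_triggered = lower_ms is None or lower_ms > hold_off_ms
    upper_done = hold_off_ms + upper_ms
    if lower_ms is not None and lower_ms <= upper_done:
        return "lower", lower_ms, upper_triggered
    return "upper", upper_done, True


# Lower layer recovers within the hold-off time: upper layer stays idle.
print(sequential_recovery(lower_ms=50, hold_off_ms=100, upper_ms=1000))
# Lower layer cannot resolve the failure: upper layer acts after the hold-off.
print(sequential_recovery(lower_ms=None, hold_off_ms=100, upper_ms=1000))
# Hold-off too short for a slow lower layer: both layers end up acting.
print(sequential_recovery(lower_ms=500, hold_off_ms=100, upper_ms=1000))
```

The third case shows the tuning problem: a hold-off time shorter than the lower layer's recovery time lets both layers react, while a long hold-off delays recovery whenever the lower layer cannot help.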

