Automatic Network Management: Graphical Models for Fault Location Ricardo Morla INESC Porto / FEUP.

1 Automatic Network Management: Graphical Models for Fault Location Ricardo Morla INESC Porto / FEUP

2 Motivation: Managing ICT networks
- ICT networks are large and heterogeneous
  - Desktops, servers, applications, services, databases, sensors, routers, links, messages
  - Multiple vendors and goals
- Non-deterministic
  - Exponential back-off, concurrency, user interaction
- Difficult to manage
  - Configuration options
  - No single person with unified view of network
  - Log data
- Expensive to manage
  - Rule of thumb
  - €1 acquisition :: €2 operations and management

3 Failures, Faults, and Fault Location
- Failures
  - Request timeout, anomalous values, …
- Faults
  - Hardware fault, software bugs, mis-configurations, …
- Cannot monitor faults
  - Infer faults from failures
- Single component: trivial to locate fault
- Network: non-trivial
  - May not be able to monitor all failures
  - Failures cause failures in other components

4 Toy example – Fault location – I
- Simple adding example
  - C1.out = C1.in + 1
  - C2.out = C2.in + 1
  - Cannot read C1.out/C2.in
- What’s the faulty component?
  - C1.in = 10
  - C2.out = 13
- C1 or C2?
[Figure: components C1 and C2]

5 Toy example – Fault location – I
- What’s the faulty component?
  - C1.in = 10
  - C2.out = 13
- C1
  - C1.out = C1.in + 1 (99.9%)
  - C1.out = C1.in + 2 (0.01%)
- C2
  - C2.out = C2.in + 1 (99.9%)
  - C2.out = C2.in + 10 (0.01%)
- C1 or C2?
[Figure: components C1 and C2]
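
The question on this slide can be answered by enumerating every combination of component behaviours and keeping only those consistent with the two observed values. The sketch below does that in Python. The names `MODES` and `posterior` are illustrative and not from the slides, and the percentages are used exactly as written on the slide, as relative weights that are renormalised over the consistent hypotheses.

```python
from itertools import product

# Behaviour modes per component: (label, offset added to the input, weight).
# Weights are the percentages from the slide, used as relative weights only.
MODES = {
    "C1": [("ok", 1, 0.999), ("faulty", 2, 0.0001)],
    "C2": [("ok", 1, 0.999), ("faulty", 10, 0.0001)],
}

def posterior(c1_in, c2_out):
    """Return P(modes of C1, C2 | C1.in, C2.out) by brute-force enumeration."""
    weights = {}
    for (m1, d1, p1), (m2, d2, p2) in product(MODES["C1"], MODES["C2"]):
        # C1.out is not observable, so only the end-to-end value is checked.
        if c1_in + d1 + d2 == c2_out:
            weights[(m1, m2)] = p1 * p2
    total = sum(weights.values())
    return {hypothesis: w / total for hypothesis, w in weights.items()}

result = posterior(10, 13)
print(result)
```

With C1.in = 10 and C2.out = 13, the only consistent hypothesis is that C1 added 2 and C2 added 1, so the enumeration points to C1 as the faulty component.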

6 Toy example – Fault location – II
- Message forwarding example
- Fault: message drop
- Fault Propagation Model
  - Fault in component A
  - Failure in component A
[Figure: components A, B, C and links A-B, B-C]

7 The fault location problem in ICT
- Motivating example (I)
  - Fiber-based IP network
  - Faults: Fiber/Splitter cuts
  - Failures: loss of connectivity between IP nodes
- Map IP topology (routers etc) with fiber topology
- IP links (failures) share fiber faults
  - Shared risk [Kompella05]
- Smallest best possible fault explanation for observed failures
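
One way to picture the "smallest explanation" idea is as a set-cover problem: each fiber or splitter is a shared-risk group covering the IP links that depend on it, and the goal is a small set of groups that covers all failed links. The sketch below uses a plain greedy set-cover heuristic, which is not necessarily the method of [Kompella05] and is not guaranteed to find the minimum. The topology and all names (`risk_groups`, `greedy_fault_explanation`, `fiber1`, ...) are hypothetical.

```python
def greedy_fault_explanation(risk_groups, failed_links):
    """Greedily pick fiber-level faults until all failed IP links are explained.

    risk_groups maps a candidate fault to the set of IP links it would take down.
    Returns (chosen faults, failed links that no candidate fault explains).
    """
    uncovered = set(failed_links)
    chosen = []
    while uncovered:
        # Pick the fault explaining the most still-unexplained failures;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(risk_groups), key=lambda f: len(risk_groups[f] & uncovered))
        gain = risk_groups[best] & uncovered
        if not gain:
            break  # remaining failures cannot be explained by any known fault
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Hypothetical mapping from fiber-level faults to the IP links that share them.
risk_groups = {
    "fiber1": {"A-B", "A-C"},
    "fiber2": {"A-C", "B-C"},
    "splitter1": {"A-B", "A-C", "B-C"},
}
faults, unexplained = greedy_fault_explanation(risk_groups, {"A-B", "A-C", "B-C"})
print(faults, unexplained)
```

Here a single splitter cut explains all three link failures, so it is preferred over the two-fiber explanation.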

8 The fault location problem in ICT
- Motivating example (II)
  - IMS Networks
- Complex architecture
  - Session Director, SIP Server; Home subscriber server
  - Distributed geographically, tree-based
- Various software and hardware faults
- Multimedia-specific KPI and failures/alarms
- Codebook approach [Reali09]
  - Minimum set of alarms
  - Robustness against spurious/missing alarms
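
A codebook approach can be sketched as nearest-codeword decoding: each fault has a binary alarm signature, and an observed alarm vector is matched to the closest signature by Hamming distance. The sketch below illustrates the general idea only and is not taken from [Reali09]. The fault names and signatures in `CODEBOOK` are hypothetical.

```python
# Hypothetical codebook: one binary alarm signature per fault.
# Position i is 1 if the fault is expected to raise alarm i.
CODEBOOK = {
    "sip_server_down": (1, 1, 0, 0, 1),
    "hss_unreachable": (0, 1, 1, 0, 0),
    "session_director_overload": (1, 0, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions in which two alarm vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(observed):
    """Return the fault whose signature is closest to the observed alarms."""
    return min(CODEBOOK, key=lambda fault: hamming(CODEBOOK[fault], observed))

# The fifth alarm of the first signature is missing from this observation.
observed = (1, 1, 0, 0, 0)
fault = decode(observed)
print(fault)
```

The signatures in this toy codebook differ pairwise in at least three positions, so one spurious or missing alarm still decodes to the right fault. Using fewer alarms shortens the signatures and tends to reduce that margin, which is the tradeoff between the two sub-points on the slide.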

9 The fault location problem in ICT
- Motivating example (III)
  - Enterprise Networks
  - [Kandula09]
- Symptoms
  - Intermittent response time from server
  - DB server refusing to start
- Faults
  - Configuration
  - Software bugs
- Difficult to get topology info and dependencies

10 PGM for fault location
- Graphical model encodes
  - P(Fault | Failures)
- What’s the most likely set of faults that explains a set of given observed values?
  - Posterior probability
  - Highest P(Failures | Fault)
- This is hard:
  - topology of PGM
  - detail of probabilistic model
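
To make the posterior concrete, the sketch below builds a tiny two-layer model (faults on top, observed failures below) and finds the most likely fault set by exhaustive enumeration of P(Faults) × P(Failures | Faults). Everything in it is an assumption for illustration: the fault and failure names, the numbers in `PRIOR`, `CAUSES` and `LEAK`, and the noisy-OR form of the likelihood. Enumeration is exponential in the number of faults, which is one reason the slide calls the problem hard.

```python
from itertools import product

# Hypothetical model: prior fault probabilities and noisy-OR link strengths.
PRIOR = {"F1": 0.01, "F2": 0.02}
# CAUSES[failure][fault] = probability that the fault alone triggers the failure.
CAUSES = {"S1": {"F1": 0.9}, "S2": {"F1": 0.8, "F2": 0.7}}
LEAK = 0.001  # probability that a failure appears with no modelled fault

def likelihood(faults, observed):
    """P(observed failures | fault assignment) under a noisy-OR model."""
    p = 1.0
    for failure, seen in observed.items():
        p_off = 1 - LEAK
        for fault, strength in CAUSES[failure].items():
            if faults[fault]:
                p_off *= 1 - strength
        p *= (1 - p_off) if seen else p_off
    return p

def map_faults(observed):
    """Return the most probable fault assignment and its posterior probability."""
    names = sorted(PRIOR)
    scores = {}
    for bits in product([False, True], repeat=len(names)):
        faults = dict(zip(names, bits))
        prior = 1.0
        for f in names:
            prior *= PRIOR[f] if faults[f] else 1 - PRIOR[f]
        scores[bits] = prior * likelihood(faults, observed)
    total = sum(scores.values())  # P(observed), the Bayes normaliser
    best = max(scores, key=scores.get)
    return dict(zip(names, best)), scores[best] / total

best, prob = map_faults({"S1": True, "S2": True})
print(best, round(prob, 3))
```

With both failures observed, the single fault F1 is the most probable explanation in this toy model, because it accounts for both failures while F2 can only account for S2.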

11 Challenges
- Define fault location models
  - In addition to FPM
  - Higher model complexity in the PGM
  - Include time functions
- Adequate models of ICT systems for Fault Location
  - From topology/application domain
  - Automatically from data
  - Hybrid
- Fault location-based system redesign/reconfiguration
  - Performance metrics vs. fault location metrics
  - Tradeoff

12 Current effort
- Modeling different ICT systems for better fault location
  - Enterprise networks
  - IP Multimedia Subsystems
  - Ambient intelligent environments
  - …
- How:
  - From network topology
  - Directly from data
  - With expert input (a-priori rules)

13 Concluding Remarks
- ICT systems are increasingly complex
- We must be able to manage them automatically, including locating faults
- Automatic fault location has the potential to cut operations and management costs
- Applicable world-wide, across market domains

