Fault Detection, Isolation, and Diagnosis in Multihop Wireless Networks. Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou, Microsoft Research. Presented by Maitreya Natu.

Presentation transcript:

Fault Detection, Isolation, and Diagnosis in Multihop Wireless Networks
Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou, Microsoft Research
Presented by Maitreya Natu

Network Management
[Figure labels: faulty network, root cause, faults directory, corrective measure, healthy network]

Tasks involved in Network Management
- Continuously monitoring the functioning of the network
- Collecting information about the nodes and the links
- Removing inconsistencies and noise from the reported information
- Analyzing the information
- Taking appropriate actions to improve network reliability and performance

Challenges in wireless networks
- Dynamic and unpredictable topology
  - Link errors due to fluctuating environmental conditions
  - Node mobility
- Limited capacity
  - Scarcity of resources
- Link attacks

Proposed framework
- Reproduce, inside a simulator, the real-world events that took place
- Use online trace-driven simulation to detect faults and analyze the root causes

Network Management: creating a network model
[Figure labels: healthy network, types of faults, network model, faults directory]

Network Management: fault diagnosis
[Figure labels: faulty network, types of faults, network model, detected faults, faults directory]

Network Management: what-if analysis
[Figure labels: types of faults, network model, detected faults, faults directory, corrective measures]

Key issues
- How to accurately reproduce inside a simulator what happened in the network
- How to build fault diagnosis on top of a simulator to perform root cause analysis

Accurate modeling
- Use real traces from the diagnosed network
  - Removes dependency on generic theoretical models
  - Captures nuances of the hardware, software and environment of the particular network
- Collect good quality data
  - By developing a technique to effectively rule out erroneous data

Fault diagnosis
- Performance data emitted by the trace-driven simulation is used as the baseline
- Any significant deviation of the observed performance from this baseline indicates a potential fault
- The simulator selectively injects sets of suspected faults and searches for the set whose simulated performance best matches the observed performance
- An efficient algorithm is designed to determine root causes
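The baseline comparison on this slide can be illustrated with a minimal sketch. The function name, link identifiers, loss values and the 0.10 threshold are all hypothetical and chosen only for illustration; the slides do not specify them.

```python
def find_anomalies(expected, observed, threshold):
    """Return links whose observed metric deviates from the simulated
    baseline by more than `threshold` (hypothetical helper)."""
    shared_links = expected.keys() & observed.keys()
    return sorted(
        link for link in shared_links
        if abs(observed[link] - expected[link]) > threshold
    )


# Loss rates per directed link: simulator baseline vs. values reported by agents.
expected_loss = {("1", "2"): 0.02, ("1", "4"): 0.03, ("2", "3"): 0.02}
observed_loss = {("1", "2"): 0.25, ("1", "4"): 0.30, ("2", "3"): 0.03}

anomalous_links = find_anomalies(expected_loss, observed_loss, threshold=0.10)
```

Links flagged this way are only candidates; the diagnosis step described later decides which injected faults explain them.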

System Overview
[Figure labels: simulator inputs (topology changes, traffic, interference injection, link RSS, link load, routing update); faults directory; expected and observed loss rate, throughput and noise; error; link/node failure]
1. Receive Cleaned Data
2. Drive Simulation
3. Compute Expected Performance
4. Compare Expected & Average Performance
5. Discrepancy Found
6. Search for set of faults that result in best explanation
7. Report the cause of failure

Why Simulation Based Diagnosis?
- Much better insight into the network behavior than any heuristic or theoretical technique
- Highly customizable and applies to a large class of networks
- Ability to perform what-if analysis
  - Helps to foresee the consequences of a corrective action
- Recent advances in simulators have made their use for real-time analysis possible

Accurate modeling
[Figure labels: healthy network, types of faults, network model, faults directory]

Current network models
- Bayesian networks to map symptom-fault dependencies
- Context-free grammars
- Correlation matrix

Can on-line simulations be used as a core tool?

Building confidence in simulator accuracy
- Problem
  - Hard to accurately model the physical layer and the RF propagation
  - Traffic demands on the router are hard to predict
- Solution
  - "After the fact" simulation
  - Agents periodically report information about the link conditions and traffic patterns to the link simulators

Simulations when the RF condition of the link is good
- Model the overheads of the protocol stack, such as parity bits, MAC-layer back-off, IEEE inter-frame spacing and ACK, and headers
- Model the contention from flows within the interference and communication ranges

Simulations with varying received signal strength
- Measured throughput matches the simulator's estimate closely when signal quality is good
- The simulator's estimate deviates from the measured throughput when signal strength is poor

Why do simulation results deviate when signal strength is poor?
- Lack of an accurate model of packet loss as a function of packet size, RSS and ambient noise
  - Depends on the signal processing hardware and the RF antenna within the wireless cards
- Lack of an accurate model of auto-rate control
  - WLAN cards adjust the sending rate based on the transmission conditions

How to model auto-rate control done by WLAN cards?
- Use trace-driven simulation
- When auto-rate is in use
  - Collect the rate at which the wireless card is operating and provide the reported rate to the simulator
- Otherwise
  - Data rate is known to the simulator

How to model packet loss accurately as a function of packet size, RSS and ambient noise?
- Use offline analysis
- Calibrate the wireless cards and create a database associating environmental factors with expected performance
  - E.g., a mapping from signal strength and noise to loss rate

Experiment to model the loss rates due to poor signal strength
- Collect another set of traces
  - Slowly send out packets
  - Place packet sniffers near both the sender and the receiver, and derive the loss rate from the packet-level trace
- Seed the wireless link in the simulator with a Bernoulli loss rate that matches the loss rate in the real traces
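The two steps of this experiment (deriving a loss rate from sniffer traces, then seeding a simulated link with a matching Bernoulli loss) might look like the sketch below. The packet IDs, the seed and both function names are hypothetical; the link model is a toy stand-in, not the simulator used in the talk.

```python
import random


def measured_loss_rate(sent_ids, received_ids):
    """Loss rate derived from sender-side and receiver-side sniffer traces."""
    sent = set(sent_ids)
    if not sent:
        return 0.0
    return len(sent - set(received_ids)) / len(sent)


def make_bernoulli_link(loss_rate, rng):
    """Return a function that delivers each packet independently
    with probability 1 - loss_rate."""
    def deliver():
        return rng.random() >= loss_rate
    return deliver


# Hypothetical trace: every fourth packet never shows up at the receiver.
sent_trace = range(100)
received_trace = [i for i in sent_trace if i % 4 != 0]
loss = measured_loss_rate(sent_trace, received_trace)

# Seed the simulated link with the measured loss rate and replay 10,000 packets.
link = make_bernoulli_link(loss, random.Random(0))
delivered_fraction = sum(link() for _ in range(10_000)) / 10_000
```

A Bernoulli model treats losses as independent, so it matches the average loss rate of the trace but not any burstiness the real link may show.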

Estimated and measured throughput when compensating for the loss rate due to poor signal strength
- Even though the match is not perfect, it is not expected to be a problem, because many routing protocols try to avoid the use of poor quality links
- Poor quality links are used only when certain parts of the mesh network have poor connectivity to the rest of the network
- In a well-engineered network, not many nodes depend on such bad links for routing
- Loss rate and measured throughput do not decrease monotonically with signal strength, due to the effect of auto-rate

Stability of channel conditions
- How rapidly do channel conditions change, and how often should a trace be collected?

Temporal fluctuation in RSS
- Fluctuation magnitude is not significant
- Relative quality of signals across different numbers of walls remains stable

Stability of channel conditions
- How rapidly do channel conditions change, and how often should a trace be collected?
  - When the environment is generally static, nodes may report only the average and standard deviation of the RSS to the manager every few minutes
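A node-side summary of the kind described here could be as small as the following sketch. The report layout, the function name and the sample RSS values are assumptions made for illustration.

```python
import statistics


def summarize_rss(samples_dbm):
    """Condense the RSS samples collected during one reporting interval
    into the two statistics sent to the manager."""
    return {
        "mean": statistics.mean(samples_dbm),
        "stdev": statistics.pstdev(samples_dbm),  # population standard deviation
        "count": len(samples_dbm),
    }


# Hypothetical RSS samples (in dBm) for one link over a few minutes.
report = summarize_rss([-60.0, -62.0, -58.0, -60.0])
```

Sending two numbers per link per interval instead of every sample is what keeps the reporting overhead low when channels are stable.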

Dealing with imperfect data
- By neighborhood monitoring
  - Each node reports performance and traffic statistics for its incoming and outgoing links
  - And for other links in its communication range
    - Possible when the node is in promiscuous mode
- Thus multiple reports are sent for each link
- Redundant reports can be used to detect inconsistency
- Find the minimum set of nodes that can explain the inconsistency in the reports
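The slide does not say how the minimum set of nodes is found. One plausible approach, assumed here rather than taken from the slides, is to treat each pair of disagreeing reporters as an edge and greedily pick the node involved in the most disagreements. The report format, tolerance and node names below are hypothetical.

```python
from collections import defaultdict


def inconsistent_pairs(reports, tolerance):
    """reports maps link -> {reporting node: reported value}.
    Return the set of reporter pairs whose values for the same link disagree."""
    pairs = set()
    for by_node in reports.values():
        nodes = sorted(by_node)
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                if abs(by_node[a] - by_node[b]) > tolerance:
                    pairs.add((a, b))
    return pairs


def greedy_suspects(pairs):
    """Greedy approximation: repeatedly blame the node that appears in the
    most unexplained disagreements until none remain."""
    remaining = set(pairs)
    suspects = []
    while remaining:
        degree = defaultdict(int)
        for a, b in remaining:
            degree[a] += 1
            degree[b] += 1
        worst = max(sorted(degree), key=degree.get)
        suspects.append(worst)
        remaining = {p for p in remaining if worst not in p}
    return suspects


# Packet counts for two links, each reported by three observers; node C under-reports.
reports = {
    ("A", "B"): {"A": 100, "B": 100, "C": 40},
    ("B", "D"): {"B": 50, "D": 50, "C": 10},
}
suspects = greedy_suspects(inconsistent_pairs(reports, tolerance=5))
```

The greedy choice is not guaranteed to give the true minimum set, but it is simple and works well when only a few nodes report bad data.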

Summary
- How to accurately model the real behavior?
  - Solution: use trace-based simulation
- Problem: simulation results are good for strong signals but deviate for bad RF conditions
  - Need to model the auto-rate control: use trace-driven data
  - Need to model the loss rate due to poor signal strength: use offline analysis
- How often should a trace be collected?
  - Very little data (average and standard deviation of RSS), at fairly low time granularity, as channels are relatively stable
- How to deal with imperfect data?
  - By neighborhood monitoring

Fault diagnosis
[Figure labels: faulty network, types of faults, network model, detected faults, faults directory]

Current fault diagnosis approaches
- AI techniques
  - Rule-based systems
  - Neural networks
- Model traversing techniques
  - Dependency graphs
  - Causality graphs
  - Bayesian networks

Fault Isolation and Diagnosis
- Establish the expected performance in the simulation
- Find the difference between expected and observed performance
- Search over the fault space to detect which set of faults can reproduce performance similar to what has been observed

Collecting data from traces
- Trace data collection
  - Network topology: each node reports its neighbor and routing tables
  - Traffic statistics: each node maintains counters of traffic sent to and received from immediate neighbors
  - Physical medium: each node reports the signal strength of wireless links to its neighbors
  - Network performance: includes both link and end-to-end performance, which can be measured through loss rate, delay and throughput
- Focus is on link-level performance

Simulating the network performance
- Traffic load simulation
  - Link-based traffic simulation
  - Adjust application sending rate to match the observed link-level traffic counts
- Route simulation
  - Use actual routes taken by packets as input to the simulator
- Wireless signal
  - Use real measurements of signal strength
- Fault injection
  - Random packet dropping
  - External noise sources
  - MAC misbehavior

Fault diagnosis algorithm: general approach
[Figure labels: simulator with network settings gives expected performance; simulator with network settings and a fault set gives observed performance]
- How to find the fault set?

How to search the faults efficiently?
- Different types of faults often change only one or a few metrics
  - E.g., random dropping only affects link loss rate
- Thus, use the metrics in which observed and expected performance differ significantly to guide the search

Scenario where faults do not have strong interactions
- Consider a large deviation from expected performance as an anomaly
- Use a decision tree to determine the type of fault
- Fault type determines the metric used to quantify the performance difference
- Locate faults by finding the set of nodes and links with a large difference between expected and observed performance

Scenario where faults have strong interactions
- Get the initial diagnosis set from the decision tree algorithm
- Iteratively refine the fault set
  - Adjust the magnitudes of faults in the fault set
    - Translate the difference in performance into a change in the faults' magnitudes
    - This maps the impact of a fault into its magnitude
  - Remove faults whose magnitude is too small
  - Add new faults that can explain large differences between the expected and observed performance
- Iterate until the change in the fault set is negligible
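The refinement loop can be sketched against a toy stand-in for the simulator. Everything below is illustrative: `simulate_loss` is a made-up one-line model (base loss combined with an injected drop rate), and the thresholds, link names and initial fault set are hypothetical, not values from the talk.

```python
def simulate_loss(base_loss, drop_faults):
    """Toy simulator: per-link loss when a packet survives only if it
    escapes both the base loss and the injected drop fault."""
    return {
        link: 1 - (1 - base) * (1 - drop_faults.get(link, 0.0))
        for link, base in base_loss.items()
    }


def refine_faults(base_loss, observed, initial,
                  min_magnitude=0.01, epsilon=1e-4, max_rounds=50):
    """Iteratively adjust, add and remove faults until the fault set settles."""
    faults = dict(initial)
    for _ in range(max_rounds):
        simulated = simulate_loss(base_loss, faults)
        largest_change = 0.0
        for link, seen in observed.items():
            diff = seen - simulated[link]
            # Adjust existing faults; add a new one where the gap is large.
            if link in faults or diff > min_magnitude:
                faults[link] = faults.get(link, 0.0) + diff
                largest_change = max(largest_change, abs(diff))
        # Remove faults whose magnitude has become too small.
        faults = {l: m for l, m in faults.items() if m >= min_magnitude}
        if largest_change < epsilon:
            break
    return faults


base = {("1", "4"): 0.1, ("2", "3"): 0.1}
observed = simulate_loss(base, {("1", "4"): 0.3})   # true fault: 30% dropping on 1-4
initial_guess = {("2", "3"): 0.2}                    # wrong initial diagnosis
diagnosed = refine_faults(base, observed, initial_guess)
```

In this toy run the wrongly suspected link is dropped after a few rounds and the missed fault is added and converges towards its true magnitude.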

Example scenario
- Observed performance
  - Increased loss rate at 1-4 and 1-2
  - No increase in the sending rate of 1-4, 1-2
  - No increase in noise experienced by neighbors
- [Decision tree figure labels: tests Increased Sending Rate, Increased Noise, Increased Loss; outcomes Too low CW, Noise, Packet Drop, Normal; branches Y/N]
- Inference: packet dropping at node 1
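The decision tree in this example can be written as a short function. The order of the tests is a reading of the flattened figure labels (sending rate first, then noise, then loss) and should be treated as an assumption; the function name and the returned strings are hypothetical.

```python
def classify_fault(sending_rate_increased, noise_increased, loss_increased):
    """Walk the decision tree from the example: each test that fires
    names the fault type, otherwise fall through to the next test."""
    if sending_rate_increased:
        return "too low CW"
    if noise_increased:
        return "external noise"
    if loss_increased:
        return "packet dropping"
    return "normal"


# Observations from the example: loss went up on 1-4 and 1-2,
# but neither the sending rate nor the neighbors' noise increased.
diagnosis = classify_fault(
    sending_rate_increased=False,
    noise_increased=False,
    loss_increased=True,
)
```

With these inputs the function returns the same inference as the slide, packet dropping, which the slide attributes to node 1 because both affected links start there.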

Accuracy of fault diagnosis
- Correctness of the model
  - Complete information
  - Consistent information
  - Timely information
- Correctness of the reported symptoms
  - Right size of the threshold to report a symptom
  - Difference in the behavior of faults
  - Timely reporting of symptoms

System implementation
- Windows XP
- Agents run on every wireless node and report collected information on demand
- Managers collect and analyze information
- Collected information is cast into performance counters supported by Windows
- The manager is connected to a backend simulator; collected information is converted to a script to drive the simulation
- Testbed
  - Multihop wireless testbed built using IEEE a cards
  - A commercially available network sniffer called AiroPeek is used for data collection
  - Native NICs provide a rich set of networking information

Evaluation: Data collection overhead
- [Figure panels: management traffic overhead; performance of FTP flow with and without data collection]
- No data cleaning: each link is reported only once
- With data cleaning: each link is reported by all observers for a consistency check
- Overhead < 800 bits/s/node
- Data collection traffic has little effect

Data cleaning effectiveness
- Higher accuracy with denser networks
- Higher accuracy with client-server traffic
- Coverage greater than 80% in all cases
- Higher accuracy with grid topology
- Higher coverage when using history

Evaluation: Fault diagnosis
- Detecting random dropping
  - Symptom: significant difference in loss rates on links
  - Less than 20% of faulty links are left undetected
  - No-effect faults are faulty links sending fewer than the threshold (250) packets of data
- Detecting external noise
  - Symptom: significant difference in noise level at nodes
  - Noise sources are correctly identified with at most one or two false positives
  - Inference error in the magnitudes of noise is within 4%

Evaluation: Fault diagnosis
- [Figure panels: detecting MAC misbehavior; detecting combinations of all]
- Symptom: significant discrepancy in throughput on links
- Coverage is mostly around 80% or higher
- False positives within 2

What-if analysis
[Figure labels: types of faults, network model, detected faults, faults directory, corrective measures]

What-if analysis
[Figure labels: diagnosis, topology, corrective measures]

Limitations
- Limited by the accuracy of the simulator
- Time to detect faults is acceptable for long-term faults but not for transient faults
- The choice of traces to drive the simulation has important implications
- Focus has only been on faults resulting in different behavior

Conclusion
- Used trace data for modeling the network
- Data collection techniques are presented to collect network information and detect a deviation from the expected performance
- A fault diagnosis algorithm is proposed to detect the root causes of failure
- A scheme for what-if analysis is proposed to evaluate alternative network configurations for efficient network operation

Future work
- Validation on a large test-bed
- Performance analysis in the presence of mobility
- Detecting malicious attacks
- Diagnosis in the presence of incomplete network information
- Investigating the potential of what-if analysis more deeply

References
- L. Qiu, P. Bahl, A. Rao, L. Zhou, Fault Detection, Isolation, and Diagnosis in Multihop Wireless Networks, Microsoft Technical Report, Microsoft Research-TR , Dec
- M. Steinder, A. Sethi, A survey of fault localization techniques in computer networks, Technical Report 2001, CIS Dept., Univ. of Delaware, Feb 2001
- M. Steinder, Probabilistic inference for diagnosing service failures in communication systems, PhD thesis, Univ. of Delaware, 2003

Questions
- What is the proposed solution for modeling throughput when the signal strength is poor?
- In Table 2, the simulated throughput decreases monotonically with the loss rate while the measured throughput does not. Why?
- What could cause false positives in the fault diagnosis results? When can the false positive ratio increase?