P2P Distributed Fault Diagnosis for SIP Services Henning Schulzrinne, Kyung-Hwa Kim Dept. of Computer Science, Columbia University, New York, NY Kai Miao.

Presentation transcript:

P2P Distributed Fault Diagnosis for SIP Services Henning Schulzrinne, Kyung-Hwa Kim Dept. of Computer Science, Columbia University, New York, NY Kai Miao Intel Corporation SIP 2009 (Paris) an update

VoIP quality still lagging Keynote study published November

Circle of blame
OS, VSP, app vendor, ISP
  – must be a Windows registry problem → re-install Windows
  – probably packet loss in your Internet connection → reboot your DSL modem
  – must be your software → upgrade
  – probably a gateway fault → choose us as provider

Problems in VoIP systems DNS NAT outbound proxy fails server unreachable NAT drops response STUN server not available no response from DNS server destination proxy fails or unreachable packet loss excessive queuing delay UAS not working

Traditional network management model SNMP X “management from the center”

Old assumptions, now wrong
Single provider (enterprise, carrier)
  – has access to most path elements
  – professionally managed
Problems are hard failures & elements operate correctly
  – element failures (“link dead”)
  – substantial packet loss
Mostly L2 and L3 elements
  – switches, routers
  – rarely APs
Problems are specific to a protocol
  – “IP is not working”
Indirect detection
  – MIB variable vs. actual protocol performance
End systems don’t need management
  – DMI & SNMP never succeeded
  – each application does its own updates

What’s different about VoIP?
Consumer application
  – no technical knowledge
  – no sys admin
High reliability expectations
  – “My old $10 phone always just worked”
Low margins
  – one call center call → lose margins for a year
Difficulty of remote debugging
  – Tech support can’t see network conditions or NAT
QoS sensitive
  – my has 10% packet loss if the TV is on…
NAT sensitive

Managing the whole protocol stack RTP UDP/TCP IP SIP no route packet loss TCP neg. failure NAT time-out firewall policy protocol problem playout errors media echo gain problems VAD action protocol problem authorization asymmetric conn (NAT) interference collisions DNS DHCP STUN

Types of failures
Hard failures
  – connection attempt fails
  – no media connection
  – NAT time-out
Soft failures (degradation)
  – packet loss (bursts)
      access network? backbone? remote access?
  – delay (bursts)
      OS? access networks?
  – acoustic problems (microphone gain, echo)
  – a software bug (poor voice quality)
      protocol stack? Codec? Software framework?

Internet DYSWIS = Do You See What I See? Do you see what I see? End user

DYSWIS
Capture packets
  – NDIS, pcap
Detect problem
  – no response, packet loss, no packets sent
Discover probe peers
  – same subnet, same AS, different AS, close to destination, …
Ask peers for probe results
  – reachable? packet loss?
Diagnose problem (rule engine)
  – indicate likely source of trouble: application, own device, access link (802.11), NAT, local ISP, Internet, remote server
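The last step, turning peer probe results into a likely source of trouble, can be sketched as a small rule table. This is an illustrative Python sketch, not the DYSWIS rule engine: the scope names and fault locations come from the slide, but the mapping between them and the function name `localize` are assumptions made for this example. The idea is that the nearest peer that can still reach the target bounds where the fault lies.

```python
from typing import Dict

# (peer scope, likely fault location if the nearest succeeding peer is in that scope).
# Scope names come from the slide; the mapping itself is an illustrative assumption.
RULES = [
    ("same subnet", "own device or application"),
    ("same AS", "access link or NAT"),
    ("different AS", "local ISP"),
    ("close to destination", "Internet"),
]


def localize(reachable: Dict[str, bool]) -> str:
    """Return a likely fault location given per-scope probe results.

    reachable[scope] is True if a peer in that scope reached the target
    that the local node could not reach. Missing scopes count as failures.
    """
    for scope, verdict in RULES:
        if reachable.get(scope, False):
            return verdict
    # No peer at any distance could reach the target.
    return "remote server"


if __name__ == "__main__":
    print(localize({"same subnet": False, "same AS": True}))
```

If a peer on the same subnet succeeds where the local node fails, the shared access link, NAT, and ISP are probably fine, so suspicion falls on the local device or application.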

DYSWIS overview (figure: multiple nodes, each with Detect, Diagnosis, and Probe components)

Diagnosis node Architecture “not working” (notification) inspect protocol requests (DNS, HTTP, RTCP, …) “DNS failure for 15m” orchestrate tests contact others ping can buddy reach our resolver? notify admin ( , IM, SIP events, …) request diagnostics Sensor node

Example rule
Rule Example

  ; load the user-defined functions used by the rules
  (load-function ExMyUpcase)
  (load-function SelfDiagnosis)
  (load-function DnsConnection)
  (load-function ProxyServer)
  (load-function SipResult)

  ; entry point: run the SIP diagnosis
  (defrule MAIN::SIP
    (declare (auto-focus TRUE))
    =>
    (process-sip void))

  ; each test runs only if the previous one returned "ok"
  (deffunction process-sip (?args)
    "test dns and proxy server for sip"
    (bind ?result "NA")
    (bind ?result (self-diagnosis void))
    (if (eq ?result "ok") then
      (bind ?result (dns-connection other)))
    (if (eq ?result "ok") then
      (bind ?result (proxy-connection void)))
    (sip-result ?result))

  (deffunction process-dns (?args)
    "test dns server"
    (bind ?result "NA")
    (bind ?result (dns-connection void))
    (if (eq ?result "ok") then
      (bind ?result (dns-resolution other)))
    (sip-result ?result))

Peer selection
DHT or database
  – Register myself to DHT network
      AS number, subnet, first hop address, access point
  – Search probing nodes
      Nodes on LAN and beyond
A to DHT: I need some nodes who can help me. Who is in the same subnet as me?
DHT to A: You can contact B. His IP address is … and port number is 9090

Peer selection - DHT (key, value) A B I need some nodes who can help me. Who is in same subnet with me? DHT node /16 node udp node /24 no node kkh.cs.columbia.edu 9090 tcp
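The register-then-search exchange can be sketched with an in-memory dictionary standing in for the DHT. This is a minimal Python sketch under stated assumptions: the class `PeerDirectory`, the key formats `subnet:` and `as:`, the host name `peer-b.example`, and the documentation addresses in 192.0.2.0/24 are all hypothetical, and a real deployment would store the same (key, value) pairs in a distributed hash table.

```python
import ipaddress
from collections import defaultdict


class PeerDirectory:
    """In-memory stand-in for the DHT: key -> list of (host, port, transport)."""

    def __init__(self):
        self._table = defaultdict(list)

    def register(self, host, port, transport, ip, prefix_len, asn):
        """Publish a node's contact under its subnet key and its AS key."""
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        contact = (host, port, transport)
        self._table[f"subnet:{subnet}"].append(contact)
        self._table[f"as:{asn}"].append(contact)

    def find_same_subnet(self, ip, prefix_len):
        """Answer "who is in the same subnet as me?" for the given address."""
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        return list(self._table.get(f"subnet:{subnet}", []))

    def find_same_as(self, asn):
        """Return contacts registered under the same AS number."""
        return list(self._table.get(f"as:{asn}", []))


if __name__ == "__main__":
    directory = PeerDirectory()
    # Node B registers itself (all values are placeholders).
    directory.register("peer-b.example", 9090, "tcp", "192.0.2.10", 24, 64500)
    # Node A, at 192.0.2.77 in the same /24, looks for helpers.
    print(directory.find_same_subnet("192.0.2.77", 24))
```

Keying on the network prefix rather than the full address lets a single lookup return every registered node on the same subnet.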

Remote probing
Distributing modules
  – Detecting and probing modules should be added and updated
  – Dynamic class loading
  – Dynamic module distribution
Modules can be created and updated separately.
XMLRPC
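How one node might ask another to run a probe over XML-RPC can be sketched with Python's standard `xmlrpc` modules. This is not the DYSWIS implementation, and it does not show dynamic class loading or module distribution. The probe name `tcp_probe` and the helpers `start_probe_server` and `ask_peer` are hypothetical names chosen for this example.

```python
import socket
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_probe_server(bind_host="127.0.0.1", bind_port=0):
    """Expose tcp_probe over XML-RPC in a background thread (port 0 = any free port)."""
    server = SimpleXMLRPCServer((bind_host, bind_port), logRequests=False)
    server.register_function(tcp_probe, "tcp_probe")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_peer(peer_url, host, port):
    """Ask the peer at peer_url to probe host:port and return its answer."""
    with ServerProxy(peer_url) as peer:
        return peer.tcp_probe(host, port)


if __name__ == "__main__":
    server = start_probe_server()
    peer_port = server.server_address[1]
    # Placeholder target: a local listener standing in for the service under test.
    target = socket.socket()
    target.bind(("127.0.0.1", 0))
    target.listen(1)
    print(ask_peer(f"http://127.0.0.1:{peer_port}", "127.0.0.1", target.getsockname()[1]))
    target.close()
    server.shutdown()
```

Registering each probe under a name means a new probe can be exposed without changing the calling side, which loosely mirrors the idea that modules can be created and updated separately.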

Probing Scenarios
HTTP
  – Causes: Dead web-server, page moved, low bandwidth, …
      Check DNS query
      TCP connection
      Ask other node to try same query
      Check TCP congestion (packet loss)
      …
DNS
  – Causes: Dead DNS server, resolution failed, UDP is not working, …
      Check other DNS server
      Ask other node to try to connect to my DNS server
      Ask other node to query the same host at another DNS server
SIP/RTP
  – Causes: NAT, DNS, proxy server, authentication, …
      Proxy connectivity test (SIP OPTIONS)
      Ask other node to try same action
      …
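The three DNS checks can be combined into a small decision function. This is an illustrative Python sketch: the function name `diagnose_dns`, its boolean inputs, and the verdict strings are assumptions made for this example, and the probes themselves are taken as already run.

```python
def diagnose_dns(other_server_works: bool,
                 peer_reaches_my_server: bool,
                 peer_resolves_elsewhere: bool) -> str:
    """Combine the three DNS checks from the slide into a likely cause.

    other_server_works:      I can resolve the name through a different DNS server.
    peer_reaches_my_server:  another node can connect to my DNS server.
    peer_resolves_elsewhere: another node resolves the same host via another server.
    """
    if other_server_works:
        # My own DNS and UDP path works in general, so the problem is tied
        # to my configured server.
        if peer_reaches_my_server:
            return "path between me and my DNS server"
        return "my DNS server"
    # I cannot resolve through any server.
    if peer_resolves_elsewhere:
        return "my own DNS or UDP connectivity"
    return "name does not resolve"


if __name__ == "__main__":
    print(diagnose_dns(other_server_works=True,
                       peer_reaches_my_server=False,
                       peer_resolves_elsewhere=True))
```

Each peer answer removes one candidate cause, which is why no single local check is enough to tell a dead server from a broken local path.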

Implementation

Probing bundle 1 Probing bundle 2 Probing bundle 3 DYSWIS Main Bundle poll Update polling bundle Felix launcher Implementation using Felix Need to update polling and other functions “dynamic service deployment framework amenable to remote management”

Implementation: system tray

Implementation: debugger

Implementation: fault history

Implementation: traceroute

Summary
Problems in VoIP applications particularly hard to diagnose
  – cost-sensitive consumer application
  – multiple interlocking protocols
  – NATs and firewalls
  – QoS-sensitive
Existing management systems not useful
DYSWIS – distributed diagnostics using peers
  – generic infrastructure: probes & rules
Applications should assist in debugging
  – “hey, DYSWIS, I got a problem!”