Understanding Network Failures
1.0 Understanding Network Failures (program overview)
2.0 Intro to ping
2.1 Usage Intro (Strybd prototype)
2.2 Lab
2.3 Assessment
Plus additional e-Learning modules, labs and assessments:
3.0 Intro to traceroute, lab, assessment
4.0 Intro to netstat, lab, assessment
5.0 Intro to ipconfig (ifconfig), lab, assessment
6.0 Intro to nslookup, lab, assessment
7.0 Intro to whois, lab, assessment
8.0 Pre-Assessment Modules (pre-tests for each module)
9.0 Assessment Modules
10.0 Labs: User Tools & Network Utilities (telnet short-cuts, PCHAR/TTCP,…?)
2.1.1 Lesson Objectives
In this lesson, students will use “ping” to validate network connections and analyze the responses that “ping” reports.
Audience information:
–Call Center I & II / CCNA I & II
–Duration: 20 minutes
2.1.2 Network failures: The sky is falling!
[Figure: network diagram with routers Eva, Center and Boaz (interfaces e0, s0, s1, s2), Server 1 and the Internet, plus a dialog between the user “Becky” and Customer Support (CS).]
2.1.3 Before escalating this call...
For most users, the browser is “The Internet”... the sky isn’t falling!
Policy change or local failure?
–Do the interfaces show a link light?
–LAN/WAN connectivity? ( # ping yahoo.com )
Example: Text messages are being dropped by the “Boaz” router?
Review: Before escalating a customer call...
The browser is “The Internet” (for most users)
Consider local failures first!
–Does the interface show a link light?
Identify recent (local) modifications
–Are new patches applied? Applied correctly?
Experience suggests...
Many local network “failures” are due to operator error
–Unskilled users, untrained personnel, invalid configurations...
Suspect recent changes or modifications
–Have all required patches been applied correctly?
–Check the logs (recent activity? upgrades?)
2.1.4 (Review of 1.0): Common Causes of Network Failures
Circuit “outages” are a common cause of real (actual) network faults
–Example: Heavy equipment work and sea dredging have cut cabling, power lines and deep-sea fibre (very rare!)
DoS attacks = sluggish network segments. More common...?
–Example: Denial of Service (DoS). For our example, the Internet is down!
–“ping” may be used to verify that all subnets are “up” during a DoS attack
Status (ping or traceroute script):
–All routers and subnets “up” (reachable), except...
–Center-s2 (Serial_2) “unreachable” during attack
Alert: s2 is “down”!
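The status check described on this slide could be scripted roughly as follows. This is a minimal sketch, not the script used in the course: the names ping_once and sweep, the target labels, and the address given for Center-s2 are all hypothetical. It assumes the system ping accepts -c (Linux/macOS) or -n (Windows) for the packet count, and that a zero exit code means a reply was received, which is not reliable on every platform.

```python
import platform
import subprocess
from typing import Callable, Dict


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the system ping; True if it exits with 0."""
    # Assumption: "-n" is the count flag on Windows, "-c" elsewhere.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def sweep(targets: Dict[str, str],
          probe: Callable[[str], bool] = ping_once) -> Dict[str, str]:
    """Map each named target to "up" or "unreachable"."""
    return {name: ("up" if probe(addr) else "unreachable")
            for name, addr in targets.items()}


if __name__ == "__main__":
    # Hypothetical labels; the Center-s2 address is invented for illustration.
    targets = {"Center-s0": "192.168.10.65", "Center-s2": "192.168.10.129"}
    for name, status in sweep(targets).items():
        print(f"{name}: {status}")
```

Passing the probe as a parameter lets the sweep logic be exercised without touching the network, for example with a stub that reports one interface as down.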
2.1.5 Round-trip: A Request/Reply “pair”
# ping 220.127.116.11
[Figure: network diagram with Echo Request and Echo Reply arrows; labels include WS4 (192.168.10.62), Server 1, routers Eva, Center and Boaz (interfaces e0, s0, s1, s2), and switches Center-sw1, Boaz-sw1, Sw1-8, Sw1-2.]
How many intervening devices, as shown?
What if this ping fails? Reduce scope of test...
2.1.6 Example: Using ping
Initial troubleshooting
# ping <> ( e.g. ping )
# ping ( e.g. ping )
Demonstration: “ping Serial_0”
# ping 192.168.10.65
Type to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.65, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/6/9 ms
[Figure: network diagram highlighting Serial_0; labels include Center, Eva, Boaz and Server 1.]
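When many ping results have to be compared, the summary line in the demonstration output can be read by a script. The sketch below is illustrative only: PingSummary and parse_summary are hypothetical names, and the pattern assumes the exact wording shown on this slide.

```python
import re
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    success_percent: int
    received: int
    sent: int
    rtt_min_ms: Optional[int]
    rtt_avg_ms: Optional[int]
    rtt_max_ms: Optional[int]


# The round-trip part is optional so a summary without timings still parses.
_SUMMARY = re.compile(
    r"Success rate is (\d+) percent \((\d+)/(\d+)\)"
    r"(?:, round-trip\s+min/avg/max = (\d+)/(\d+)/(\d+) ms)?",
    re.IGNORECASE,
)


def parse_summary(text: str) -> Optional[PingSummary]:
    """Extract success rate and round-trip times from a ping summary line."""
    match = _SUMMARY.search(text)
    if match is None:
        return None
    pct, received, sent, *rtt = match.groups()
    rtt_values = [int(v) if v is not None else None for v in rtt]
    return PingSummary(int(pct), int(received), int(sent), *rtt_values)


if __name__ == "__main__":
    line = "Success rate is 100 percent (5/5), round-trip min/avg/max = 4/6/9 ms"
    print(parse_summary(line))
```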
Using “ping” continued...
ping uses ICMP Echo Request/Reply
ICMP message types:
–Echo Request/Echo Reply: “ping” connectivity
–Destination unreachable: Packet delivery problem
–Time exceeded: Packet discarded (TTL)
–Redirect: Better route via “router_ip_address”
There are many ways to utilize “ping”...
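To make the Echo Request concrete, the sketch below assembles one in memory using the ICMP type numbers from RFC 792 and the Internet checksum from RFC 1071. The function names, identifier, sequence number and payload are illustrative assumptions. Nothing is sent: transmitting the packet would need a raw socket and, on most systems, elevated privileges.

```python
import struct

# ICMP type numbers (RFC 792) for the message types listed on this slide.
ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3
ICMP_REDIRECT = 5
ICMP_ECHO_REQUEST = 8
ICMP_TIME_EXCEEDED = 11


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Return an ICMP Echo Request: 8-byte header followed by the payload."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


if __name__ == "__main__":
    packet = build_echo_request(0x1234, 1, b"\xAB\xCD" * 36)
    print(len(packet), packet[:8].hex())
```

A receiver can validate the message by running the same checksum over the whole packet; a result of zero means the packet arrived intact.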
Extended “ping” (options)
–Specify data length, source and destination addresses
–Specify “next hop”
–Set timeout interval (default: 2 seconds)
–Specify ping count (repeated ping attempts)
–Specify data pattern (sliding “1s”, or 0xABCD)
–Validate response data (data validity)
–Set: Don’t Fragment, include Timestamp, etc.
Initial Network Tests
“My internet is down” could be a sluggish network segment, slow server, or equipment fault...?
–How many intervening devices? (firewall, appliance, proxy server, CSU/DSU, …)
–Is it a recurring fault, temporary slowness, or random outages?
Collecting accurate failure data is crucial!
Review: Initial Network Tests: What to consider?
User: “My internet is down...”
“ping yahoo.com” = “Are you there?”
–Could be an intervening application server, device or appliance
–Intermittent faults may appear as temporary service outages (e.g. threshold exceeded, server rebooting, ...)
ping: Validate Connectivity
Standard diagnostics using “ping”:
# ping 127.0.0.1
# ping
# ping
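One way to read this sequence is as a ladder: start at the loopback address and move outward, stopping at the first target that does not answer. The sketch below assumes that reading. The slide does not show the later targets, so the default gateway and remote host used here are hypothetical, as are the function name first_failure and the stub probe.

```python
from typing import Callable, List, Optional, Tuple


def first_failure(steps: List[Tuple[str, str]],
                  probe: Callable[[str], bool]) -> Optional[str]:
    """Probe each (label, address) in order; return the first failing label."""
    for label, address in steps:
        if not probe(address):
            return label
    return None  # every step answered


if __name__ == "__main__":
    # Hypothetical ladder: only 127.0.0.1 comes from the slide.
    steps = [
        ("loopback", "127.0.0.1"),
        ("default gateway", "192.168.10.65"),
        ("remote host", "yahoo.com"),
    ]
    # Stub probe standing in for a real ping: the remote host is "down".
    reachable = {"127.0.0.1", "192.168.10.65"}
    print(first_failure(steps, lambda addr: addr in reachable))
```

The label returned marks where connectivity stops, which narrows the fault to the segment between the last success and the first failure.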
2.1.7 What is a 20% success rate?
# ping 192.168.10.62
Type to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.62, timeout is 2 seconds:
....!
Success rate is 20 percent (1/5), round-trip min/avg/max = 76/76/76 ms
[Figure: ECHO Request (from WS2) and ECHO Reply (to WS2).]
ping responses: (.) = timeout, (!) = success, (N) = Net-Unreachable, (U) = Dest-Unreachable
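The success rate follows directly from the response characters: count the “!” marks and divide by the number of packets sent. The sketch below uses the four characters listed on this slide; the names RESPONSE_CODES and summarize_responses are hypothetical.

```python
from collections import Counter
from typing import Dict

# Response characters as listed on the slide.
RESPONSE_CODES = {
    "!": "success",
    ".": "timeout",
    "N": "net-unreachable",
    "U": "dest-unreachable",
}


def summarize_responses(marks: str) -> Dict[str, object]:
    """Count each response type and compute the success percentage."""
    marks = marks.replace(" ", "")  # tolerate "! ! ! ! !" style output
    counts = Counter(RESPONSE_CODES.get(ch, "unknown") for ch in marks)
    sent = len(marks)
    received = counts["success"]
    percent = round(100 * received / sent) if sent else 0
    return {"sent": sent, "received": received,
            "percent": percent, "counts": dict(counts)}


if __name__ == "__main__":
    print(summarize_responses("....!"))   # one reply out of five
    print(summarize_responses("!!..."))   # two replies out of five
```

For “....!” this gives 1 of 5, or 20 percent. Two replies out of five would give 40 percent.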
Review: Using ping
“ping 192.168.10.65” will validate network connectivity (between source and destination)
–“Are you there?” (ECHO Request sent from source)
–“I am connected” (ECHO Reply received from destination)
–5 of 5 packets = 100% success rate
See also: www.cwdotson.com/NetFailures,dd2
2.1.8 Questions: Using ping
Recall the ping responses: An exclamation mark (!) indicates which test result? A) Failure; B) Success; C) Timeout
Answer: B) Success
Recall the ping responses: A period (.) indicates which test result? A) Failure; B) Success; C) Timeout
Answer: C) Timeout
(True/False) Ping is an excellent performance monitor.
Answer: False
(True/False) 2 of 5 successful packets indicates a success rate of 20%.
Answer: False (40% success)
(True/False) When ping is executed, the source issues an Echo Request to the destination.
Answer: True
Intermittent vs. Recurring Failures
Intermittent faults: Difficult to identify & fix
–Occasional errors (“Time exceeded”)
–Errors may occur only under certain conditions (e.g. temporary outages, threshold exceeded)
Recurring faults: Easier to identify (server, router, or interface is “down”)
–Chronic fault (“Network unreachable”)
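The distinction can be illustrated by logging repeated ping results over time and looking at the pattern. The sketch below is deliberately simple and its rule is an assumption made for illustration: all failures counts as recurring, and a mix of successes and failures counts as intermittent. The name classify_fault and the category labels are hypothetical.

```python
from typing import Sequence


def classify_fault(history: Sequence[bool]) -> str:
    """Classify a series of probe results (True = reply received)."""
    if not history:
        return "no data"
    if all(history):
        return "healthy"
    if not any(history):
        return "recurring"     # every probe failed: chronic fault
    return "intermittent"      # mixed results: occasional errors


if __name__ == "__main__":
    print(classify_fault([True, True, False, True, True]))
    print(classify_fault([False, False, False, False, False]))
```

A real monitor would also record timestamps, since an intermittent fault that appears only under certain conditions is easier to find when failures can be matched against load or scheduled events.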
Limitations of “ping”
# ping yahoo.com
Type to abort.
Sending 5, 100-byte ICMP Echos to 18.104.22.168, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 23/26/29 ms
ping can validate “connectivity” only!
–“100%” success expected!
–ICMP packets do NOT represent “real world” traffic
–ping: Response is for a few small packets
Caution: ping is a poor tool for performance monitoring!
–Network performance varies for “real world” traffic
–Text traffic is much different from streaming video or VoIP
Review: ping limitations
ping: Validates network paths
–Only confirms basic connectivity between remote nodes
–Sends a few small packets (e.g. 100-byte ICMP packets, which are not “real world” traffic)
–For small, idle networks, 100% success rates are common (not “real world”)
2.1.9 Lesson Summary
In this lesson we:
–Examined LAN/WAN failures (DoS, circuit breaks)
–Used “ping” to validate a network connection with remote nodes
–Analyzed the responses reported by “ping”
Destination Unreachable
Hosts/routers return “Dest. Unreachable” when data cannot be completely delivered to the receiving application at the destination host
–Example: ICMP messages sent back to WS2 in response to “ping” (e.g. # ping serial_0)
–Network unreachable: No matching route
–Host unreachable: Packet is routable but host is not responding
–Can’t fragment: Older router/large packets (must fragment but “do not frag” bit set)
–Protocol unreachable: Transport layer protocol “down” at host
–Port unreachable: Host application fault (port not opened by the application)
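Each of the cases above corresponds to a numeric code inside an ICMP type 3 message, as defined in RFC 792. The sketch below decodes the first two bytes of an ICMP header into a readable description. The function name describe_icmp and the sample bytes are illustrative assumptions, and only the types and codes discussed in this lesson are covered.

```python
# Codes carried by ICMP type 3 (Destination Unreachable), per RFC 792.
DEST_UNREACHABLE_CODES = {
    0: "Network unreachable",
    1: "Host unreachable",
    2: "Protocol unreachable",
    3: "Port unreachable",
    4: "Fragmentation needed and Don't Fragment was set",
}

ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination unreachable",
    5: "Redirect",
    8: "Echo Request",
    11: "Time exceeded",
}


def describe_icmp(icmp_header: bytes) -> str:
    """Describe an ICMP message from its type (byte 0) and code (byte 1)."""
    if len(icmp_header) < 2:
        raise ValueError("need at least the type and code bytes")
    icmp_type, code = icmp_header[0], icmp_header[1]
    name = ICMP_TYPES.get(icmp_type, f"Type {icmp_type}")
    if icmp_type == 3:
        detail = DEST_UNREACHABLE_CODES.get(code, f"code {code}")
        return f"{name}: {detail}"
    return name


if __name__ == "__main__":
    # Sample header bytes: type 3, code 1, checksum left as zero.
    print(describe_icmp(bytes([3, 1, 0, 0])))
```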
Isolating IP Routing Problems:
–Use ping to trace a path (identify “last” router)
–telnet to the last “traced” router or node
# telnet
# telnet