
1 Cisco Catalyst 6500 IOS Update
Chew Kin Pheng, Systems Engineer

2 Agenda
Introduction
Embedded Event Manager (EEM)
Generic Online Diagnostics (GOLD)
Smart Call Home (SCH)
Gateway Load Balancing Protocol (GLBP)

3 NEW 12.2(33)SXH Software SHIPPING!
Deployment areas: Wiring Closet, Backbone, Data Center, EWAN, Metro, SP Network
Themes: Unified Network Services, Non-Stop Communication, Operational Manageability, Virtualization, Application Intelligence, Integrated Security
12.2(33)SXH highlights (200+ features with full IOS Software Modularity):
High availability: IOS Software Modularity, GOLD, Enhanced Object Tracking, HSRP and GLBP SSO, 16-way load balancing, Fast Fabric Switchover, BFD with BGP, MPLS HA, MPLS FRR link and node protection
Manageability: EEM, Smart Call Home, IP SLA, Smart Ports, AutoQoS, AutoSecure, multiple SPAN enhancements, NetFlow Top Talkers, per-interface NDE, multicast NDE, E-OAM (802.1ag and 802.3ah), MPLS MIBs, LLDP-MED, CatOS-to-IOS transition release
Security: 802.1x, MAC Auth and Web Auth for access control, NAC integration, IBNS, Layer 3 NAC, address spoofing prevention, Policy-Based ACLs, Multicast Router Guard, IGMP filtering, Private Hosts, CoPP, 16K IPsec tunnels, DMVPN support in hardware, CISF
Virtualization and VPN: Virtual Switching and L2 scalability innovations, MPLS (L2 VPN, L3 VPN, TE) innovations, VRF-aware services, Multi-VRF with multicast, Multicast VPN Inter-AS and Extranet, Multiplexed UNI
Application intelligence and QoS: NBAR on PISA, FPM on PISA, sophisticated QoS support with LLQ, cRTP, LFI, MLPPP, sophisticated QoS support for optimized triple-play services
Hardware and IPv6: 16-port 10G linecard, VS-S720-10G, IPsec leadership, IPv6 innovations

4 Embedded Event Manager (EEM) Overview

5 EEM – What is it?
Embedded Event Manager (EEM) is a programmable subsystem within the IOS that runs on the Catalyst 6500. It allows network administrators to automate responses to specific events that occur on the switch. Simplified operation: EEM provides a means to automate operational management in real time. EEM monitors for specific events on the switch and can invoke predefined actions to correct the problem, take remedial action, and report the event to network operations.
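As a minimal sketch of the idea above (the interface name and syslog pattern here are illustrative assumptions, not taken from the slides), an EEM applet can watch for a syslog message and run CLI commands in response:

```
event manager applet LINK-DOWN-CAPTURE
 event syslog pattern "Interface GigabitEthernet1/1, changed state to down"
 action 1.0 cli command "enable"
 action 2.0 cli command "show interface gigabitethernet 1/1 | append bootflash:linklog.txt"
 action 3.0 syslog msg "EEM: captured interface state after Gi1/1 went down"
```

Applets like this are configured directly in the CLI; more complex policies can be written as Tcl scripts, as shown later in this deck.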

6 EEM - How does it work?

7 EEM Basic Architecture

8 EEM – Examples of Its Use

9 EEM – Examples of Its Use (Con’t)

10 Catalyst 6500 Management Simplified Operation - EEM Example
Automate switch configuration for connected IP phones
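The slide itself does not show the applet, but a hedged sketch of how this IP phone automation might look (the interface, voice VLAN number, and syslog pattern are illustrative assumptions) is:

```
event manager applet IPPHONE-AUTOCONF
 event syslog pattern "Line protocol on Interface GigabitEthernet2/1, changed state to up"
 action 1.0 cli command "enable"
 action 2.0 cli command "configure terminal"
 action 3.0 cli command "interface gigabitethernet 2/1"
 action 4.0 cli command "switchport voice vlan 200"
 action 5.0 cli command "auto qos voip cisco-phone"
 action 6.0 syslog msg "EEM: applied IP phone configuration to Gi2/1"
```

In practice the trigger could also be a CDP-based event so only ports with attached phones are reconfigured.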

11 EEM – The Hardware and Software It Works With

12 Generic Online Diagnostics For The Catalyst 6500

13 Generic Online Diagnostics What is GOLD?
GOLD defines a common framework for diagnostic operations across Cisco platforms running Cisco IOS Software. Goal: check the health of hardware components and verify proper operation of the system data plane and control plane at run time and boot time. Provides a common CLI and scheduling for field diagnostics, including:
Boot-up tests (includes online insertion)
Health-monitoring tests (background, non-disruptive)
On-demand tests (disruptive and non-disruptive)
User-scheduled tests (disruptive and non-disruptive)
CLI access to data via management interface

14 How Is GOLD Different From Other Forms of Diagnostics?
GOLD performs functional tests, typically by switching diagnostic packets through the system, as well as ASIC memory testing. These tests can be performed during runtime and typically use the same hardware path and IOS software drivers as user traffic. Power-On Self Test (POST), by contrast, occurs early in IOS initialization and is focused on the CPU subsystem and memory components.

15 Cat6K Online Diagnostic Methodology
Boot-up diagnostics touch every single ASIC/memory device in the data path and control path. Functional testing is combined with component monitoring to detect faults in passive components (connectors, solder joints, etc.) and active components (ASICs, PLDs, etc.). Tests are written using run-time driver routines to catch software defects. Non-disruptive tests are used as HA triggers. Both disruptive and non-disruptive tests are available on demand as troubleshooting tools for CA/TAC. Root cause analysis and corrective actions are performed upon test failure; EEM is used for configurable corrective action (Tcl based). Boot-up diagnostics take less than 10 seconds per module in complete mode and about 5-7 seconds in minimal mode. Functional testing: packets sent out during testing touch both active and passive components, so in effect both are tested. Using the run-time driver routines enables us to catch software defects and stay as close as possible to the exact path that actual customer traffic will traverse. RCA is hard coded, and the test suite will try to correlate the results of various tests. For example, if the loopback test is failing on every single line card in the system, we do not suspect that all line cards are bad; the likelier suspect is the supervisor module sending out the diagnostic packets.

16 Generic Online Diagnostics How does GOLD work?
Diagnostic packet switching tests verify that the system is operating correctly:
Is the supervisor control plane and forwarding plane functioning properly?
Is the standby supervisor ready to take over?
Are linecards forwarding packets properly? Are all ports working?
Is the backplane connection working?
Other types of diagnostic tests, including memory and error-correlation tests, are also available.
(Diagram: active and standby supervisors with CPU and forwarding engines, linecards, and switch fabric)

17 Generic Online Diagnostics What type of failure does GOLD detect?
Diagnostic capabilities are built into the hardware. Depending on the hardware, GOLD can catch:
Port failure
Bent backplane connector
Bad fabric connection
Malfunctioning forwarding engines
Stuck control plane
Bad memory

18 Generic Online Diagnostics Diagnostic Integration
Provides a generic diagnostics framework: verify hardware functionality, detect and identify problems before they result in network downtime!
Configuration/reporting: configure online diagnostics and check diagnostic results
Trigger modes: boot-up diagnostics; runtime diagnostics (on-demand, scheduled, health-monitoring)
Actions on failure: default corrective action (supervisor reset, supervisor switchover, fabric switchover, port shutdown, line card reset, line card power-down); generate a Call Home message; trigger syslog; trigger EEM policies; generate SNMP trap
Automated action based on diagnostic results

19 Generic Online Diagnostics Diagnostic Operation
Boot-Up Diagnostics: run during system bootup, line card OIR, or supervisor switchover; makes sure faulty hardware is taken out of service
Switch(config)#diagnostic bootup level complete
Runtime Diagnostics, Health-Monitoring: non-disruptive tests run in the background and serve as HA triggers
Switch(config)#diagnostic monitor module 5 test 2
Switch(config)#diagnostic monitor interval module 5 test 2 00:00:15
Runtime Diagnostics, On-Demand:
Switch#diagnostic start module 4 test 8
Module 4: Running test(s) 8 may disrupt normal system operation
Do you want to continue? [no]: y
Switch#diagnostic stop module 4
All diagnostic tests can be run on demand for troubleshooting purposes; they can also be used as a pre-deployment tool.
You can see here the multiple types of diagnostics available in the GOLD framework. Each of these serves a particular purpose. The first thing to notice is that some diagnostics run during runtime, while others run during bootup. The other important point is that only health-monitoring tests are non-disruptive; other tests will affect the system if executed during runtime. With bootup diagnostics, the goal is to ensure faulty hardware is taken out of service before coming online. Bootup diagnostics take less than 10 seconds per module in 'complete' mode and about 5-7 seconds per module in 'minimal' mode. Wherever possible, run the bootup tests in 'complete' mode: the difference in timing is about 2-3 seconds per card in the chassis, but the trade-off is higher availability. Health-monitoring tests are non-disruptive and serve as HA triggers during runtime. On-demand tests, on the other hand, are usually disruptive; tests may be disruptive for subseconds or they may take hours to complete, in which case the user will be warned. Bootup tests are only a subset of the available on-demand tests.
They can be used to troubleshoot a given system if a hardware problem is suspected, or as a pre-deployment tool to ensure that the system is working correctly before being put into production. Scheduled tests can be any GOLD test that needs to execute during a specific time or outage window.
Scheduled:
Switch(config)#diagnostic schedule module 4 test 1 port 3 on Jan :32
Switch(config)#diagnostic schedule module 4 test 2 daily 14:45
Schedule diagnostic tests for verification and troubleshooting purposes.
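Pulling the commands from this slide together, a hedged pre-deployment sketch (the module and test numbers are illustrative, taken from the slide's own examples) might look like:

```
! Run complete diagnostics at every boot so faulty hardware never comes online
diagnostic bootup level complete
! Keep a non-disruptive health-monitoring test running every 15 seconds
diagnostic monitor module 5 test 2
diagnostic monitor interval module 5 test 2 00:00:15
! Schedule a test inside a known maintenance window
diagnostic schedule module 4 test 2 daily 14:45
```

Disruptive tests would be started on demand (diagnostic start module 4 test 8) only during staging or troubleshooting, never in production hours.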

20 Generic Online Diagnostics View the GOLD Tests and Attributes
Switch#show diagnostic content mod 5
Module 5: Supervisor Engine 720 (Active)
<snip>
                                                     Testing Interval
 ID  Test Name                           Attributes  (day hh:mm:ss.ms)
==== =================================== ============ =================
  1) TestScratchRegister -------------> ***N****A***  :00:30.00
  2) TestSPRPInbandPing --------------> ***N****A***  :00:15.00
  3) TestTransceiverIntegrity --------> **PD****I***  not configured
  4) TestActiveToStandbyLoopback -----> M*PDS***I***  not configured
  5) TestLoopback --------------------> M*PD****I***  not configured
  6) TestNewIndexLearn ---------------> M**N****I***  not configured
  7) TestDontConditionalLearn --------> M**N****I***  not configured
  8) TestBadBpduTrap -----------------> M**D****I***  not configured
  9) TestMatchCapture ----------------> M**D****I***  not configured
 10) TestProtocolMatchChannel --------> M**D****I***  not configured
 11) TestFibDevices ------------------> M**N****I***  not configured
 12) TestIPv4FibShortcut -------------> M**N****I***  not configured
 13) TestL3Capture -------------------> M**N****I***  not configured
 14) TestIPv6FibShortcut -------------> M**N****I***  not configured
 15) TestMPLSFibShortcut -------------> M**N****I***  not configured
 16) TestNATFibShortcut --------------> M**N****I***  not configured
 17) TestAclPermit -------------------> M**N****I***  not configured
 18) TestAclDeny ---------------------> M**N****A***  :00:05.00
 19) TestQoSTcam ---------------------> M**D****I***  not configured
Diagnostics test suite attributes:
  M/C/* - Minimal bootup level test / Complete bootup level test / NA
  B/*   - Basic ondemand test / NA
  P/V/* - Per port test / Per device test / NA
  D/N/* - Disruptive test / Non-disruptive test / NA
  S/*   - Only applicable to standby unit / NA
  X/*   - Not a health monitoring test / NA
  F/*   - Fixed monitoring interval test / NA
  E/*   - Always enabled monitoring test / NA
  A/I   - Monitoring is active / Monitoring is inactive
  R/*   - Power-down line cards and need reset supervisor / NA
  K/*   - Require resetting the line card after the test has completed / NA
  T/*   - Shut down all ports and need reset supervisor / NA
Here is an example of diagnostics content for a Supervisor 720, for your reference. Each test has specific attributes; an 'N' denotes a non-disruptive health-monitoring test. You can see for tests 1 and 2 the associated monitoring interval.

21 Generic Online Diagnostics GOLD Test Attributes (Con’t)
 20) TestL3VlanMet -------------------> M**N****I***  not configured  n/a
 21) TestIngressSpan -----------------> M**N****I***  not configured  n/a
 22) TestEgressSpan ------------------> M**D****I***  not configured  n/a
 23) TestNetflowInlineRewrite --------> C*PD****I***  not configured  n/a
 24) TestFabricSnakeForward ----------> M**N****I***  not configured  n/a
 25) TestFabricSnakeBackward ---------> M**N****I***  not configured  n/a
 26) TestTrafficStress ---------------> ***D****I**T  not configured  n/a
 27) TestFibTcamSSRAM ----------------> ***D*X**IR**  not configured  n/a
 28) TestAsicMemory ------------------> ***D*X**IR**  not configured  n/a
 29) TestNetflowTcam -----------------> ***D*X**IR**  not configured  n/a
 30) ScheduleSwitchover --------------> ***D****I***  not configured  n/a
 31) TestFirmwareDiagStatus ----------> M**N****I***  not configured  n/a
 32) TestAsicSync --------------------> ***N****A***  :00:
Diagnostics test suite attributes: (same legend as the previous slide)
This is the same output continued on this slide. I want to point out some disruptive tests, which are marked with the 'D' attribute. These are the very disruptive tests I mentioned earlier that can take hours to complete, such as the TCAM tests; be very careful to run these only when required or during pre-stage testing. You can also see an 'X' and an 'R' in the attributes, which denote 'not a health-monitoring test' and a required line card or supervisor reset after the test is executed.
TestTrafficStress: the traffic stress test exercises the system by configuring all of the ports under test into pairs that circulate packets between each other, then sending data packets to these pairs, where they circulate for the duration of the test. After allowing the packets to circulate through the system for a while, the test brings all packets back to the inband port to verify whether any have problems. If any packets have been dropped, misdirected, or corrupted, the test fails.
Pay extra attention to memory tests: memory tests can take hours to complete, and a reset is required after running them.

22 Generic Online Diagnostics An example: Supervisor datapath coverage
(Diagram: PFC3 with L2 and L3/4 engines, MSFC, port ASIC, RP CPU, SP CPU, 16 Gbps DBUS/RBUS bus, EOBC, fabric interface/replication engine, switch fabric)
SPRP INBAND PING: monitors the forwarding path between the Switch Processor, Route Processor, and forwarding engine. Runs periodically every 15 seconds after the system is online (configurable). 10 consecutive failures are treated as FATAL and will result in supervisor switchover or supervisor reset.
This test detects most runtime software driver and hardware problems on supervisor engines. It covers the Layer 2 and Layer 3/4 forwarding engines and the replication engine on the path from the switch processor to the route processor. We send an L2 packet from the Switch Processor (SP) to the Route Processor (RP). When the RP receives it, it sends out an L3 packet and forwards it to the rewrite and multicast engine, which rewrites the packet and sends it back to the SP. 10 consecutive failures will result in supervisor failover or reset.
You have an example here of how to configure that health-monitoring test. The main tests such as this one are already on by default, every 15 seconds, but you have the ability to change the monitoring interval of each test (365 days down to 50 milliseconds granularity) if you want. Health-monitoring info is stored in the switch configuration. Health-monitoring is SSO-compliant: upon switchover, health-monitoring tests will run from the new active seamlessly.
Switch(config)#diagnostic monitor module 5 test 2
Switch(config)#diagnostic monitor interval module 5 test 2 00:00:15

23 Generic Online Diagnostics View GOLD Results
Switch#show diagnostic result mod 7
Current bootup diagnostic level: complete
Module 7: CEF port 1000mb SFP
Overall Diagnostic Result for Module 7 : MINOR ERROR
Diagnostic level at card bootup: complete
Test results: (. = Pass, F = Fail, U = Untested)
 1) TestTransceiverIntegrity:
    Port  U U . U U U U U U U U U U U U U U U
 2) TestLoopback: F
 3) TestScratchRegister -----------> .
 4) TestSynchedFabChannel ---------> .
<snip>
Another diagnostics output, this time showing a diagnostics result on a module. We can see this module experienced a minor error, and the culprit seems to be port 13 on the linecard, since it is marked with an 'F' for failed.

24 GOLD Operation Example
GOLD generic syslog messages start with the string "DIAG"; "CONST_DIAG" messages are platform specific.
Bootup test failure:
%CONST_DIAG-SP-3-BOOTUP_TEST_FAIL: Module 2: TestL3VlanMet failed
Health monitoring test failure:
%CONST_DIAG-SP-3-HM_TEST_FAIL: Module 5 TestSPRPInbandPing consecutive failure count:10
%CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=3% RP=12% Traffic=0%
%CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
On-demand diagnostics test failure:
%DIAG-SP-3-TEST_FAIL: Module 5: TestTrafficStress{ID=24} has failed. Error code = 0x1
Scheduled diagnostics test failure:
%DIAG-SP-3-TEST_FAIL: Module 3: TestLoopback{ID=1} has failed. Error code = 0x1
Generic minor and major failure:
%DIAG-SP-3-MINOR: Module 3: Online Diagnostics detected a Minor Error. Please use 'show diagnostic result <target>' to see test results.
%DIAG-SP-3-MAJOR: Module 6: Online Diagnostics detected a Major Error. Please use 'show diagnostic Module 6' to see test results.
Now, generic syslog messages generated by GOLD will start with the string DIAG or CONST_DIAG. "Const" is "Constellation", which was an internal code name; we try to get away from leaking code names, but we still put it in our syslog messages, so go figure. You can see here, for example, a bootup test failure and a health-monitoring test failure, both prefixed with CONST_DIAG; on-demand failures use just DIAG, though it depends on the test as well. This is where EEM can come into play: if a diagnostic fails, I can have my EEM script monitoring the GOLD event detector, or I can just monitor the syslog event detector and look for anything that has CONST, DIAG, and FAIL in it, and at that point take some action, alert somebody, etcetera.
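As the speaker suggests, a hedged sketch of an applet that watches the syslog event detector for GOLD failures (the regex, filename, and actions are illustrative assumptions) could be:

```
event manager applet GOLD-FAIL-NOTIFY
 event syslog pattern "CONST_DIAG.*TEST_FAIL|DIAG-SP-3-TEST_FAIL"
 action 1.0 cli command "enable"
 action 2.0 cli command "show diagnostic result all | append bootflash:diagfail.txt"
 action 3.0 syslog priority critical msg "EEM: GOLD test failure captured; see bootflash:diagfail.txt"
```

The GOLD event detector (shown on a later slide) is the more direct route when you want to key off a specific test name rather than a syslog string.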

25 Reducing Downtime Thru Automation GOLD Integration With EEM and Call Home
Automates problem diagnosis and information gathering: EEM applets and scripts can initiate GOLD tests
Automates corrective actions and notifications: GOLD events can trigger EEM scripts; beginning in release 12.2(33)SXH, GOLD corrective actions are configured via EEM scripts
Automates result notification: GOLD events are monitored by the Call Home diagnostics profile group
Workflow: Configure User Policies -> Gather Information & Diagnose Known Issues -> Take Corrective Actions -> Dispatch & Repair

26 Embedded Event Manager Supports Event Detector for GOLD
Core1# show event manager policy register detail
Mandatory.go_unusedportlpbk.tcl
::cisco::eem::event_register_gold card all testing_type monitoring test_name TestUnusedPortLoopback action_notify TRUE consecutive_failure 10 platform_action 0 queue_priority last
#
# GOLD TestUnusedPortLoopback Test TCL script
# April 2006, Sifang Li
# Copyright (c) by cisco Systems, Inc.
# All rights reserved.
# Register for TestUnusedPortLoopback test event
# The elements for registering the event:
#   card [all | card #]
#   sub_card [all | sub_card #]
#   severity_major | severity_minor | severity_normal   default: severity_normal
#   new_failure [true | false]                          default: dont_care
#   testing_type [ondemand | schedule | monitoring]
#   test_name [ test name ]
#   test_id [ test # ]
#   consecutive_failure [ consecutive_failure # ]
#   platform_action [action_flag]
#   action_flag [ 0 | 1 | 2 ]
#   queue_priority [ normal | low | high | last ]       default: normal
#....
EEM can be used to track and perform corrective actions for GOLD. Beginning in release 12.2(33)SXH, all GOLD corrective actions are scripted using EEM.

27 Call Home Service Monitors GOLD Status
Automates the notification process. Allows customization via profiles: severity levels, who gets notified, which transport method. Initially supported in IOS 12.2(33)SXH.
call-home
 alert-group configuration
 alert-group diagnostic
 alert-group environment
 alert-group inventory
 alert-group syslog
 profile "CiscoTAC-1"
  no active
  no destination transport-method http
  destination transport-method
  destination address
  destination address http
  subscribe-to-alert-group diagnostic severity minor
  subscribe-to-alert-group environment severity minor
  subscribe-to-alert-group syslog severity major pattern ".*"
  subscribe-to-alert-group configuration periodic monthly 8 16:34
  subscribe-to-alert-group inventory periodic monthly 8 16:19

28 Generic Online Diagnostics Recommendations
Bootup diagnostics: set level to complete
On-demand diagnostics: use as a pre-deployment tool (run complete diagnostics before putting hardware into a production environment) and as a troubleshooting tool when suspecting hardware failure
Scheduled diagnostics: schedule key diagnostic tests periodically; schedule all non-disruptive tests periodically
Health-monitoring diagnostics: key tests run by default; enable additional non-disruptive tests for specific functionalities enabled in your network (IPv6, MPLS, NAT)
I've spent quite some time on diagnostics because this is really a robust tool to detect failures and take auto-corrective actions during bootup time and runtime. This is a summary of the main points I've talked about in terms of recommendations for using GOLD in your campus. Set the bootup level to complete. Use on-demand diagnostics for troubleshooting or as a pre-deployment tool. The key health-monitoring tests already run in the background by default, so you should not need to change the defaults. If you have more advanced features running in your network such as NAT, MPLS, or IPv6, you may want to look into enabling tests for these functionalities in the background or scheduling them at specific times. And if you want more information on diagnostics, please attend the NMS breakout session referenced here.
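As a hedged sketch of these recommendations in CLI form (the module numbers, test numbers, and window time are illustrative assumptions; map them to the test IDs from show diagnostic content on your own hardware):

```
! Recommendation 1: complete bootup diagnostics
diagnostic bootup level complete
! Recommendation 4: enable an additional non-disruptive health-monitoring
! test for a feature in use (e.g. an IPv6 FIB test on the supervisor)
diagnostic monitor module 5 test 14
! Recommendation 3: run a non-disruptive test periodically in a quiet window
diagnostic schedule module 4 test 2 daily 02:00
```

Disruptive tests stay out of this baseline; they belong in on-demand pre-deployment runs or explicit outage windows.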

29 Generic Online Diagnostics Summary
Provides a common framework to configure, view, and schedule diagnostics across Cisco IOS based switches and routers. GOLD functional tests verify both the data path and control path of the device and can be run during bootup and during runtime. When combined with other features such as Embedded Event Manager and Call Home, the MTTR (mean time to repair) can be dramatically lowered via process automation.

30 Smart Call Home

31 Catalyst 6500 Management Simplified Operation - Smart Call Home
GOLD runs diagnostics, isolates the fault and its precise location
Detects GOLD events and sends them to Call Home
Sends a message to Cisco TAC with precise information and diagnostics
Cisco TAC investigates the problem and suggests remediation, including shipping replacement parts if necessary
Customer implements remediation and replaces the faulty part (if applicable)

32 Interactive Technical Services Unique Catalyst 6500 Differentiator
What Is Smart Call Home?
(Diagram: customer devices send Call Home messages over the Internet to Cisco; Cisco performs exception/fault analysis, produces device and message reports, notifies the customer, and opens cases in the Service Request tracking system)
Smart Call Home represents a new value proposition for Cisco customers in having their devices connected to Cisco, thereby opening the opportunity for interactive technical support. The interaction between Cisco and the customer is shown on this slide. Smart Call Home starts at the device with features in the Catalyst 6500 due to be available in the 12.2(33)SXH release in mid-CY07. These features are: Call Home and GOLD diagnostics. Internal note: the initial focus for support of Call Home and GOLD diagnostics is Cisco's data center products. We are currently working on providing support for key products in this space, including the MDS 9000; other products will follow. Call Home provides the capability for a customer to configure Call Home profiles that define: destination, transport, and events of interest. For example, a customer might configure a profile to allow an individual to be paged at home via short text when a major diagnostic failure occurs; or all syslog events might be sent via HTTPS to a network management station. And, indeed, the case in which we are interested is certain events raising Call Home messages via HTTPS (or ) to Cisco TAC. This case is covered in the Call Home feature by including a default Call Home profile for Cisco TAC. The events of interest are these. <Next build – includes box with messages received> Diagnostics. Environmental. High-severity syslog. Inventory and configuration. Note: any of these message types can be removed by customers. In addition, if customers choose to send configuration, then we will remove sensitive details such as passwords.
The Diagnostics in the Catalyst 6500, and now being built into a wide range of Cisco products, provide an online health test that essentially allows the device to ping its own components. On failure, a call home message will be sent. These diagnostics are referred to as Generic On-Line Diagnostics (GOLD). <Next build>. On receipt of a Call home message at Cisco the first step is entitlement processing. Customers need to have a standard Cisco SMARTnet support contract to be entitled to the Smart Call Home service. Internal Note: The Entitlement step is not shown in this slide. A backup slide shows more detail on entitlement if required. Next step is passing the message into the rules processor that will inspect the message and determine what next steps to take. <Next build>. If the situation is serious enough (module failure or fan failure for example) a service request will be raised direct with the Cisco TAC and routed to the correct team to handle the problem. Internal Note: We take special care not to raise service requests when they are not necessary. For example, GOLD diagnostics knows the difference between modules failing and being removed. <Next build>. If a service request is not raised then the message is stored along with the associated analysis of the problem for a customer or TAC engineer to use as part of their troubleshooting. <Next build>. Smart Call Home then has the option of proactively notifying the customer of problems which are likely to be emerging issues rather than issues the TAC can deal with (for example high temperature alarms independent of any fan failures or accumulating single bit memory errors). If Smart Call Home does not notify the customer then the customer or TAC engineer will be able to access all messages along with Cisco’s analysis of it on the Smart Call Home web application. 
Also available on the Smart Call Home web application are reports on the device hardware, software, and configuration, cross-referenced against any field notices, security alerts, and end-of-life notifications of which we are aware specific to the hardware and software on the device. Internal note: these cross-references will be provided shortly after FCS. We are also working on the ability to provide proactive best practices based on the configuration of the device.
(Slide graphics: 1 - Messages received: diagnostics, environmental, syslog, inventory and configuration; 2 - Call Home with automated diagnosis capability and Call Home DB; secure transport ensures data protection via HTTPS encryption and certificate-based authentication; IOS 12.2(33)SXH)

33 Alerts with Pinpoint Accuracy
GOLD Diagnostics: health monitoring; scheduled, on-demand, and boot-up diagnostics; isolates faults and their precise location
Embedded Event Manager: detects GOLD events; sends event and diagnostic information to Call Home
Call Home: sends messages with precise diagnostics to Cisco
IOS feature currently available on Catalyst 6500 and MDS 9000; more to come

34 Proactive Problem Identification Device Diagnostics and Rules Codification is Key
 37) TestErrorCounterMonitor --------> F
       Error code ------------------> 1 (DIAG_FAILURE)
       Total run count -------------> 2484
       Last test execution time ----> Feb :55:52
       First test failure time -----> Jan :55:17
       Last test failure time ------> Feb :55:52
       Last test pass time ---------> Jan :54:45
       Total failure count ---------> 2474
       Consecutive failure count --->
       Error Records as following:
         ID -- Asic Identification
         IN -- Asic Instance
         PO -- Asic Port Number
         RE -- Register Identification
         RM -- Register Identification More
         EG -- Error Group
         DV -- Delta Value
         CF -- Consecutive Failure
         TF -- Total Failure
         ID  IN   PO   RE   RM   DV  EG  CF    TF
         49   0  255  240  255    8   2  2483  2483
Smart Call Home automated diagnosis capability: ASIC #49, register #240 failed 2483 consecutive times, indicating single-bit ECC errors detected and recovered.
Call Home message to customer: indicated a developing unrecoverable failure. This is usually a problem related to improper grounding or excessive radiation emitted into the device. Make sure that the device is properly grounded and that neighboring devices are not emitting excessive radiation levels.

35 The Smart Call Home Difference
Before (minor hardware failure, undetected; the customer's Ops team discovers an IP multicast configuration problem):
P3 Service Request opened; Cisco routing protocols team checks IP multicast configuration - 45 min
Problem narrowed to specific Cat 6500 ports; re-queued to LAN Switching team - 3.75 hrs
Various known issues and bugs on WS-X6548-GE-TX investigated; nothing found; logs requested from customer - 12 hrs
Logs received and analyzed; online diagnostics failure identified for test TestL3VlanMet; RMA created - 25 hrs
Replacement part received (4-hour replacement coverage) - 29 hrs
After (minor hardware failure detected and Service Request automatically generated):
P3 SR opened due to GOLD failure, diagnostic info attached - 12 min
Cisco LAN Switching team takes ownership - 12 min
Customer informed of the problem; hardware fault confirmed - 42 min
RMA created and part dispatched - 1.2 hrs
Replacement part received (4-hour replacement coverage) - 5.5 hrs
Here is an example taken from a real Service Request opened in the TAC. It shows the kinds of benefits that can be expected with Smart Call Home. <Red Box 1>. The customer was attempting to configure a multicast application across his network, including Catalyst 6500s, and was having trouble doing so. He therefore opened a Service Request with TAC. Being related to IP multicast, it was routed to the TAC team that handles IP routing protocols. <Red Box 2>. The TAC engineer first requested the customer's configuration and was not able to find any problems. In further telephone conversations with the customer, the TAC engineer was able to narrow the problem to specific ports on the switch. This meant that the problem was likely related to the switch itself rather than an IP multicast problem. Therefore, he re-queued the problem to his colleague in the LAN Switching team. <Red Box 3>.
This engineer began by looking at known issues on the particular module in question and found nothing. He requested a log from the customer to determine whether any other issues occurring on the device might give a clue to the problem. Indeed, the log identified that a hardware failure had occurred and had been detected by the GOLD health-monitoring diagnostics running on the device. He issued a replacement part and the problem was resolved. With Smart Call Home, the process would be altered somewhat. <Green Box 1>. The first difference is that the Service Request would be created at the time of the hardware failure; the customer may not even be aware there is a problem on his network. The Service Request would be routed to the LAN Switching team directly and would include show diagnostic results and other output from the device important for troubleshooting. In addition to this output, the TAC engineer would receive information about the test that failed and proposed actions to troubleshoot the problem. <Green Box 2>. The TAC engineer would be able to quickly identify the likely hardware failure and call the customer to discuss troubleshooting steps. This call might be the first the customer knows about this problem in his network. <Green Box 3>. A part would be dispatched and the problem resolved. Notice the significant reduction in time to resolve this problem. Also notice that the circumstances around the problem change: the customer did not spend any time troubleshooting his IP multicast problem and would not waste this time, potentially days, when he did come to configure the multicast application.

36 Convenient, Personalized Web Reports
Call Home messages, diagnostics, and recommendations
Inventory and configuration for all Call Home devices
Security alerts, field notices, and End-of-Life notices
Web reports are available to both customers and TAC engineers through the Smart Call Home web application on the Cisco support site. These reports provide:
Cisco's analysis of any message received from customer devices, irrespective of whether a Service Request was raised or a notification sent. The analysis is performed over a 5-minute interval and correlates messages within that interval. An overall problem summary and recommendation is provided, as well as individual analysis for each message received.
Device inventory, including software, modules, processors, configuration, and features. This information is cross-referenced against other information Cisco knows about the specific device, including field notices, security advisories, and End-of-Life notices. (Internal note: these cross-references will be provided shortly after FCS.)
We are also working on the ability to provide proactive best practices based on the configuration of the device. It is planned to also provide configuration best practices and tighter functional integration with other TAC web tools such as online Service Request creation, online bug reports, software download, and so on.

37 Increased Value Proposition for Cisco Customers
Higher Network Availability—proactive, fast issue resolution:
Devices continually monitored with a secure, connected service
Real-time alerts for early detection of potential network problems
Automatic, accurate fault diagnosis
Increased Operational Efficiency—less time troubleshooting:
Automated Service Request (SR) creation
Detailed diagnostics attached to the SR
Routed to the correct TAC team
Fast Access to Information—fast, web-based access:
Call Home messages, diagnostics, and recommendations
Inventory and configuration for all Call Home devices
Security alerts, field notices, and End-of-Life notices
In summary, Smart Call Home offers proactive diagnostics and real-time alerts on select Cisco devices for higher network availability and increased operational efficiency. Smart Call Home is a new, secure connected service of SMARTnet. Themes: Higher network availability through proactive, fast issue resolution—identify issues quickly with continuous monitoring, real-time proactive alerts, and detailed diagnostics; anticipate some failures before they occur and notify the TAC or the customer to take preventive action; resolve critical problems faster with direct, automated access to experts at the Cisco TAC. Increased operational efficiency—use staff resources more efficiently by reducing troubleshooting time; Service Requests to the Cisco TAC are generated automatically, routed to the appropriate support team, and include detailed diagnostic information to speed problem resolution. Fast, web-based access to the information you need—review all Call Home messages, diagnostics, and recommendations in one place; check Service Request status quickly; view the most up-to-date inventory and configuration information for all Call Home devices; receive field, PSIRT, and End-of-Life notices proactively (shortly after FCS).

38 Gateway Load Balancing Protocol (GLBP)

39 First Hop Routing Protocols
Hot Standby Router Protocol (HSRP): Cisco informational RFC 2281 (March 1998); patented, US Patent 5,473,599, December 5, 1995
Virtual Router Redundancy Protocol (VRRP): IETF standard RFC 2338 (April 1998), since obsoleted (by RFC 3768)
Gateway Load Balancing Protocol (GLBP): Cisco innovation, load sharing, patent pending
IP routing redundancy is designed to allow transparent fail-over at the first-hop IP router. Cisco Systems is committed to standards and has implemented VRRP in addition to HSRP; it is currently available in some releases, with more being rolled out. Except for compatibility reasons, support of VRRP is not as critical for Cisco as it is for other vendors, because Cisco already offers full first-hop router resilience with HSRP. Cisco recommends using HSRP because of its proven and superior convergence characteristics, except when local-subnet interoperability is required with another vendor's VRRP implementation. Cisco will continue to enhance HSRP based on customer feedback and market direction. HSRP is a time-proven technology that has been deployed in thousands of service-provider and enterprise networks, and it is a key enabling feature of Cisco IOS Software in the area of high availability. We'll discuss GLBP in much more detail in the coming slides. It is a Cisco-developed protocol based on HSRP that provides automatic load sharing. Cisco has a patent pending for the GLBP protocol.

40 Previous Multi-VLAN Load Balancing Methods
Layer-2 Mode Load Balancing: VLANs A and B are carried on VLAN trunks from the access switch; one uplink forwards VLAN A and blocks VLAN B, while the other forwards VLAN B and blocks VLAN A.
Layer-3 Mode Load Balancing: both VLAN trunks forward; HSRP group 1 is active on one distribution switch (standby on the other) and HSRP group 2 is active on the other (standby on the first).
Let's summarize the two load-balancing (load-sharing) methods at the access layer. Layer-2 load balancing: the access switch supports two VLANs and trunks to the distribution layer, with a Layer-2 trunk between the distribution switches. Spanning tree provides a loop-free network, with the root at the active Layer-3 switch. Layer-3 load balancing: the access switch supports two VLANs and trunks to the distribution layer, with a Layer-3 link between the distribution switches. Both trunks are forwarding. There are two HSRP groups, one active on each switch; half of the devices use one virtual IP as the default gateway, and half use the other. Layer-3 load balancing is preferred because it reduces dependency on spanning tree, but it does require more planning and definition. Or does it? What if we had a way…

41 Gateway Load Balancing Protocol
Cisco innovation (patent pending)
Gateway Load Balancing Protocol (GLBP) goes beyond both HSRP and VRRP
Previously, backup Layer-3 devices in the HSRP or VRRP group remained inactive, leaving underutilized capacity
With GLBP, ALL Layer-3 devices in the GLBP group actively participate in packet forwarding:
Without allocating additional subnets
Without configuring multiple groups per subnet
Without pre-directing end stations to specific gateways (virtual IP addresses)
The intelligence is in the network—no extra administrative burden
Better return on investment: fully utilize resources, reduce potential for packet loss

42 Suppose a network with dual routers and links, with HSRP
GLBP Business Benefit
Suppose a network with dual routers and links, running HSRP: one gateway is active and one is standby, so only half the links toward the WAN or MAN are really in use—the rest sit idle. GLBP allows use of all available paths.
The economics: 6 x T1 = 9.264 Mbps of bandwidth, and each T1 costs $1,000. With all links in use, $6,000 / 9.264 Mbps ≈ $648/Mb. Under active/standby, only 4.632 Mbps is in use, so the same $6,000 works out to roughly $1,295/Mb. GLBP cuts usable bandwidth cost roughly in half: $648 vs. $1,295.
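The per-megabit figures above can be reproduced with a quick calculation (assuming the standard T1 rate of 1.544 Mbps):

```python
# Cost per usable Mbps for six T1 links at $1,000 per T1.
T1_MBPS = 1.544
LINKS = 6
total_cost = LINKS * 1000          # $6,000

glbp_bw = LINKS * T1_MBPS          # 9.264 Mbps -- all links forwarding
hsrp_bw = (LINKS // 2) * T1_MBPS   # 4.632 Mbps -- standby links idle

cost_per_mb_glbp = total_cost / glbp_bw   # ~ $648 per Mb
cost_per_mb_hsrp = total_cost / hsrp_bw   # ~ $1,295 per Mb
print(round(cost_per_mb_glbp), round(cost_per_mb_hsrp))  # 648 1295
```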

43 The Enterprise Premise Edge: Greater Efficiency at Same Cost
With Active/Standby: a single buffer pool and a single set of queues on the active gateway, with a higher risk of packet loss as the packet rate approaches the buffer threshold.
With GLBP: the load is shared, so more resources are available. Load balancing improves throughput and reduces the potential for packet loss.
GLBP improvements over HSRP/VRRP: simplified provisioning, improved redundancy model, superior throughput.

44 R1—AVG; R1, R2, R3 All Forward Traffic
How GLBP Works: R1 is AVG; R1, R2, and R3 all forward traffic. R1 runs as GLBP AVG/AVF,SVF; R2 and R3 each run as GLBP AVF,SVF. All three gateway routers share one virtual IP address, and each owns its own virtual MAC address. The AVG answers each client's ARP request for the virtual IP with a different virtual MAC, so clients CL1, CL2, and CL3 each resolve the same gateway IP address to a different forwarder.
A redundancy group consists of one virtual IP address and multiple virtual MAC addresses. Three main functions: the Active Virtual Gateway responds to all ARP requests with a designated virtual MAC address chosen according to the load-balancing algorithm; each member of the group monitors the state of the other member gateways; and in the event of a failure, a secondary virtual forwarder takes over traffic destined to the virtual MAC impacted by the failure. The default load-balancing algorithm the AVG uses to assign virtual MACs to clients is round-robin; the others are host-dependent and weighted. Benefits: simplified configuration, less administration, increased throughput in non-failure conditions.

45 R1—AVG; R1, R2, R3 All Forward Traffic
How GLBP Works: R1 is AVG; R1, R2, and R3 all forward traffic. As the builds on this slide complete, each gateway router (R1 as AVG/AVF,SVF; R2 and R3 as AVF,SVF) shares the one virtual IP address while owning its own virtual MAC, and each client CL1–CL3 has been directed to a different forwarder's virtual MAC for the same gateway IP.
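The round-robin behavior described above—the AVG answering successive ARP requests with the next virtual forwarder MAC in rotation—can be sketched in a few lines. The MAC values are illustrative, built from the GLBP 0007.b4 prefix for a hypothetical group 1:

```python
from itertools import cycle

class ActiveVirtualGateway:
    """Minimal sketch of the AVG's round-robin vMAC assignment."""

    def __init__(self, virtual_macs):
        self._rotation = cycle(virtual_macs)

    def arp_reply(self, client_mac):
        # Every ARP request for the one virtual IP is answered with the
        # next virtual forwarder MAC in the rotation, regardless of
        # which client asked.
        return next(self._rotation)

avg = ActiveVirtualGateway(["0007.b400.0101", "0007.b400.0102", "0007.b400.0103"])
answers = [avg.arp_reply(c) for c in ("CL1", "CL2", "CL3", "CL4")]
# CL1..CL3 each get a different forwarder; CL4 wraps back to the first.
```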

46 GLBP – Protocol Details
'Hello' messages are exchanged between group members: AVG election by priority; vMAC distribution; learning of VF instances
GLBP uses multicast destination 224.0.0.102, UDP port 3222, for packets sent to all GLBP group members
Virtual MAC addresses are of the form 0007.b4yy.yyyy, where yy.yyyy is the lower 24 bits; these bits consist of 6 zero bits, 10 bits that correspond to the GLBP group number, and 8 bits that correspond to the virtual forwarder number
Example: 0007.b400.0102—last 24 bits = 0x000102 = GLBP group 1, forwarder 2
The protocol allows for 1024 groups and 255 forwarders; the number of forwarders is capped at 4
Hardware restrictions limit the actual number of groups and forwarders
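The bit layout above can be expressed as a short helper: packing 6 zero bits, a 10-bit group number, and an 8-bit forwarder number into the lower 24 bits of the 0007.b4 prefix reproduces the group 1 / forwarder 2 example.

```python
def glbp_virtual_mac(group, forwarder):
    """Build a GLBP virtual MAC of the form 0007.b4yy.yyyy.

    The low 24 bits are 6 zero bits, then 10 bits of group number,
    then 8 bits of virtual forwarder number.
    """
    if not (0 <= group < 1024 and 0 <= forwarder < 256):
        raise ValueError("group is 10 bits, forwarder is 8 bits")
    low24 = (group << 8) | forwarder  # top 6 of the 24 bits stay zero
    return "0007.b4{:02x}.{:04x}".format(low24 >> 16, low24 & 0xFFFF)

print(glbp_virtual_mac(1, 2))  # 0007.b400.0102 -> group 1, forwarder 2
```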

47 GLBP Configuration Rules
Load balancing operates on a per-host basis: all connections for a given host will use the same gateway
Maximum of 4 MAC addresses per GLBP group
Load-balancing algorithm, 3 types:
Round-robin—each virtual forwarder MAC takes turns
Weighted—directed load determined by an advertised weighting factor
Host-dependent—ensures that each host is always given the same vMAC
If no load-balancing algorithm is specified, the default is round-robin
MD5 authentication security (Releases 12.3(2)T and 12.2(18)S)
Here are some configuration rules to keep in mind. As we've seen, since load balancing is based on the MAC addresses provided to end stations in ARP responses, it operates on a per-host basis. All IP connections from a given device will take the same path for outbound traffic; inbound traffic may take either path depending on routing. A maximum of 4 virtual MAC addresses, i.e. 4 gateways, is supported. The number of groups that can be configured is platform-dependent, based on the number of MAC addresses that can be reserved. There are three load-balancing algorithms—round-robin, weighted, and host-dependent—with round-robin being the default. Round-robin: each virtual forwarder MAC address takes turns being included in address-resolution replies for the virtual IP address; round-robin load balancing is recommended where there are a small number of end hosts. Weighted: the amount of load directed to an Active Virtual Forwarder depends on the weighting value advertised by the gateway containing that Active Virtual Forwarder. Host-dependent: the MAC address of a host is used to determine which virtual forwarder MAC address that host is directed toward; this guarantees a host will use the same virtual MAC address as long as that virtual MAC address is participating in the GLBP group. Host-dependent load balancing is required for IP redundancy.
IP redundancy is used for stateful fail-over, and this requires each host to be returned the same virtual MAC address each time it ARPs for the virtual IP address.
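Host-dependent selection can be sketched as a stable hash from client MAC to forwarder. This is purely illustrative—the slide does not document the exact mapping IOS uses, so the hash below is an assumption standing in for it:

```python
import hashlib

def host_dependent_vmac(client_mac, virtual_macs):
    """Pick a virtual forwarder MAC deterministically from the client's
    MAC address, so the same host always gets the same gateway.
    (Illustrative hash only; not the actual IOS algorithm.)"""
    digest = hashlib.sha256(client_mac.lower().encode()).digest()
    return virtual_macs[digest[0] % len(virtual_macs)]

vmacs = ["0007.b400.0101", "0007.b400.0102", "0007.b400.0103", "0007.b400.0104"]
first = host_dependent_vmac("aaaa.aaaa.aa01", vmacs)
again = host_dependent_vmac("aaaa.aaaa.aa01", vmacs)
# The same client MAC always maps to the same virtual forwarder,
# which is what stateful fail-over (IP redundancy) requires.
```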

48 GLBP Configuration Example
!
interface FastEthernet2/0
 ip address
 duplex full
 glbp 1 ip
 glbp 1 priority 105
 glbp 1 authentication text magicword
 glbp 1 weighting 100 lower 95
 glbp 1 weighting track 10 decrement 10
 glbp 1 forwarder preempt delay minimum 0
Here we see a sample configuration. As you can see, GLBP is configured at the interface. This config defines the virtual IP address for group 1; note the address must be part of the subnet defined on the interface. Priority is optional and determines which router becomes the AVG: the highest-priority router will negotiate to become the AVG for the group (the default is 100). A simple text authentication method is provided for security; each configured member gateway must have a matching text string, in this case "magicword". This example also defines an object to be tracked. The forwarder preempt delay is modified to have no delay (minimum 0); the default is 30 seconds. Each member gateway in the group will have a similar configuration.

49 GLBP Implementation Issues
No SNMP support yet
Four entries per GLBP group are used in the MAC-address filter of Ethernet interfaces configured with GLBP groups; no "use-bia" allowed
On the Cisco Catalyst 6500 Series MSFC2, the MAC filter limits the number of GLBP groups to 1—however, the group may be reused on multiple VLANs
Only use GLBP in Layer-2 switched environments; it is designed for the L2–L3 edge boundary
Be careful with other IP services, including NAT, IPsec, Mobile IP, and High Availability: services that use the internal IP redundancy API in Cisco IOS Software do not currently support GLBP
GLBP is not yet SSO-aware
A few more items: there is no SNMP support yet. As noted, there may be platform issues with respect to the hardware—some platforms may only allow a limited number of groups to be defined. Only a simple text-string security mechanism has been implemented. GLBP is meant to be used on devices attached to Layer-2 switches; downstream devices will not be aware of packets sent from multiple gateways with common IP addresses. Be careful if you try to use other IP services such as NAT, stateful NAT, IPsec, Mobile IP, and High Availability environments: compatibility testing is just beginning for use of these features in conjunction with GLBP. Watch for more information in the coming weeks.

50 Cisco Catalyst 6500 Series and Cisco 7600 Series GLBP Specifics
GLBP "reserves" 4 MAC filter entries; the number of forwarders in the group is limited to 4*
The Active Virtual Gateway allocates these to GLBP group members (virtual forwarders)
There is a restriction on GLBP group number for the MSFC2/PFC2: only a single group may be defined, though that single group may be reused on all VLANs
Sup720 supports both plain-text and MD5 authentication; Sup2 supports plain text only
HSRP and GLBP can coexist on Sup720 but not on Sup2
GLBP availability:
12.2(17d)SXA and later — Cisco Catalyst 6500 Sup720/MSFC3 — 1024 groups / 4 forwarders
12.2(17d)SXB and later — Cisco Catalyst 6500 Sup2/MSFC2, Cisco 7600 Sup2/MSFC2 — 1 group / 4 forwarders
* Note: the 1024-group limit is an arbitrary cap—the protocol design actually allows for 4096. The same is true of the forwarder limit of 4: the design could allow up to 16. Customers have not requested the additional capacity.


52 Generic Online Diagnostics (GOLD) Test Suite
Bootup Diagnostics:
Forwarding Engine Learning Tests (Sup and DFC)
L2 Tests (Channel, BPDU, Capture)
L3 Tests (IPv4, IPv6, MPLS)
SPAN and Multicast Tests
CAM Lookup Tests (FIB, NetFlow, QoS CAM)
Port Loopback Test (all cards)
Fabric Snake Tests
Health-Monitoring Diagnostics:
SP-RP Inband Ping Test (Sup's SP/RP, EARL (L2 and L3), RW engine)
Fabric Channel Health Test (fabric-enabled line cards)
MacNotification Test (DFC line cards)
Non-Disruptive Loopback Test
Scratch Registers Test (PLDs and ASICs)
On-Demand Diagnostics:
Exhaustive Memory Test
Exhaustive TCAM Search Test
Stress Testing
All bootup and health-monitoring tests can be run on demand
Scheduled Diagnostics:
All bootup and health-monitoring tests can be scheduled
Scheduled switchover
Functional testing is combined with component monitoring to detect faults in passive components (connectors, solder joints, etc.) and active components (ASICs, PLDs, etc.).

