Thursday, August 18, 2016

How to properly enable NSF / Graceful-Restart in OSPF between IOS and NX-OS

Overview



OSPF routes are withdrawn on all neighbors (the NX-OS Nexus 7Ks) when a VSS supervisor switchover is initiated on the IOS 6807s by any of the following methods:
  1. Pulling out the active supervisor
  2. Running redundancy force-switchover
  3. Rebooting the chassis that holds the active supervisor

Hardware / Software

Cisco 6807

  • 2x Cisco 6807 
  • 2x Sup6T (in each chassis)
  • Firmware: 15.3(1)SY
  • VSS

Nexus 7K

  • 3x Cisco Nexus 7009
  • 2x SUP1 (in each chassis)
  • Firmware: 6.2(10)

Topology Overview

Two Cisco 6807s in VSS mode peer with three separate Nexus 7Ks. Each 7K connects to the VSS pair over a routed point-to-point port-channel whose member links come from both VSS chassis.

Resolution

NX-OS does not support Cisco-mode NSF (nsf cisco); it supports only IETF graceful restart. The IOS device must therefore be configured with nsf ietf. See the example below:

router ospf 1
 router-id 192.168.0.1
 nsf ietf
 redistribute static subnets route-map STATIC_TO_OSPF
 passive-interface default
 no passive-interface Port-channel9
 no passive-interface Port-channel10
 no passive-interface Port-channel11
 no passive-interface Port-channel12
 no passive-interface Vlan899
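When several VSS pairs peer with NX-OS, it can help to audit saved configurations for this setting. The following is a minimal Python sketch, assuming the input is plain show running-config text with IOS-style indentation; the function name nsf_mode is hypothetical and not part of any Cisco tool. It reports the nsf line found under each router ospf process, so anything other than nsf ietf stands out.

```python
import re
from typing import Dict, Optional


def nsf_mode(running_config: str) -> Dict[str, Optional[str]]:
    """Map each 'router ospf <pid>' process to its nsf line, or None if absent.

    Assumption: sub-commands of a stanza start with whitespace, and any
    non-indented line (such as '!') ends the stanza.
    """
    modes: Dict[str, Optional[str]] = {}
    current: Optional[str] = None
    for line in running_config.splitlines():
        header = re.match(r"router ospf (\d+)", line)
        if header:
            current = header.group(1)
            modes[current] = None
            continue
        if current is None:
            continue
        if line and not line[0].isspace():
            current = None  # left the router ospf stanza
            continue
        stripped = line.strip()
        if stripped in ("nsf", "nsf cisco", "nsf ietf"):
            modes[current] = stripped
    return modes


if __name__ == "__main__":
    cfg = """router ospf 1
 router-id 192.168.0.1
 nsf ietf
 passive-interface default
!
"""
    for pid, mode in nsf_mode(cfg).items():
        verdict = "OK for NX-OS peers" if mode == "nsf ietf" else "check NSF mode"
        print(f"ospf {pid}: {mode} -> {verdict}")
```

A process that returns None has no nsf line at all, which is just as likely to cause the route withdrawal described above.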


Thursday, March 24, 2016

High CPU and Latency to VSS 4500X Control Plane

While troubleshooting high latency on a Cisco 4500X, we determined that the cause was a 30 Mb/s stream of UDP syslog traffic destined for a host that had been shut down, so its ARP entry was no longer resolved:


4500x-switch#show ip arp vrf yellow-zone 192.168.112.2
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.112.2           0   Incomplete      ARPA
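On a switch with many VRFs or hosts, unresolved entries can be pulled out of captured show ip arp output. A minimal Python sketch, assuming the column layout shown above; the function name incomplete_arp_entries is hypothetical:

```python
from typing import List


def incomplete_arp_entries(show_ip_arp: str) -> List[str]:
    """Return the IP addresses whose hardware address is 'Incomplete'.

    Assumed layout: Protocol  Address  Age (min)  Hardware Addr  Type  Interface
    """
    addresses = []
    for line in show_ip_arp.splitlines():
        fields = line.split()
        # Data rows start with 'Internet'; the header row starts with 'Protocol'.
        if len(fields) >= 4 and fields[0] == "Internet" and fields[3] == "Incomplete":
            addresses.append(fields[1])
    return addresses
```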



CPU utilization on core 0 spiked:

4500x-switch#show processes cpu sorted
Core 0: CPU utilization for five seconds: 99%; one minute: 87%;  five minutes: 49%
Core 1: CPU utilization for five seconds: 1%; one minute: 13%;  five minutes: 51%
PID    Runtime(ms) Invoked  uSecs  5Sec     1Min     5Min     TTY   Process
8609   1518439     20262550 392    50.56    50.40    50.32    0     iosd
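The 4500X reports utilization per core, so an average across cores can hide a saturated core (here core 0 is at 99% while core 1 sits at 1%). A minimal Python sketch that flags busy cores in captured show processes cpu sorted output; the function name busy_cores and the 90% default threshold are assumptions:

```python
import re
from typing import Dict

# Matches lines such as "Core 0: CPU utilization for five seconds: 99%; ..."
CORE_RE = re.compile(r"Core (\d+): CPU utilization for five seconds: (\d+)%")


def busy_cores(show_proc_cpu: str, threshold: int = 90) -> Dict[int, int]:
    """Map core number to five-second utilization for cores at or above threshold."""
    busy = {}
    for match in CORE_RE.finditer(show_proc_cpu):
        core, five_sec = int(match.group(1)), int(match.group(2))
        if five_sec >= threshold:
            busy[core] = five_sec
    return busy
```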



Cisco describes the behavior when there is no ARP entry as follows:


When CEF cannot locate a valid adjacency for a destination prefix, it punts the packets to the CPU for ARP resolution and, in turn, for completion of the adjacency. In rare cases, the adjacency persists in an incomplete state. For example, if the ARP table already lists a particular host, then punting it to the process level does not trigger an ARP.



Packets punted because of an unresolved adjacency show up in the L3 Glean counters:

4500x-switch#show platform cpu packet statistics | inc Glean
L3 Glean                    2283852361     12913     14103     11978       8839
L3 Glean                     767303902      7732      8469      7606       6976


A high L3 Glean count indicates that packets are being punted to the CPU for ARP resolution. As of 2016/03/24, we are checking whether this is a bug. This behavior could also be exploited as a denial-of-service (DoS) vector.
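Raw counters are easier to interpret as a rate, so one approach is to take two snapshots of the Glean output a known interval apart and divide the difference. A minimal Python sketch, assuming the first numeric column on each L3 Glean row is a cumulative packet count (the column meanings are not shown in the output above, and the function names are hypothetical):

```python
from typing import List


def glean_totals(output: str) -> List[int]:
    """Return the first counter on each 'L3 Glean' row.

    Assumption: that first numeric column is a cumulative packet count.
    """
    totals = []
    for line in output.splitlines():
        fields = line.split()
        if fields[:2] == ["L3", "Glean"] and len(fields) > 2:
            totals.append(int(fields[2]))
    return totals


def glean_rates(before: str, after: str, interval_s: float) -> List[float]:
    """Packets per second for each L3 Glean row between two snapshots."""
    return [(b - a) / interval_s
            for a, b in zip(glean_totals(before), glean_totals(after))]
```

Counter wraparound is not handled, and what counts as "high" is a site-specific judgment.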
