Cisco and Nonstop Forwarding

Nonstop Forwarding is one of many features in the Cisco High Availability portfolio; we shall be covering most of them later.

To explore the Cisco High Availability portfolio:

http://www.cisco.com/en/US/products/ps6550/products_ios_technology_home.html

Use Cisco Feature Navigator to find information about platform support and software image support:

http://tools.cisco.com/ITDIT/CFN/jsp/index.jsp

Cisco Nonstop Forwarding (NSF, also known as Graceful Restart) with Stateful Switchover (SSO, which is a prerequisite for NSF) is a Cisco innovation for platforms with dual route processors (Cisco 7304, 7500, ASR1000, 4500, 6500, 7600, 10000, 12000 and CRS). It allows an NSF-capable router that has experienced a hardware or software failure of its active route processor to maintain data link layer connections and continue forwarding packets during the switchover to the standby route processor.

Forwarding can continue despite the loss of routing protocol peering sessions with other routers. The newly active route processor (the former standby) initially has no active routing sessions with any peers (no neighbors, link-state database, BGP table and so on); however, it has identical FIB and adjacency information, synced from the former active route processor. Routing information is recovered dynamically in the background, while packet forwarding proceeds uninterrupted using that synced FIB and adjacency information.

A Cisco router equipped with dual route processors can maintain Layer 2 data link connections and up-to-date "next-hop" information (the FIB and adjacency tables), so it can continue forwarding packets after a route processor switchover until the routing protocols recover. In other words, each protocol depends on CEF to continue forwarding packets during the switchover while the routing protocols rebuild the Routing Information Base (RIB) tables. Once the routing protocols have converged, CEF updates the FIB table and removes stale route entries. CEF then updates the line cards with the new FIB information.
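The stale-entry handling described above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyFib`, `FibEntry` and their methods are hypothetical names invented for illustration, the prefixes and next hops are made up, and nothing here reflects how CEF is actually implemented. It only shows the idea that synced entries keep forwarding traffic after the switchover, are refreshed as routes are relearned, and are purged if they are still stale once the protocols have converged.

```python
from dataclasses import dataclass, field


@dataclass
class FibEntry:
    next_hop: str
    stale: bool = False


@dataclass
class ToyFib:
    """Toy forwarding table keyed by prefix (hypothetical, not real CEF)."""
    entries: dict = field(default_factory=dict)

    def switchover(self):
        # The new active RP keeps every synced entry but marks it stale.
        for entry in self.entries.values():
            entry.stale = True

    def lookup(self, prefix):
        # Forwarding keeps using stale entries until they are refreshed or purged.
        entry = self.entries.get(prefix)
        return entry.next_hop if entry else None

    def install(self, prefix, next_hop):
        # A route relearned from a routing protocol refreshes (or adds) the entry.
        self.entries[prefix] = FibEntry(next_hop)

    def converged(self):
        # After convergence, entries that are still stale were never relearned.
        self.entries = {p: e for p, e in self.entries.items() if not e.stale}


fib = ToyFib()
fib.install("10.1.0.0/16", "192.0.2.1")
fib.install("10.2.0.0/16", "192.0.2.2")

fib.switchover()
during = fib.lookup("10.2.0.0/16")        # still forwarded from the synced table

fib.install("10.1.0.0/16", "192.0.2.9")   # relearned, with a new next hop
fib.converged()                           # 10.2.0.0/16 was never relearned
after = sorted(fib.entries)
```

In this run the second prefix is still forwarded during the switchover, and it disappears only after convergence because no routing protocol reinstalled it.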

Bear in mind that the peers of the NSF-capable router (which must be NSF-aware for NSF to operate) should not detect the failure before the newly active RP (the standby RP before the switchover) can reinitiate the routing session from its side and inform them that a switchover has occurred, by sending the first hello, or the first Open message in the case of BGP. That is, the hold time or dead interval must be longer than the time the standby RP needs, once it becomes active, to send that first hello or Open message.
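The timing condition above amounts to a simple inequality, sketched here as a small check. The function name and the switchover and first-hello durations are hypothetical values chosen for illustration, not measurements from any platform; the 180-second figure is the Cisco default BGP hold time, and the 3-second figure stands in for an aggressively tuned timer.

```python
def peer_survives_switchover(hold_time_s, switchover_s, first_hello_s):
    """Return True if the peer's hold time (or dead interval) outlasts the
    switchover plus the time the new active RP needs to send its first
    hello or Open message."""
    return hold_time_s > switchover_s + first_hello_s


# Hypothetical durations: 4 s to switch over, 2 s more until the first hello/Open.
switchover_s = 4.0
first_hello_s = 2.0

# Default BGP hold time of 180 s: the peer never notices the failure.
default_timer_ok = peer_survives_switchover(180.0, switchover_s, first_hello_s)

# Aggressive 3 s timer (assumed): the peer declares the neighbor down first.
aggressive_timer_ok = peer_survives_switchover(3.0, switchover_s, first_hello_s)
```

With these assumed numbers the default timer passes the check and the aggressive timer fails it, which is why short failure-detection timers and NSF have to be planned together.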

This last point conflicts with fast-convergence logic from an operational perspective, since fast convergence is all about reducing the time taken to detect a failure; using both features simultaneously therefore needs to be carefully analyzed and configured. In other words, NSF improves network performance in a totally different way from traditional convergence: it simply suppresses the reporting of the neighbor failure, avoiding the need for reconvergence in the first place, and keeps packets flowing until the neighbor has reset its control plane (it should now be clear that we are talking about platforms with redundant control plane processors and the ability to switch over between them).

Using NSF you can reduce network convergence to virtually zero for control plane failures. This requires the routers participating in the operation to be either NSF-capable or NSF-aware, and it carries another critical implication: the failure must be recovered within the protocol's hold time (or dead interval). NSF therefore has to coexist with an appropriate convergence setup to offer the best overall network performance.

A final thing to remember: the benefit of NSF/SSO is not just nonstop forwarding but also fewer route flaps, which is a significant factor in large-scale networks. Every route flap forces all routers in the network to reconverge, but with NSF/SSO no flap is reported, since the peers of the NSF-capable router do not report their neighbor as down to the rest of the network.

I hope that I’ve been informative.

BR,

Mohammed Mahmoud.

3 comments

  1. Excellent article. Just one thing in terms of terminology. It seems that vendors use the terms NSF and graceful restart interchangeably. NSF refers to the router's core function. The NSF "mechanism (feature)" is a result of separating the control and data planes. In that respect, it is not very special.

    Graceful restart is also called NSF, but it is a different thing. GR interacts with peer network devices. As you stated, devices must be "nsf-aware". A better term would be that the peers must be "gr-aware". That is, when a router goes down, its peers wait a period of time before trying to reconverge the network, as stated in your article.

    For more info on this, see: https://www.youtube.com/watch?v=CGbkc2l4ESE
