BGP performance tuning - Convergence, Stability, Scalability and NSF (Part 1)
It is critical for a network architect (and equally for a network operator, though from a different perspective) to understand the ins and outs of tuning routing protocol performance, in order to produce a sound and effective low-level design for small to large scale networks.
We have discussed IGP performance tuning in other posts. In this thread I'll try to give a simple description of all the tools and parameters that affect BGP performance (convergence, stability, scalability, and non-stop forwarding), along with the default and recommended values for each.
Before starting, I'd like to say thank you to Jeff Doyle, Ivan Pepelnjak and Russ White. Without these people and their books and blogs it would have taken us decades to gather our knowledge.
Convergence is all about having all the routers in the network agree on a consistent routing scheme and start forwarding traffic based on it, free of routing loops or black holes, right after any network change, whether that change is a link/node failure or a new link/node introduced into service. Two critical facts have to be borne in mind while doing so. First, we must try to avoid the network meltdown phenomenon (in simple words, the state of not being able to converge because network changes are being reported too quickly). Second, we must try not to overtax the routers' processors.
I believe it is fair to say that the default routing protocol parameters can achieve an acceptable convergence time for small, low-speed networks, but with today's multi-gigabit-per-second links and large scale networks this won't be quite enough. This takes us to the era of fast convergence, or in practice the era of sub-second convergence.
CPU utilization is no longer a significant factor on today’s high-end platforms, which separate a dedicated control plane (powerful processors used only for control-plane work) from the data/forwarding plane (most commonly ASIC-based). So let's not let old myths control our vision.
It is important to always remember that with BGP, fast convergence is not always the main target the way it is with IGPs. Stability is the most significant factor in almost any BGP implementation, especially for Internet routing, and frequent session resets are not desirable because they introduce network instability. However, in some implementations such as MPLS VPNs, BGP convergence has become an item to be strongly considered.
I'll start in this post by listing the available BGP tools and parameters that can be used to tune its performance, and in later posts I'll cover each of them in detail.
The available tools and parameters are:
BGP network timers (Keepalive and Holdtime) - For session maintenance (polling method).
Advertisement Interval - For controlling the minimum interval between BGP routing updates sent to a neighbor.
Initial delay for sending updates - For initially delaying sending BGP routing updates for the sake of enhanced convergence.
Background BGP scanner - Responsible for BGP housekeeping by scanning both the BGP RIB and the IP RIB and cleaning things out. The BGP next-hop address tracking feature changed this behavior, as we shall see later.
ConnectRetry timer - The time to wait before attempting to reconnect to the BGP neighbor after failing to connect.
Fast Fall-over - Fast detection of BGP session failure (event driven method).
Bidirectional Forwarding Detection (BFD) - Fast failure detection for faster reconvergence time (polling method - a form of fast hellos).
BGP Route Dampening - To minimize the effect of a flapping route on the BGP table and the whole network.
Graceful Restart (NSF) - To minimize the effect of control plane failures.
Route Reflectors and Confederations - To scale beyond the iBGP full mesh requirement.
Peer Session/Policy Templates - For scaling the configuration of BGP session parameters and inbound/outbound policies.
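Two of the items above come down to simple arithmetic, so here is a minimal Python sketch to make them concrete before the detailed posts. The first function follows the hold time rule from RFC 4271: the session uses the smaller of the two hold times exchanged in the OPEN messages, a hold time must be either 0 or at least 3 seconds, and one third of the hold time is the suggested keepalive interval. The other two functions count iBGP sessions for a full mesh versus a single route reflector. The function names are my own, and the single route reflector layout (one reflector, every other router a client, no redundancy) is an assumption made only for illustration.

```python
from __future__ import annotations


def negotiate_timers(local_hold: int, peer_hold: int) -> tuple[int, int]:
    """Return (hold_time, keepalive) in seconds for one BGP session.

    Hypothetical helper: models the RFC 4271 rule only, not any
    vendor-specific behavior.
    """
    for value in (local_hold, peer_hold):
        # RFC 4271: the hold time must be zero or at least three seconds.
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")

    # The session uses the smaller of the two advertised hold times.
    hold = min(local_hold, peer_hold)
    if hold == 0:
        # A hold time of zero means no keepalives are exchanged at all.
        return 0, 0

    # One third of the hold time is the suggested keepalive interval.
    return hold, max(1, hold // 3)


def ibgp_sessions_full_mesh(routers: int) -> int:
    """Number of iBGP sessions when every router peers with every other."""
    return routers * (routers - 1) // 2


def ibgp_sessions_single_rr(routers: int) -> int:
    """Number of iBGP sessions with one reflector and all others as clients.

    Assumption for illustration: a single, non-redundant route reflector.
    """
    return max(routers - 1, 0)


if __name__ == "__main__":
    print(negotiate_timers(180, 90))     # (90, 30)
    print(ibgp_sessions_full_mesh(100))  # 4950
    print(ibgp_sessions_single_rr(100))  # 99
```

With 100 routers, the full mesh needs 4950 sessions while the single route reflector design needs 99, which is the scaling argument behind route reflectors and confederations.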
I hope that I've been informative, and as agreed we are going to cover each tool/parameter in detail in later posts.