Introduction

BGP was originally conceived by Yakov Rekhter and Kirk Lougheed in 1989 while sitting at a bar at the twelfth IETF meeting in Austin, Texas. Yakov, in later conversations, continued to be surprised at the longevity of the original protocol, currently in its fourth version. The original design was simple and to the point, including:

  • Peering would be over existing IP connections, as interdomain protocols are designed as an overlay
  • While loop-freeness must be a primary property, local provider policies must be clearly expressible and implementable
  • Policy must prefer the desires of the provider who is currently forwarding the packet over peers, and peers over more distant autonomous systems
  • Peering and route advertisement would take place within a trust-rich environment; peering organizations would have written contracts enforced using filters, and with the ultimate fallback of de-peering
  • The protocol should be easily extensible to support new use cases as needed

In some original BGP deployments (for a short period of time), externally learned routes were redistributed into an IGP, and then back into BGP at the other edge of the provider. Internal connectivity was available, but only at a basic level. Over time several capabilities have been added to BGP to support scaling and operational requirements.

First, richer internal connectivity options were added to support much larger operator networks, and then to move towards a more clearly divided underlay/overlay model. The underlay, or IGP, would be used to provide reachability to next hops (or infrastructure routes), while the overlay, BGP, would carry routing information for destinations outside the local autonomous system (or customer routes). This initially included confederations, which were later largely replaced by route reflectors (and, for internet exchanges, route servers).

Second, additions to the suite of internet protocols (primarily IPv6 and MPLS) resulted in BGP being modified to support address families.

Third, increasing requirements to steer traffic along specific paths to improve quality of experience (traffic engineering), along with the provisioning of virtual Ethernet overlays, led to a series of extensions supporting MPLS and SR.

More recently, many other capabilities have been added to BGP to support BGP-enabled services, which might colloquially be called "BESS use cases," including:

  • Large-scale layer 2 overlay support, such as EVPN
  • BGP based multicast
  • Pseudowires
  • Link state operation using BGP flooding

Beyond these, proposals designed to create modes of operation and features more suited to using BGP as an interior gateway protocol, especially for large-scale data center fabrics, are being considered or adopted. Some implementations have also used characteristics of BGP's operation, such as "magic AS numbers," to implement features on data center fabrics.

These additional features all support well-known, undoubtedly valuable use cases, and so add value to the protocol.

However, these extensions also add complexity to the protocol, and often run counter to the original intent of BGP: an exterior gateway protocol designed to carry policy-focused routing between independent network operators across the public Internet. The result is a protocol of increasingly byzantine complexity, full of non-orthogonal "knobs" that partially contradict each other, and hence a protocol that is more and more difficult to understand, more and more difficult to implement, and ultimately difficult to manage.

The sections below consider some of the problems with the current generation of BGP (some of which are related to the inclusion of deeper support for services, others of which are a result of BGP's basic design).

BGP Code Complexity

In the mid-1990s, one major implementation of BGP consisted of a grand total of around 40,000 lines of code, including initial support for address families and overlay virtual networks. In the last thirty years, BGP implementations have exploded in size, with current implementations easily surpassing 250,000 lines of code. While this larger code base supports a much wider set of features, it is almost impossible for any single developer to understand the entire scope, purpose, and use case of every feature supported in a code base of this scale.

A codebase of this size, combined with the many hundreds (if not thousands) of use cases to which BGP can be applied, and all the variations of those use cases, makes it almost impossible to fully test a complete BGP implementation for correctness. There are bound to be latent defects in such a code base that cannot be discovered before they have an operational impact, perhaps even causing operator- or internet-scale failures.

BGP Deployment Complexity

The documentation for any implementation of BGP can run to thousands of pages; operators must already know what they are trying to accomplish, and ask the right questions, before they can even begin to search the documentation for information on how to complete a given task. It is impossible, at this point, for an individual engineer to learn BGP in any meaningful way from documentation alone.

BGP deployment differs substantially across each broad class of use cases. Inter-provider communication requires a different configuration model than BGP as an IGP in a data center underlay, BGP supporting an overlay of virtual topologies, BGP providing cloud connectivity, BGP emulating an L2 substrate, and so on. Very few engineers have wide enough experience to encounter BGP in all of these situations; saying a particular person “knows BGP” is no longer a useful description of the operator’s knowledge.

Applying a default configuration intended for one class of use cases to another class can cause network outages.

BGP is Unstable

It has been widely documented that the Default-Free Zone (DFZ), or global Internet routing table, does not converge. See, for instance, Geoff Huston’s measurements in this post: https://www.potaroo.net/ispcol/2022-01/bgpupd2021.html.

There is a widespread impression that the constant state of convergence in the DFZ is directly attributable to the global number of routes. There is no reason, however, that a routing system with a large number of routes should not converge. Rather, the instability observed in the global routing table arises because the rate of change in the global routing table is faster than BGP can converge, combined with several BGP-specific policy interactions as well as non-uniform tie-breaking across different configurations and implementations.

BGP converges for reasons other than changes in reachability or path, adding to the frequency of convergence events in the DFZ. For instance, BGP sends updates when a policy changes, even if the policy change does not impact the path to the reachable destination. Further, there are situations where BGP can build a “wedgie,” as described in RFC4264, and can also persistently oscillate between multiple paths, as described in RFC7964.

(see also https://rule11.tech/bgp-persistent-oscillation/ for an explanation of persistent oscillations in BGP).

This is largely a legacy of BGP breaking the total ordering of paths through the introduction of the Multi-Exit Discriminator (MED). This trend continues with the increasing number of "optional" path tie-breakers introduced by RFCs and implementations.
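
As a minimal illustration (a deliberately simplified sketch with hypothetical values, not any implementation's actual best-path algorithm), the following Python fragment models a comparator in which MED is only comparable between paths learned from the same neighboring AS, with router ID as the only fallback tie-breaker. The resulting pairwise preference has no total order, so which path "wins" can depend on the order in which candidates are compared:

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    neighbor_as: int   # AS the path was learned from
    med: int           # Multi-Exit Discriminator
    router_id: int     # simplified fallback tie-breaker

def better(a: Path, b: Path) -> Path:
    """Simplified comparator: MED is only compared between paths learned
    from the same neighboring AS; otherwise fall back to lowest router ID."""
    if a.neighbor_as == b.neighbor_as:
        return a if a.med < b.med else b
    return a if a.router_id < b.router_id else b

# Three hypothetical candidate paths to the same prefix
A = Path("A", neighbor_as=100, med=10, router_id=3)
B = Path("B", neighbor_as=200, med=5,  router_id=2)
C = Path("C", neighbor_as=100, med=20, router_id=1)

print(better(A, C).name)  # A beats C (same neighbor AS, lower MED)
print(better(C, B).name)  # C beats B (different AS, lower router ID)
print(better(B, A).name)  # B beats A (different AS, lower router ID)
# A > C, C > B, and B > A: the preference relation is cyclic, so there is
# no total ordering of paths and the selected "best" path can depend on
# the order in which candidates are evaluated.
```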

While various conditions cause BGP to converge more often than would normally be required, BGP is also notoriously slow to converge. A general rule of thumb is to subtract the length of the current best AS path from the length of the next possible best path (in the case of a withdraw, the maximum AS path length in the network), and multiply this number by the Minimum Route Advertisement Interval (MRAI). However, this is not necessarily BGP's fault as a protocol; a diffusing computation inherently suffers from processing delays at every node. Also, to enhance stability at scale, it is often necessary to use "state compression" or hysteresis in advertisements.
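
To make the rule of thumb concrete, here is a small worked sketch in Python (the path lengths are hypothetical, and 30 seconds is the commonly used default eBGP MRAI suggested in RFC 4271):

```python
def estimated_convergence_seconds(best_path_len: int,
                                  next_best_path_len: int,
                                  mrai_seconds: int = 30) -> int:
    """Rule-of-thumb estimate: the difference in AS-path length between the
    current best path and the next candidate, multiplied by the Minimum
    Route Advertisement Interval (MRAI)."""
    return max(next_best_path_len - best_path_len, 0) * mrai_seconds

# Hypothetical example: the best path is 3 AS hops and the next candidate
# is 7 hops; with a 30-second MRAI the estimate is (7 - 3) * 30 = 120 seconds.
print(estimated_convergence_seconds(3, 7))
```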

Given these parameters, BGP normally requires tens of seconds to converge on route changes in the DFZ. These relatively long convergence times, combined with many different reasons for route changes, leads to a situation where constant churn is the normal mode of operation.

Constantly churning networks are harder to understand, secure, and troubleshoot. A constantly converging network is always triggering timers and processes designed to increase stability during rapid changes.

BGP is Insecure

As noted in the history of BGP above, the protocol was designed to operate in a trust-rich environment. Protocol designers and implementors assumed policy would primarily be used to support legal contracts made outside the BGP protocol itself. Because of this, BGP is difficult, if not impossible, to secure in meaningful ways.

One of the primary problems with securing BGP is that the protocol has no explicit way to express negative policy (such as “this destination is no longer reachable via this path,” or even “you should not use this path to reach this destination”). Because of this, virtually any mechanism designed to secure BGP by ensuring operators are following correct processes, or that BGP is operating according to the protocol specifications (such as BGPSEC), will necessarily have “holes” or open attack surfaces that cannot be closed.

Other problems with securing BGP include:

  • Any security system designed to operate on a per-update basis is likely to open many new attack surfaces in the form of denial of service attacks
  • Any security system that increases the time required to process an update will necessarily slow convergence, increasing the overall instability of the system
  • Any security system that increases development and deployment complexity will likely increase the number of failures induced by human error, and make the full scope of BGP deployment and implementation impossible to readily test