ON BGP MUTATIONS
================

The collapse of the Internet has already been predicted many times. Some researchers and practitioners have condemned BGP, and calls for a replacement are resounding throughout the community. But before proposing changes to existing protocols, we should understand the origins of today's problems. We have to grasp the design decisions, the interactions, and the scalability limitations of the current implementations. For BGP, this in-depth understanding is clearly not present.

In the following, I would like to outline three areas in which BGP may, and should, evolve over the next few years. These areas can be viewed as short-, mid-, and long-term goals.

1. Vendor implementation issues (or: convergence and scalability questions)

Convergence times in the Internet are still on the order of several minutes. Given the Internet's critical importance, and compared to telephone networks, this is no longer acceptable! However, the protocol does not have to be changed to improve convergence. The limiting factors are vendor-specific implementation details, timer and parameter settings, and overloaded routers [see Appendix A]. To pick just one example, consider the propagation of updates in I-BGP through a series of route reflectors (RRs): MRAI delays an update by approximately 10 seconds per RR, and these delays accumulate along the chain, so three cascaded RRs already add roughly 30 seconds. Changing this timer setting or changing the network design (fewer cascaded RRs for the update to traverse) will speed up convergence without any protocol modifications. This example leads to the second area:

2. Human-factor issues (or: misconfiguration questions)

Network design and router configuration are not trivial tasks (e.g., [Caldwell03]). Consequently, human errors in router configuration and network design happen every day [Mahajan02]. Various home-grown tools and approaches exist (e.g., see the presentations at operators' forums such as [NANOG]).
Still, research needs to focus more on solutions that reduce the potential for errors. Tools and accurate databases are desperately needed here, but no protocol changes are necessary to minimize human errors. On the other hand, it is known that certain configuration mistakes can lead to BGP oscillations (e.g., [RFC3345]). The current approach is a patchwork that fixes bugs as they occur. This is not acceptable, and we need some protocol enhancements, which leads to the third area:

3. Protocol-design issues (or: questions of protocol divergence, inter-domain TE, etc.)

One beloved feature of BGP is that it is completely configurable through policies. However, Tim Griffin has shown that today's MED oscillations are just the tip of the iceberg and that BGP can diverge on a much larger scale (e.g., [Griffin99]). There are further demands from the market that cannot be satisfied by the current version of BGP. These include inter-domain equal-cost multipath, "online" inter-domain traffic engineering (à la RouteScience), etc. None of this will be possible as long as a router's best-path decision process selects only a single best route. Furthermore, additional information about the causes and origins of routing instabilities would help operators locate and debug routing problems. Even though the list above does not claim to be exhaustive, it is clear that some enhancements to BGP will be unavoidable!

How will BGP evolve? Quite logically, vendors are mainly implementing the features that the market is expected to buy (e.g., MPLS/VPNs). From my perspective, the three areas mentioned above are not very attractive to vendors (e.g., because of a low benefit-to-cost ratio), yet they are important for the future of the Internet. That is why these areas need support from research in order to evolve.
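The MED oscillations mentioned under areas 2 and 3 have one ingredient that fits in a few lines. MED is only compared between routes learned from the same neighboring AS, so the pairwise "better than" relation of the decision process need not be transitive, and the selected route can then depend on the order in which candidates are compared. The Python sketch below is illustrative only: it assumes a drastically simplified decision process (all earlier steps such as local preference and AS-path length tie, and only MED and IGP cost to the next hop are compared), and the routes, AS numbers, and costs are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Route:
    name: str
    neighbor_as: int  # AS the route was learned from
    med: int          # Multi-Exit Discriminator, lower is better
    igp_cost: int     # IGP cost to the BGP next hop, lower is better

def better(a: Route, b: Route) -> Route:
    """Simplified pairwise step (assumption): MED only between routes from the
    same neighbor AS; otherwise, or on a MED tie, the lower IGP cost wins."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.igp_cost <= b.igp_cost else b

def select_best(routes):
    """Compare candidates one by one in the given order, keeping the winner."""
    best = routes[0]
    for r in routes[1:]:
        best = better(best, r)
    return best

# Hypothetical routes whose pairwise preferences form a cycle.
r1 = Route("r1", neighbor_as=64496, med=20, igp_cost=5)
r2 = Route("r2", neighbor_as=64496, med=10, igp_cost=15)
r3 = Route("r3", neighbor_as=64497, med=0, igp_cost=10)

for order in permutations([r1, r2, r3]):
    winner = select_best(list(order))
    print([r.name for r in order], "->", winner.name)
```

Here r2 beats r1 on MED, r1 beats r3 on IGP cost, and r3 beats r2 on IGP cost, so each of the three routes ends up as "best" for some comparison order. Order dependence of this kind, combined with the limited route visibility inside route-reflector topologies, is closely related to the persistent oscillation condition in [RFC3345].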
To approach these problems, we need an in-depth understanding of protocol details, router limitations, interactions between protocols, and propagation patterns through the topology. Research should start by answering questions from the following categories:

1. Protocol analysis
   Identify the root causes and the locations of triggering events. Investigate the interactions between routing protocols and the topology. Example questions: How can the AS that originated an update be identified? How many updates are caused by which kinds of events?

2. Equipment scalability tests
   Understand the scaling limitations of today's equipment before judging the deployment of additional features. Example questions: How long does an update spend inside a router (under given load conditions)? How much additional load would inter-domain traffic engineering or a lower MRAI value impose on a router?

3. Simulation
   Use network simulation to understand how routing updates traverse the network. Investigate the interactions of various timers, of policies, between IGPs and BGP, etc. Example question: How can BGP be implemented such that the number of "dispensable" updates (caused by interconnectivity and timers) is limited? ...

BGP is a protocol that has been evolving for over 15 years now. Its most important property is that network operators have full control over all settings and over the distribution of their routes. My conclusion regarding the future of BGP is that many of the problems we have with today's routing are fixable within BGP and should be fixed soon. Furthermore, enrichments to BGP (e.g., optional add-ons) are not only necessary but unavoidable! On the other hand, a replacement protocol will have a hard time in the market. Therefore, "mutations" are possible, but a replacement will be crushed by "natural selection". This, from my point of view, is the part of evolutionary theory to which BGP is subject.

References
----------

[Griffin99] T. G. Griffin and G.
Wilfong, "An analysis of BGP convergence properties," in Proc. ACM SIGCOMM, September 1999.

[RFC3345] D. McPherson, V. Gill, D. Walton, and A. Retana, "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition," RFC 3345, August 2002.

[Mahajan02] R. Mahajan, D. Wetherall, and T. Anderson, "Understanding BGP misconfiguration," in Proc. ACM SIGCOMM, August 2002.

[NANOG] The North American Network Operators' Group, http://www.nanog.org/

[Caldwell03] D. Caldwell, A. Gilbert, J. Gottlieb, A. Greenberg, G. Hjalmtysson, and J. Rexford, "The cutting EDGE of IP router configuration," unpublished report, July 2003.

------------------------------------------------------------------------

Appendix A: Example, "the MRAI fight"
-------------------------------------

A critical factor in BGP update distribution is the Minimum Route Advertisement Interval (MRAI) and the way it is implemented in router software. The basic idea behind this timer is to first collect the updates arriving from different peers and then pass on only one "best" update. The RFC suggests that after an update for a prefix has been sent to a peer, there should be a (jittered) delay of 30 seconds before another update for the same prefix can be sent to the same peer. This indeed limits the number of BGP messages that need to be exchanged.

We note that certain vendor-specific implementations differ considerably from the recommendation in the RFC and therefore produce significantly different propagation behavior. Here are two examples:

From our current understanding of Cisco's MRAI implementation, there are two major differences with regard to the RFC. The first difference is that the timer is implemented on a per-peer basis instead of a per-prefix basis. Scalability reasons do not allow an implementation per peer and prefix, but as a consequence almost ALL outgoing updates are delayed, not just the second of two consecutive updates (close in time and belonging to the same prefix)!
That is, each and every update is queued and only propagated when the timer expires. The second difference is that MRAI holds back withdrawals as well as announcements. This is a major cause of the observed BGP path exploration phenomena.

Our current understanding is that on Juniper routers, MRAI is called "out-delay" [https://www.juniper.net/techpubs/software/junos/junos57/swconfig57-routing/html/bgp-summary32.html] and is disabled by default. That is, Juniper does not hold back any BGP update messages. This indeed speeds up convergence, but at the risk that many more updates are sent, which in turn triggers more route flap damping.

The trade-off in this fight is between faster propagation and more protocol messages. In today's Internet, more protocol messages would clearly lead to more damping, which does not improve convergence. Even in a fictional Internet without damping, more protocol messages would burn more CPU time. Therefore, future research has to show whether this is desirable (given today's CPU speeds) or not (because of scalability considerations).
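The trade-off can be made concrete with a toy model of a single router forwarding a burst of updates to one peer. The Python sketch below compares three hypothetical policies: no timer (our reading of Juniper's default with out-delay disabled), an RFC-style per-prefix timer, and a per-peer timer that queues every update until the next expiry (our reading of the Cisco behavior described above). It assumes a fixed 30-second interval without jitter, treats every update as an announcement (withdrawals are not modeled), and coalesces queued updates for the same prefix into the newest one. The function names, the free-running per-peer timer, and the input burst are illustrative assumptions, not vendor code.

```python
import math
from collections import defaultdict

MRAI = 30.0  # seconds; RFC-suggested value, jitter omitted (assumption)

# An update is (arrival_time_s, prefix, version); a higher version is newer.

def no_timer(events):
    """No MRAI: every update is forwarded the moment it arrives."""
    return sorted(events)

def per_prefix_timer(events, mrai=MRAI):
    """RFC-style reading: at most one update per prefix per interval; updates
    arriving while that prefix's timer runs are coalesced into the newest."""
    by_prefix = defaultdict(list)
    for t, p, v in sorted(events):
        by_prefix[p].append((t, v))
    sent = []
    for p, updates in by_prefix.items():
        timer_end, pending = -math.inf, None
        for t, v in updates:
            if pending is not None and timer_end <= t:
                # Timer expired before this arrival: flush the queued update.
                sent.append((timer_end, p, pending))
                pending, timer_end = None, timer_end + mrai
            if t >= timer_end:
                sent.append((t, p, v))  # timer idle: send immediately
                timer_end = t + mrai
            else:
                pending = v  # newer update replaces any queued one
        if pending is not None:
            sent.append((timer_end, p, pending))
    return sorted(sent)

def per_peer_timer(events, mrai=MRAI):
    """Per-peer reading: one free-running timer per peer (hypothetical phase
    starting at t=0); every update waits for the next expiry."""
    batches = defaultdict(dict)
    for t, p, v in sorted(events):
        flush_at = (math.floor(t / mrai) + 1) * mrai
        batches[flush_at][p] = v  # newest update per prefix wins the batch
    return sorted((t, p, v) for t, batch in batches.items() for p, v in batch.items())

def summarize(sent, events):
    """(messages sent, mean delay between the last update for a prefix
    arriving and that final state leaving the router)."""
    last_in = {p: t for t, p, _ in sorted(events)}
    last_out = {p: t for t, p, _ in sorted(sent)}
    delays = [last_out[p] - last_in[p] for p in last_in]
    return len(sent), sum(delays) / len(delays)

# Hypothetical burst: path exploration for one prefix, two changes for another.
burst = [
    (0.0, "192.0.2.0/24", 1), (2.0, "192.0.2.0/24", 2),
    (5.0, "192.0.2.0/24", 3), (9.0, "192.0.2.0/24", 4),
    (1.0, "198.51.100.0/24", 1), (40.0, "198.51.100.0/24", 2),
]

for name, policy in [("no timer", no_timer),
                     ("per-prefix timer", per_prefix_timer),
                     ("per-peer timer", per_peer_timer)]:
    n, d = summarize(policy(burst), burst)
    print(f"{name:<18} messages={n}  mean delay={d:.1f} s")
```

For this hypothetical burst the three policies send 6, 4, and 3 messages, with mean final-state delays of 0.0, 10.5, and 20.5 seconds: fewer messages are bought with slower propagation. The per-peer variant also delays the isolated change at t=40, which the per-prefix timer would have forwarded immediately.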