Workshop on Internet Routing Evolution and Design (WIRED)

October 7-8, 2003
Timberline Lodge, Mount Hood, Oregon, USA

Position statement of

Olaf Maennel

(TU Munich)






   ON BGP MUTATIONS
   ================


The collapse of the Internet has been predicted many times already.
Some researchers and practitioners have condemned BGP, and proposals to
replace it are resounding throughout the community.

But before proposing changes to existing protocols, we should understand
the origins of today's problems. We have to grasp the design decisions,
the interactions, and the scalability limitations of the current
implementations. For BGP, this in-depth understanding is clearly
lacking.

In the following, I would like to outline three areas in which BGP may
or should evolve in the next few years. These areas can be viewed as
short-, mid-, and long-term goals.


      1. Vendor implementation issues
         (or: convergence and scalability questions)
   
   Convergence times in the Internet are still on the order of several
   minutes. Given the Internet's critical importance, and compared to
   telephone networks, this is no longer acceptable!

   But the protocol does not have to be changed to improve convergence.
   The limiting factors are vendor-specific implementation details,
   timer and parameter settings, and overloaded routers
   [see Appendix A].
   
   Just to pick one example, consider the propagation of updates in
   I-BGP through a series of route reflectors (RRs): MRAI delays each
   update by approximately 10 seconds per RR. Changing this timer
   setting or changing the network design (reducing the number of
   cascaded RRs that an update has to pass) will speed up convergence
   without protocol modifications. This example leads to the second
   area:
   
   
      2. Human-factor issues
         (or: misconfiguration questions)
  
   Neither network design nor router configuration is a trivial task
   (e.g., [Caldwell03]). Human errors in router configuration and
   network design therefore happen every day [Mahajan02].

   Various homegrown tools and approaches exist (e.g., see the
   presentations at operator forums such as [NANOG]). Still, research
   needs to focus more on solutions that minimize the potential for
   error.

   Here, tools and accurate databases are desperately needed, but no
   changes to the protocols are necessary to minimize human errors.

   On the other hand, it is known that certain configuration mistakes
   can lead to BGP oscillations (e.g., [RFC3345]). The current approach
   is a patchwork that fixes bugs as they occur. This is not acceptable,
   and we need some protocol enhancements, which leads to the third
   area:
   
   
      3. Protocol-design issues
         (or: protocol divergence, inter-domain TE, etc. questions)

   One beloved feature of BGP is that it is completely configurable
   through policies, but Tim Griffin has shown that today's MED
   oscillations are just the tip of the iceberg and that BGP can lead to
   divergent states on a much larger scale (e.g., [Griffin99]).

   There are further demands from the market that cannot be satisfied
   with our current version of BGP. These include inter-domain
   equal-cost multipath, "online" inter-domain traffic engineering (a la
   routeScience), etc. None of this will be possible as long as the
   best-path decision process of a router selects only one best route.
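
   The single-winner behaviour can be made concrete with a minimal
   sketch. It assumes a heavily simplified decision process (local
   preference, AS path length, then lowest router ID); the real BGP
   decision process has more steps, and the Route class, the function
   names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A candidate route for one prefix (hypothetical, simplified model)."""
    neighbor: str      # who we learned the route from
    local_pref: int    # higher is preferred
    as_path: tuple     # shorter is preferred
    router_id: int     # final tie-breaker: lowest wins


def preference(route):
    """Policy-relevant part of the comparison; real BGP has more steps."""
    return (-route.local_pref, len(route.as_path))


def best_route(routes):
    """Single-best selection: exactly one winner, ties broken by router ID."""
    return min(routes, key=lambda r: (preference(r), r.router_id))


def multipath_candidates(routes):
    """All routes that still tie before the final tie-breaker is applied."""
    top = min(preference(r) for r in routes)
    return [r for r in routes if preference(r) == top]


# Example values use AS numbers reserved for documentation.
routes = [
    Route("peer-a", 100, (64501, 64510), router_id=2),
    Route("peer-b", 100, (64502, 64510), router_id=1),
    Route("peer-c", 90, (64510,), router_id=3),
]
print(best_route(routes).neighbor)                          # peer-b
print([r.neighbor for r in multipath_candidates(routes)])   # ['peer-a', 'peer-b']
```

   In this sketch the route via peer-a is just as good as the winner
   but is never used or advertised, which is what rules out
   equal-cost multipath across domains.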
   
   Furthermore, additional information about the causes and origins of
   routing instabilities would help operators to locate and debug
   routing problems.

   Even though the list above does not claim to be exhaustive, it is
   clear that some enhancements to BGP will be unavoidable!


How will BGP evolve?

Quite logically, vendors mainly implement those features that the
market is expected to buy (e.g., MPLS/VPNs). From my perspective, all
three areas mentioned above are not very attractive to vendors (e.g.,
poor cost-benefit ratio), but they are important for the future of the
Internet. That is why those areas need support from research to evolve.

To approach those problems, we need an in-depth understanding of
protocol details, router limitations, and interactions between
protocols, as well as of propagation patterns through the topology.
Research should start by answering questions from the following
categories:


      1. Protocol analysis 
  
   Identify the root causes and the location of triggering events.
   Investigate interactions between routing protocols and topology. 

   Example questions here could be: How can we identify the AS that
   originated an update? How many updates are due to which kinds of
   events?
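
   A first step towards such counts could look like the sketch below.
   It assumes the update stream from one peer is already available as
   (prefix, AS path) pairs, with None marking a withdrawal; the
   category names and the input format are hypothetical. Note that
   the last AS in the AS path only identifies the origin of the
   route, not necessarily the AS in which the triggering event
   happened, which is exactly the open question.

```python
from collections import Counter


def origin_as(as_path):
    """Origin AS of a route: the last AS number in its AS path."""
    return as_path[-1]


def classify_updates(updates):
    """Count update kinds in one peer's stream (arrival order).

    updates: iterable of (prefix, as_path) pairs; as_path is a tuple of
    AS numbers, or None for a withdrawal. Hypothetical input format.
    """
    current = {}        # prefix -> AS path currently announced by the peer
    counts = Counter()
    for prefix, as_path in updates:
        previous = current.get(prefix)
        if as_path is None:
            kind = "withdrawal" if previous is not None else "redundant withdrawal"
            current.pop(prefix, None)
        else:
            if previous is None:
                kind = "new announcement"
            elif previous == as_path:
                kind = "duplicate announcement"
            else:
                kind = "path change"
            current[prefix] = as_path
        counts[kind] += 1
    return counts


# Example stream using documentation prefixes and AS numbers.
updates = [
    ("192.0.2.0/24", (64500, 64501, 64510)),
    ("192.0.2.0/24", (64500, 64502, 64510)),
    ("192.0.2.0/24", (64500, 64502, 64510)),
    ("192.0.2.0/24", None),
    ("192.0.2.0/24", None),
    ("198.51.100.0/24", (64500, 64511)),
]
counts = classify_updates(updates)
for kind, n in sorted(counts.items()):
    print(f"{kind}: {n}")
print("origin of first route:", origin_as(updates[0][1]))
```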
   
   
      2. Equipment scalability tests 

   Understand the scaling limitations of today's equipment before
   judging the deployment of additional features.

   Example questions here could be: How long does an update spend
   inside a router (under certain load conditions)? How much more load
   can inter-domain traffic engineering or a lower MRAI value impose on
   a router?


      3. Simulation

   Use network simulation to understand how routing updates traverse
   the network. Investigate interactions of various timers, of policies,
   between IGPs and BGP, etc.

   Example questions here could be: How can BGP be implemented so that
   the number of "dispensable" updates (caused by interconnectivity and
   timers) is limited? ...
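
   As a toy illustration of the kind of simulation meant here, the
   sketch below sends one update through a chain of cascaded route
   reflectors. It assumes that each reflector holds the update until
   its own periodic timer fires and that the update arrives at a
   uniformly random point within the timer period; both the model and
   the parameter values are assumptions, not measurements.

```python
import random


def chain_delay(num_reflectors, interval, rng):
    """Delay (seconds) of one update through a chain of route reflectors."""
    total = 0.0
    for _ in range(num_reflectors):
        # Assumed model: the update waits for the next timer expiry,
        # which is uniformly distributed within one timer period.
        total += rng.uniform(0.0, interval)
    return total


def mean_chain_delay(num_reflectors, interval, trials=20000, seed=1):
    """Average delay over many independent trials (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(chain_delay(num_reflectors, interval, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for interval in (10.0, 5.0):
        for hops in (1, 2, 3, 4):
            mean = mean_chain_delay(hops, interval)
            print(f"interval={interval:4.1f}s  reflectors={hops}  mean delay={mean:5.1f}s")
```

   Under this model the mean delay grows linearly with the number of
   cascaded reflectors (about half a timer period per hop), so either
   a shorter timer or a flatter reflector hierarchy reduces it.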



BGP is a protocol that has been evolving for over 15 years now. Its
most important property is that network operators have full control over
all settings and their route distribution.

My conclusion regarding the future of BGP is that many of the problems
we have with today's routing are fixable within BGP and should be fixed
soon. Furthermore, enrichments of BGP (e.g., optional add-ons) are not
only necessary, but unavoidable! On the other hand, a replacement
protocol will have a hard time in the market.

Therefore "mutations" are possible, but a replacement will be crushed by
"natural selection". This is the part of evolution theory that BGP is
subjected to - from my point of view.



References
----------

[Griffin99]   T. G. Griffin and G. Wilfong, "An analysis of BGP
              convergence properties," in Proc. ACM SIGCOMM,
              September 1999.

[RFC3345]     D. McPherson, V. Gill, D. Walton, and A. Retana, "Border
              Gateway Protocol (BGP) Persistent Route Oscillation
              Condition," Request for Comments 3345, August 2002.

[Mahajan02]   R. Mahajan, D. Wetherall, and T. Anderson, "Understanding
              BGP Misconfiguration," in Proc. ACM SIGCOMM, August 2002.

[NANOG]       The North American Network Operators' Group,
              http://www.nanog.org/

[Caldwell03]  D. Caldwell, A. Gilbert, J. Gottlieb, A. Greenberg,
              G. Hjalmtysson, and J. Rexford, "The cutting EDGE of IP
              router configuration," unpublished report, July 2003.

------------------------------------------------------------------------


Appendix A: Example, "the MRAI fight"
-------------------------------------

A critical factor in BGP update distribution is the Minimum Route
Advertisement Interval (MRAI) and the way it is implemented in router
software. The basic idea behind this timer is to first collect all
updates arriving from different peers and then pass on only one "best"
update. The RFC suggests that after an update for a prefix has been sent
to a peer, there should be a (jittered) delay of 30 seconds before
another update for the same prefix can be sent to the same peer. This
indeed limits the number of BGP messages that need to be exchanged. We
note that certain vendor-specific implementations differ considerably
from the recommendation in the RFC and therefore produce a
significantly different propagation picture. Here are two examples:

From our current understanding of Cisco's MRAI implementation, there are
two major differences with regard to the RFC. The first difference is
that the timer is implemented on a per-peer basis instead of a
per-prefix basis. Scalability reasons do not allow an implementation per
peer and prefix, but as a consequence almost ALL outgoing updates will
be delayed - not just two consecutive updates (close in time and
belonging to one prefix)! That means that each and every update will be
queued and only propagated when the timer expires. The second difference
is that MRAI holds back withdrawals as well as announcements. This is a
major cause of the observed BGP path exploration phenomenon.

Our current understanding is that MRAI on Juniper routers is called
"out-delay"
[https://www.juniper.net/techpubs/software/junos/junos57/swconfig57-routing/html/bgp-summary32.html]
and is disabled by default. That means that Juniper does not hold back
any BGP update messages. This indeed speeds up convergence, but at the
risk that many more updates will be sent - which in turn triggers more
damping.
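
The three behaviours described above can be compared on a small update
stream towards a single peer. The sketch below is a simplified model
under stated assumptions, not a description of any vendor's code: jitter
is ignored, the per-peer timer is assumed to expire at fixed multiples
of the interval, held-back updates for the same prefix are coalesced
into one message, and all function names and event times are
hypothetical.

```python
def no_mrai(events):
    """No hold-back at all: every update leaves when it is generated."""
    return sorted(events)


def per_prefix_mrai(events, interval):
    """RFC-style timer kept per prefix (for one peer)."""
    sent = []
    ready_at = {}   # prefix -> earliest time the next update may leave
    pending = {}    # prefix -> scheduled departure of a held-back update
    for t, prefix in sorted(events):
        due = pending.get(prefix)
        if due is not None and due <= t:
            # The held-back update already left before this one arrived.
            sent.append((due, prefix))
            ready_at[prefix] = due + interval
            del pending[prefix]
        if prefix in pending:
            continue                      # coalesce with the held-back update
        if t >= ready_at.get(prefix, 0):
            sent.append((t, prefix))      # timer not running: send at once
            ready_at[prefix] = t + interval
        else:
            pending[prefix] = ready_at[prefix]
    sent.extend((due, prefix) for prefix, due in pending.items())
    return sorted(sent)


def per_peer_mrai(events, interval):
    """One timer per peer: every update waits for the next expiry."""
    sent = set()
    for t, prefix in events:
        expiry = (t // interval + 1) * interval
        sent.add((expiry, prefix))        # same prefix, same expiry: one message
    return sorted(sent)


# Prefix "A" flaps four times; prefix "B" changes once (times in seconds).
events = [(1, "A"), (3, "B"), (5, "A"), (12, "A"), (40, "A")]
for name, out in [
    ("no timer", no_mrai(events)),
    ("per prefix", per_prefix_mrai(events, 30)),
    ("per peer", per_peer_mrai(events, 30)),
]:
    print(f"{name:10s} {len(out)} messages: {out}")
```

In this example the stream without a timer produces five messages, the
per-prefix timer four, and the per-peer timer three. With the per-peer
timer, however, the single isolated update for prefix "B" is held until
t=30 although no earlier update for "B" was ever sent.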

The trade-off in this fight is between faster propagation and more
protocol messages. It is clear that in today's Internet more protocol
messages would lead to more damping, which does not improve convergence.
Even in a fictional Internet without damping, more protocol messages
would burn more CPU time. Therefore, future research has to show whether
this is desirable (considering today's CPU speeds) or not (because of
scalability considerations).