Workshop on Internet Routing Evolution and Design (WIRED)

October 7-8, 2003
Timberline Lodge, Mount Hood, Oregon, USA

Position statement of

Cengiz Alaettinoglu

(Packet Design)





          Link-state routing convergence and stability: is there a trade-off?
          Cengiz Alaettinoglu
          
          New service-sensitive applications require ever higher levels of network
          availability. Current IGP restoration times are measured in seconds, much
          better than the tens of seconds of a few years ago. However, this is still
          not acceptable for many service-sensitive applications such as VoIP or
          online gaming.
          
          In theory, link-state routing restoration times can be as fast as a single
          SPF computation (hundreds of microseconds to a few milliseconds) plus some
          scheduling delay. However, such an implementation may not be
          practical. Instead, implementations that achieve restoration within
          propagation-delay time frames (tens to a few hundreds of milliseconds) are
          within reach today.
          
          Why, then, can current IGP deployments not achieve such convergence
          times? Because there is a misconception, held for very good reasons, of a
          trade-off between IGP convergence times and stability. To ensure
          stability, implementations use timers that limit the effect of external
          instability on the system. These timers definitely stand in the way of
          fast convergence. However, when ISPs tried to tune these timers down in
          the past to achieve fast convergence, several of them experienced
          network-wide meltdowns.
          
          If so, why is this trade-off a misconception? Because it is not a
          trade-off between convergence and stability in general; it exists only in
          current IGP implementations. It is possible to avoid instability by
          slowing down convergence only during link recovery. Further protection
          can be provided by damping the SPF process.
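
          The idea of slowing convergence only during link recovery can be sketched
          as follows. This is a minimal illustration, not an implementation from
          any router: the names LinkEvent and schedule_spf and both delay values
          are assumptions made up for the example. A failure triggers SPF at once,
          while a recovery is used only after the link has stayed up for a hold
          time, so a flapping link is seen by SPF as simply down.

```python
from dataclasses import dataclass

# Hypothetical delays in seconds, chosen only for illustration.
FAILURE_DELAY = 0.0    # act on a link failure immediately
RECOVERY_DELAY = 5.0   # a recovered link must stay up this long


@dataclass
class LinkEvent:
    time: float  # when the event was observed
    link: str    # link identifier, e.g. "A-B"
    up: bool     # True for recovery, False for failure


def schedule_spf(events):
    """Return a list of (run_time, link, state) SPF triggers."""
    runs = []
    pending_up = {}    # link -> time at which its recovery may be used
    announced_up = {}  # link -> state last given to SPF (default: up)

    def flush(until):
        # Release recoveries whose hold time has expired by `until`.
        for link, ready in sorted(pending_up.items(), key=lambda kv: kv[1]):
            if ready <= until:
                runs.append((ready, link, "up"))
                announced_up[link] = True
                del pending_up[link]

    for ev in sorted(events, key=lambda e: e.time):
        flush(ev.time)
        if ev.up:
            # Only a link that SPF believes is down needs a recovery;
            # a repeated "up" restarts the hold timer.
            if not announced_up.get(ev.link, True):
                pending_up[ev.link] = ev.time + RECOVERY_DELAY
        else:
            # A failure cancels any recovery still being held.
            pending_up.pop(ev.link, None)
            if announced_up.get(ev.link, True):
                runs.append((ev.time + FAILURE_DELAY, ev.link, "down"))
                announced_up[ev.link] = False

    flush(float("inf"))  # release whatever is still held at the end
    return runs
```

          With these assumed values, a link that goes down at t=0, bounces at
          t=1..4, and comes up for good at t=10 produces two SPF triggers: the
          failure at t=0 and the recovery at t=15.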
          
          Vendors have attempted to provide such protection with adaptive timers
          that limit how often the SPF process can run. However, because these
          algorithms were implemented without realistic IGP measurements, they
          always delayed routing convergence in the IGP deployments we studied.
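
          For readers unfamiliar with adaptive SPF timers, the sketch below shows a
          generic exponential back-off. It is not any vendor's actual algorithm;
          the class name BackoffSpfTimer and the parameter values (in
          milliseconds) are assumptions for illustration. The delay doubles with
          each event until the network has been quiet for a while.

```python
class BackoffSpfTimer:
    """Generic exponential back-off for SPF runs (illustrative only)."""

    def __init__(self, initial=50, maximum=5000, quiet=10000):
        self.initial = initial      # delay for the first event after calm
        self.maximum = maximum      # upper bound on the delay
        self.quiet = quiet          # idle time after which the delay resets
        self.next_delay = initial
        self.last_event = None      # time of the most recent event

    def delay_for(self, now):
        """Return the delay (ms) before SPF may run for an event at `now`."""
        if self.last_event is not None and now - self.last_event >= self.quiet:
            self.next_delay = self.initial  # network was calm: start over
        self.last_event = now
        delay = self.next_delay
        # Each event doubles the delay applied to the next one, up to the cap.
        self.next_delay = min(self.next_delay * 2, self.maximum)
        return delay
```

          Note that the timer is global: after a few events anywhere in the
          network, a genuine failure on an otherwise stable link waits behind the
          grown delay. That is one way such a scheme can slow convergence when its
          parameters do not match the event patterns of a real network.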
          
          Thus, what is needed to achieve fast convergence without sacrificing
          stability is good damping algorithms that can separate unstable
          components from stable ones and tune themselves to the conditions of the
          network. This can only be done with careful measurement and analysis of
          IGP routing protocols. What will be harder is winning back the trust of
          ISPs once such algorithms have been implemented.
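
          One possible shape for damping that separates unstable components from
          stable ones is a per-link penalty with exponential decay, an idea
          borrowed here from route flap damping. The sketch below is an assumption
          for illustration, not the algorithm proposed in this statement: the class
          LinkDamper and its thresholds are hypothetical, and the self-tuning the
          text calls for is deliberately left out.

```python
import math


class LinkDamper:
    """Per-link flap penalty with exponential decay (illustrative only)."""

    def __init__(self, penalty=1000.0, suppress=2000.0, half_life=15.0):
        self.penalty = penalty      # added on every flap of a link
        self.suppress = suppress    # at or above this, the link is damped
        self.half_life = half_life  # seconds for a penalty to halve
        self._state = {}            # link -> (penalty value, last update time)

    def _decayed(self, link, now):
        # Penalty of `link` at time `now`, after exponential decay.
        value, since = self._state.get(link, (0.0, now))
        return value * math.pow(0.5, (now - since) / self.half_life)

    def record_flap(self, link, now):
        """Charge `link` for a state change observed at `now`."""
        self._state[link] = (self._decayed(link, now) + self.penalty, now)

    def is_suppressed(self, link, now):
        """True while changes on `link` should not trigger fast SPF."""
        return self._decayed(link, now) >= self.suppress
```

          Because the penalty is kept per link, a link that flaps repeatedly is
          damped while a single failure elsewhere is still handled at full speed.
          Choosing the thresholds is exactly where measurement of real IGP
          behavior would be needed.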