Workshop on Internet Routing Evolution and Design (WIRED)

October 7-8, 2003
Timberline Lodge, Mount Hood, Oregon, USA

Position statement of

Jennifer Rexford

(AT&T)





          
          Routing Problems are Too Easy to Cause, and Too Hard to Diagnose
          ================================================================
          
          IP routing protocols, such as OSPF and BGP, form a complex,
          highly configurable distributed system underlying the end-to-end
          delivery of data packets.  "Highly configurable" is a nice way of
          saying "hard to configure" or "easy to misconfigure," and "distributed
          system" is a nice way of saying "hard to understand" or "hard to
          debug."  As a result, we have a routing system today where a single
          typographical error by a human operator can easily disconnect parts of
          the Internet, and diagnosing and fixing routing problems remains an
          elusive black art.  This is unacceptable for any technology that is
          considered part of the core communication infrastructure.  I believe
          that the networking research community should devote significant
          attention to improving the state of the art in router configuration
          and network troubleshooting.
          
          Several factors conspire to make IP router configuration extremely challenging:
          
          - Vendor configuration languages are primitive and low-level, like
          assembly language (e.g., a typical router may have ten thousand lines
          of configuration commands)
          
          - Routers implement numerous complex protocols and routing mechanisms
          (e.g., static routes, RIP, EIGRP, IS-IS, OSPF, BGP, MPLS, and various
          multicast protocols) that have many tunable parameters (e.g., timers,
          link weights/areas, and BGP routing policies)
          
          - The routing protocols interact with each other (e.g., "hot-potato"
          routing in BGP based on the underlying IGP, use of static routes to
          reach the remote BGP end-point, and route injection between protocols)
          
          - Scalability often requires even more complex configuration to limit
          the scope of routing information (e.g., OSPF areas and summarization,
          BGP route reflectors and confederations, and route aggregation)
          
          - Networks are configured at the element (or router) level, rather than
          as a single cohesive unit with well-defined policies and constraints
          
          - Key network operations goals, such as traffic engineering and
          security, are not directly supported, requiring operators to tweak the
          router configuration in the hope of having the right (indirect) effect
          on the network and its traffic
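
          To make the protocol-interaction point concrete, here is a minimal
          sketch of hot-potato routing under simplifying assumptions: the
          topology, link weights, and router names are hypothetical, and the
          BGP routes through both egress routers are assumed to tie on every
          decision step that comes before IGP distance.  It shows how raising
          a single IGP link weight, perhaps intended only for internal traffic
          engineering, moves the point where traffic leaves the network.

```python
import heapq

def build_graph(links):
    """Undirected IGP topology: {router: {neighbor: link_weight}}."""
    graph = {}
    for a, b, weight in links:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph

def igp_distances(graph, source):
    """Dijkstra shortest-path costs from source, as a link-state IGP computes them."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def hot_potato_egress(graph, router, egress_routers):
    """Pick the egress with the smallest IGP distance from router.

    Assumes (hypothetically) that the BGP routes via every egress tie on all
    earlier decision steps (local preference, AS-path length, and so on).
    Ties on IGP distance are broken by router name.
    """
    dist = igp_distances(graph, router)
    return min(egress_routers, key=lambda e: (dist.get(e, float("inf")), e))

# Hypothetical backbone: R1 can send traffic out via egress E1 or E2.
before = build_graph([("R1", "E1", 5), ("R1", "R2", 2), ("R2", "E2", 4)])
after = build_graph([("R1", "E1", 7), ("R1", "R2", 2), ("R2", "E2", 4)])

print(hot_potato_egress(before, "R1", ["E1", "E2"]))  # E1 (cost 5 vs. 6)
print(hot_potato_egress(after, "R1", ["E1", "E2"]))   # E2 (cost 7 vs. 6)
```

          Nothing in the BGP configuration changed between the two runs; the
          shift in egress comes entirely from the IGP weight on the R1-E1 link.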
          
          Addressing these complicated problems will require research in
          configuration languages, protocol modeling, and network modeling.
          Such work would hopefully lead to a higher level of abstraction for
          managing the configuration of the network, as well as to tools for
          checking configurations and, better yet, for generating them
          automatically from a higher-level specification of the network
          goals.  Extensions (or replacements!) of the routing protocols may
          also be necessary to rectify some of these problems.
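
          As an illustration of checking configuration across the network as a
          whole rather than router by router, the sketch below tests two
          network-wide constraints over hypothetical, already-parsed
          per-router summaries.  The data model, router and neighbor names,
          and the import-filter rule are assumptions chosen for illustration,
          not any vendor's configuration format or a complete checker.

```python
from itertools import combinations

# Hypothetical per-router summaries; real router configurations would first
# have to be parsed into something like this.
CONFIGS = {
    "r1": {"ibgp_peers": {"r2", "r3"}, "ebgp_import": {"peerA": "FROM-PEER-A"}},
    "r2": {"ibgp_peers": {"r1"}, "ebgp_import": {"peerB": None}},
    "r3": {"ibgp_peers": {"r1"}, "ebgp_import": {}},
}

def check_network(configs):
    """Check constraints that no single router's configuration can express."""
    problems = []
    # Constraint 1 (assumes no route reflectors or confederations): iBGP
    # sessions must form a full mesh, and each session needs both ends.
    for a, b in combinations(sorted(configs), 2):
        if b not in configs[a]["ibgp_peers"] or a not in configs[b]["ibgp_peers"]:
            problems.append(f"missing iBGP session {a}-{b}")
    # Constraint 2 (hypothetical local policy): every eBGP session must
    # have an import filter applied.
    for router in sorted(configs):
        sessions = configs[router]["ebgp_import"]
        for neighbor in sorted(sessions):
            if sessions[neighbor] is None:
                problems.append(f"{router}: no import policy for eBGP neighbor {neighbor}")
    return problems

for problem in check_network(CONFIGS):
    print(problem)
```

          Each router's configuration looks plausible in isolation; the
          missing r2-r3 session only shows up when the routers are examined
          together.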
          
          Detecting, diagnosing, and fixing routing problems are also very
          complicated because:
          
          - Routing protocols are hard to configure, making configuration
          mistakes very common (see above!)
          
          - Routing protocols do not convey enough information to explain why a
          route has changed (or disappeared entirely)
          
          - No authoritative record exists that can identify which routes are
          valid (e.g., whether the originating AS is entitled to advertise the
          prefix, or whether an AS should be providing transit service between
          two other ASes)
          
          - Failures, configuration errors, or malicious acts in remote
          locations can affect the path between two hosts
          
          - Reachability problems can arise for other reasons, unrelated to the
          routing protocols (e.g., packet filtering or firewalls, MTU mismatches, 
          network congestion, and overloaded or faulty end hosts)
          
          - The end-to-end forwarding path depends on the complex interaction between 
          multiple routing protocols running in a large collection of networks
          
          - Route filtering and route aggregation (often necessary for scalability) can 
          lead to subtle reachability problems, including persistent forwarding loops
          
          - The network offers little support for active measurement of the
          forwarding path (e.g., traceroute is very primitive, and limited in
          its accuracy and potential uses)
          
          - The Internet topology is not fully known, at the router or the AS
          levels (or in terms of AS relationships and policies), and may be
          inherently unknowable
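
          The aggregation problem in the list above can be reproduced with a
          toy model.  In this minimal sketch, router A advertises the
          aggregate 10.0.0.0/16 to router B but holds a more-specific route
          for only part of that space, while A's default route points back at
          B.  The two forwarding tables, the addresses, and the "deliver",
          "upstream", and "discard" next-hop labels are hypothetical;
          forwarding uses ordinary longest-prefix match.

```python
import ipaddress

# Hypothetical forwarding tables: {router: [(prefix, next_hop), ...]}.
FIBS = {
    "A": [("10.0.1.0/24", "deliver"), ("0.0.0.0/0", "B")],
    "B": [("10.0.0.0/16", "A"), ("0.0.0.0/0", "upstream")],
}

def lookup(fibs, router, dst):
    """Longest-prefix match of dst in router's table; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(prefix), next_hop)
               for prefix, next_hop in fibs[router]
               if addr in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def trace(fibs, src, dst):
    """Follow next hops from src until the packet exits, is dropped, or loops."""
    path, router = [src], src
    while True:
        next_hop = lookup(fibs, router, dst)
        if next_hop not in fibs:      # "deliver", "upstream", "discard", or None
            return path, next_hop
        if next_hop in path:          # revisiting a router means a forwarding loop
            return path + [next_hop], "loop"
        path.append(next_hop)
        router = next_hop

print(trace(FIBS, "B", "10.0.1.7"))   # (['B', 'A'], 'deliver')
print(trace(FIBS, "B", "10.0.5.1"))   # (['B', 'A', 'B'], 'loop')

# Safeguard: A installs a discard route for the aggregate it advertises.
FIXED = {**FIBS, "A": FIBS["A"] + [("10.0.0.0/16", "discard")]}
print(trace(FIXED, "B", "10.0.5.1"))  # (['B', 'A'], 'discard')
```

          The loop persists for as long as the tables stay unchanged, because
          every router involved believes it is forwarding correctly; a common
          safeguard is for the aggregating router to install a discard (null)
          route for the aggregate, as in the last case.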
          
          Like router configuration, network troubleshooting has received little
          attention from the research community, despite its importance to
          network practitioners.  Research work in network support for
          measurement, extensions to routing protocols to facilitate diagnosis,
          and new diagnostic tools would be extremely valuable for improving the
          state of the art.