Wednesday, September 9, 2009

Core-Stateless Fair Queuing

This paper describes a method to implement fair queuing across a network without requiring most routers to keep per-flow state. Rather than computing a priority for each packet, the primary mechanism used to control sending rates is probabilistic dropping based on the rate of the flow the packet belongs to. Computing the flow rate involves tracking substantial per-flow state, but that state need only be tracked at the network edges and then attached to outgoing packets (and modified as necessary within the core). To determine how often to drop, it is also necessary to estimate the fair share each flow deserves, which is done by comparing the observed overall arrival and drop rates at each router.
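
A minimal sketch of the forwarding decision as I read it (the function and variable names are mine; the rate label is written into the packet by the edge router, while the fair-share estimate belongs to the core router):

```python
import random

def csfq_forward(labeled_rate, fair_share):
    """Decide whether to forward a packet whose flow arrives at `labeled_rate`
    (estimated at the edge and carried in the packet header) through a router
    whose current per-flow fair-share estimate is `fair_share`.
    Returns True to forward, False to drop."""
    if labeled_rate <= fair_share:
        return True                        # flow is within its share
    drop_prob = 1.0 - fair_share / labeled_rate
    return random.random() >= drop_prob    # drop with probability drop_prob
```

Forwarded packets can also be relabeled with min(labeled_rate, fair_share) so that routers further downstream see the rate the flow is actually receiving, which I take to be the "modified as necessary within" part.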

The authors evaluate this mechanism over several simulated topologies with variously behaved competing flows. They show that the performance of CSFQ is often competitive with a full per-flow-state implementation of fair queuing (Deficit Round Robin). Its absolute performance is usually slightly worse than DRR's (except that it performs better in cases where flows would otherwise suffer from not being able to temporarily exceed their "fair" share of buffering to accommodate bursts).

CSFQ has some parameters which seem to be set arbitrarily and some for which it is not clear how to choose values in general. The arbitrary ones include thresholds used to adjust for buffering effects. The authors recommend setting the exponential smoothing constants to be substantially larger than the delay jitter but shorter than the average duration of a flow. But with traffic like web pages, which are in fact short-lived flows, the average flow duration is probably quite short; how should these constants be set when some flows traverse the globe (with presumably high delay jitter) before exiting the network while others go only to the local metropolis's peering point (with hopefully very low delay jitter)?
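
For concreteness, the exponentially smoothed rate estimate at issue looks roughly like this (my own rendering, not the paper's code; K is the averaging constant whose choice the question above concerns):

```python
import math

def update_flow_rate(old_rate, packet_len, inter_arrival, K):
    """Per-flow rate estimate maintained at an edge router.

    K trades responsiveness for stability: it should sit well above the
    delay jitter (so jitter doesn't masquerade as a rate change) but well
    below the typical flow duration (so the estimate settles while the
    flow is still alive).
    """
    w = math.exp(-inter_arrival / K)          # weight given to the old estimate
    instantaneous_rate = packet_len / inter_arrival
    return (1.0 - w) * instantaneous_rate + w * old_rate
```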

Analysis and Simulation of a Fair Queueing Algorithm

This article constructs and analyzes a method for multiple flows to share a bottleneck link fairly. The intuition derives from the obviously fair algorithm available when bits, and parts of bits, can be interleaved arbitrarily: if there are N flows and we can send one bit per unit of time, then in every unit of time we want to send 1/N bits from each flow. Since packets must be kept together across the link we can't actually do this, but the same effect can be achieved by computing the finishing time each packet would have if we could and using those times as priorities for sending packets (ignoring effects from ties and from small packets that arrive while larger packets are being sent). To provide better delay to flows not using their full share of bandwidth, the authors also give extra priority to new packets from flows which were recently inactive (by effectively backdating those packets to fill the gap, up to some threshold).
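
A rough sketch of that bookkeeping in throwaway code (my own simplification: the paper maintains a round number R(t) whose rate of advance depends on the number of active flows, and it distinguishes the recorded finish time from the "bid" used for ordering, which I conflate here):

```python
import heapq
import itertools

class FairQueue:
    """Toy emulation of bit-by-bit round robin: packets are sent in the order
    of the finish times they would have had under idealized interleaving."""

    def __init__(self):
        self.last_finish = {}     # flow id -> finish time of its last packet
        self.heap = []            # (finish_time, tie_breaker, packet)
        self.tie = itertools.count()
        self.round = 0.0          # idealized round number R(t); advancing it
                                  # in proportion to 1/(active flows) is elided

    def enqueue(self, flow, size, packet, delay_bonus=0.0):
        # An active flow continues from its previous finish time; a flow that
        # has been idle restarts from the current round number, backdated by
        # delay_bonus (capped at some threshold) to reward its recent inactivity.
        start = max(self.last_finish.get(flow, 0.0), self.round - delay_bonus)
        finish = start + size
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, next(self.tie), packet))

    def dequeue(self):
        _, _, packet = heapq.heappop(self.heap)
        return packet
```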

The authors evaluated their fair queuing algorithm (in simulation) against first-come, first-served and DECbit (which gives explicit congestion notifications to flows over their fair share), running against continuous ("FTP") and intermittent ("telnet") flow-controlled streams (both traditional TCP-style and with explicit congestion notification) and against misbehaved flows. In contrast to the other algorithms, their fair queuing algorithm gives well-behaved flows good performance in the presence of an ill-behaved flow and severely punishes the ill-behaved flow (which they assume to be loss-sensitive). Even in the normal cases, they show that they provide lower latency to the intermittent flow (rewarding it for using less than its fair share of bandwidth) than the DECbit scheme does.

The incidental punishment of ill-behaved flows is strange; as the authors note, it is a result of charging flows for packets that cannot be queued, but it would be nicer to make the degree of punishment more explicit. (Since, surely, well-behaved hosts will sometimes send somewhat above their fair share while adjusting or due to network perturbation, and should not necessarily lose throughput for it.)

Sunday, September 6, 2009

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

This article is concerned with congestion avoidance algorithms which coordinate the nodes on a network using a single-bit signal indicating whether the network is operating above or below its desired level of load. In the model of the paper, time is divided into discrete intervals, and each user of the network sets its load for the next interval based on the feedback generated during the previous interval.

Primarily evaluated are algorithms where at each time t the new load is a(signal) + b(signal)·(old load), where signal is the single-bit indicator of whether the network was over the desired load at time t − 1. The appropriate values of a and b for each case are analyzed in light of convergence both to the desired level of load and to the desired fairness. The main result is that additive increase/multiplicative decrease (i.e. a(underloaded) > 0, b(underloaded) = 1, a(overloaded) = 0, 0 < b(overloaded) < 1) is the form of a and b that achieves convergence on both criteria. With respect to the rate of convergence, there is a tradeoff between quick convergence and low levels of oscillation around the goal load.
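
As a toy illustration (the parameters and numbers here are mine, not the paper's): two users start at unequal loads, add 1 whenever the shared signal says "underloaded" and halve whenever it says "overloaded"; the total oscillates around the capacity while the gap between the users shrinks at every decrease step.

```python
def aimd_step(loads, capacity, a=1.0, b=0.5):
    """One synchronous round: every user reacts to the same one-bit signal."""
    if sum(loads) > capacity:
        return [b * x for x in loads]    # multiplicative decrease
    return [x + a for x in loads]        # additive increase

loads = [1.0, 10.0]                      # deliberately unfair starting point
for _ in range(40):
    loads = aimd_step(loads, capacity=12.0)
print(loads)  # the two loads end up close together, each oscillating near capacity/2
```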

For me, the most interesting observations in the paper are the idealized throughput and response-time curves. TCP-style congestion control has a boolean signal of a sort (a dropped packet), but where on that throughput/response-time curve does it fall?

Congestion Avoidance and Control

This paper describes congestion control techniques for TCP built around reaching an equilibrium between packets entering and leaving the network and maintaining that equilibrium:
reaching the equilibrium:
slow start — ramp up the sending rate slowly so that the correct sending rate isn't overshot. Prior systems would not recover from a mismatch between their local network's rate and the least capable link on the path to the end host because they would continuously overshoot.
linear window size increase — be conservative about increasing the amount of data in flight, to avoid massively overdriving the network, from which it is hard to recover.
don't retransmit until the packet is out of the network:
RTT variance estimate — because of queues, etc., round-trip time varies a lot in the network. This must be taken into account, which the authors do by estimating the variance of the RTT as well as its mean (a sketch of such an estimator appears after this list).
back off after retransmission enough to avoid further losses:
exponential backoff — motivated by theoretical results about what sort of backoff is required for stability and by a desire to be as general as possible about which network topologies are supported, the authors choose exponential backoff.
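
The RTT estimator referred to above is, roughly, the now-standard smoothed mean-plus-deviation scheme; this sketch uses the conventional gains (1/8 and 1/4) and a mean-plus-four-deviations timeout, which may differ in detail from the paper's appendix:

```python
class RttEstimator:
    """Smoothed round-trip time and mean deviation, used to set the
    retransmission timeout."""

    def __init__(self, first_sample):
        self.srtt = first_sample
        self.rttvar = first_sample / 2.0

    def update(self, sample):
        err = sample - self.srtt
        self.srtt += err / 8.0                         # smoothed mean
        self.rttvar += (abs(err) - self.rttvar) / 4.0  # smoothed deviation
        return self.rto()

    def rto(self):
        # Timing out at the mean plus a few deviations avoids spurious
        # retransmissions when queueing makes the RTT highly variable.
        return self.srtt + 4.0 * self.rttvar
```

On each retransmission of the same segment the timeout is then doubled, which is the exponential backoff of the last item.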

Despite the authors' attempt at generality in these algorithms (as they note, there were TCP implementations for systems ranging from 56 Kbit/s to hundreds of Mbit/s), there are some interesting assumptions about the environment. The most glaring one is that losses due to damage are rare, which seems a strange assumption when one is considering microwave links, etc. It is hard to see, though, how this sort of congestion control (acting only at the ends, where loss types are not distinguishable) could function very well without assuming that at least most losses are due to congestion.

Wednesday, September 2, 2009

Understanding BGP Misconfiguration

This is a study of BGP misconfigurations inferred from changes in route announcements. The authors chose short-lived route announcements as candidate misconfigurations, then guessed the type of ‘misconfiguration’ from the AS graph (assuming only the usual provider/peer/customer relationships) combined with an e-mail survey. They found that misconfigurations were relatively common, affecting as much as 1% of route prefixes over their relatively short survey. A majority of misconfigurations were apparently due to poor usability of routers, either through apparent configuration-file slips or through issues with recovering old routes after reboot. Some of the misconfigurations were a surprise to operators, notably those caused by misplaced assumptions about their provider's filtering and some caused by router initialization bugs; and of course some were false positives arising from Internet routing arrangements not being fully described by the paper's model of inter-network business relationships.

Perhaps most remarkable about this paper is that most of these problems do not adversely affect connectivity. In this sense, BGP appears to have partially achieved the old goal of survivability, if only by accident. (Indeed, it seems to perform better than the Registries' databases at providing information about how to reach networks.) The most serious effect seems to come from the added load these background misconfigurations place on the system and from the more serious failures they hint at. They are reminders that configuration is neither very automated nor well understood, so future globally disruptive routing events still seem quite possible.

Interdomain Internet Routing

This article primarily describes how inter-network routing works over the present Internet. BGP, the protocol that supports such routing, is relatively simple (at least at network edges), consisting simply of annotated offers, or withdrawals of offers, to route to some set of addresses. The annotations include information about the overall path of the route, which is used to prevent loops and to help make routing decisions, but routing decisions are heavily influenced by other policies.
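
In caricature (the field names are mine and nothing here is wire-format accurate), an announcement carries the destination prefix and the AS path it has traversed; a router discards routes whose path already contains its own AS number, which is the loop prevention mentioned above, and prepends its own number before re-advertising. A withdrawal would name just the prefix being withdrawn.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Announcement:
    prefix: str          # e.g. "18.0.0.0/8"
    as_path: List[int]   # AS numbers the announcement has traversed, nearest first

def accept(ann: Announcement, my_asn: int) -> bool:
    """Reject announcements whose AS path already contains us: that is a loop."""
    return my_asn not in ann.as_path

def readvertise(ann: Announcement, my_asn: int) -> Announcement:
    """Prepend our AS number before offering the route to a neighbor."""
    return Announcement(ann.prefix, [my_asn] + ann.as_path)
```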

Much of the interesting part of inter-network routing, namely how these routing decisions are made and which offers are made or withdrawn, is not part of BGP. Typically, these policies are determined by the economic relationship between the two networks. A provider wants to maximize utilization of links that make it money, so it has an incentive to advertise any available routes using such links. When deciding between multiple routes to the same destination, a provider will similarly be motivated to configure its software to prefer routes using links for which it is paid, and may then decide based on traditional “technical” criteria like path length. (Of course, explicit contractual arrangements also influence these decisions.)
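
The resulting decision process can be caricatured as follows (a sketch only; real configurations express it through LOCAL_PREF settings and import/export filters rather than a single function):

```python
# Higher preference for routes that earn money, lower for routes that cost it.
RELATIONSHIP_PREF = {"customer": 300, "peer": 200, "provider": 100}

def best_route(candidates):
    """candidates: list of (neighbor_relationship, as_path) pairs for one prefix.
    Prefer routes learned from customers (they pay us), then peers (free),
    then providers (we pay them); break ties on shorter AS path."""
    return max(candidates,
               key=lambda r: (RELATIONSHIP_PREF[r[0]], -len(r[1])))
```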

BGP does seem to have been designed for a smaller, friendlier Internet. The state size and its rate of change are not trivial, and both are likely to get worse as more people sensibly want an Internet connection, or a backup one. As administered, there is a lot of trust involved, revealed by the poor filtering illustrated by hijacked blocks and by the general lack of route authentication in the protocol. And, despite its flexibility for route announcers to choose their policy, some policies seem hard to enforce technically, such as preferences for peers to use their own networks for cross-country/sea transit. (Presumably, early handoff could be prevented by advertising routes only through the preferred entry point, except that this would break reachability during glitches on the peer's network, so it is easier to contractually require honoring MEDs when needed?)

Monday, August 31, 2009

The Design Philosophy of the DARPA Internet Protocols

This article gives a design rationale for the DARPA Internet protocols. The primary motivation derives from connecting a variety of separately administered networks and supporting a variety of applications on top of them. The targeted applications tended to be non-jitter-sensitive ones with varying bandwidth requirements, so packet switching was chosen over circuit switching. Another important goal was tolerance of failures; fault tolerance, combined with separate administration and the least common denominator of a variety of networks, led to the datagram model: nothing was to be assumed of the intermediate hosts in a connection, so all the state of a connection would be kept at its ends. The IP layer, then, provides exactly this service, on top of which end-user services would be built with modifications only at the end points (including the initial example of TCP).

Probably the biggest tradeoff mentioned in the article is end-host complexity. Because the network stack is so minimalist and end hosts must tolerate fail-stop failures of the network between them, implementing the protocols on the end hosts is not easy (compared to, e.g., network architectures that explicitly signal failures or flow-control limits).

Somehow, despite placing high importance on resilience against explicit attack, the design considerations mentioned fail to account for the presence of an adversary on the network who can introduce failures that are not fail-stop. The article recognizes this as a loss of robustness:

In this respect, the goal of robustness, which lead to the method of fate-sharing, which led to host-resident algorithms, contributes to a loss of robustness if the host misbehaves.
but fails to mention the apparent design assumption that the separately administered hosts are reasonably well-behaved.

It is interesting that the article describes the lack of state carried in routers as a robustness feature. At our current scale, it seems more like a scalability feature. Though the article talks about performance considerations (particularly the difficulty of specifying them to defense contractors) and mentions network-performance-sensitive activities like teleconferencing, performance in general and scalability in particular do not appear to have been major goals. (The closest among the enumerated goals is “[permitting] host attachment with a low level of effort”, which was observed to not be well achieved because of the protocol reimplementation effort.)