Wednesday, October 28, 2009

Active network visions and reality: lessons from a capsule-based system

This paper describes an active networking implementation (that is, one allowing network users to specify forwarding and related code to be deployed within the network). Rather than embedding the code itself in the forwarded packets, this design embeds an identifier that is used to retrieve the code from the prior hop on demand. The authors' implementation of this scheme shows less than 1 millisecond of latency overhead but can only keep up with approximately T1-speed links. The authors blame this inefficiency on their use of Java and argue that a production implementation, which would presumably use a different isolation technique, could operate at the speed of faster links. The authors envision a system in which not all nodes on the network are ‘active’ (which also enables incremental deployment), so high-speed nodes failing to implement active networking is not a fatal flaw.
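
A minimal sketch (in Python, with hypothetical names and none of the paper's actual APIs) of the on-demand code-loading idea: forwarding code is cached per node, keyed by the identifier carried in each capsule, and fetched from the previous hop on a miss.

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class Capsule:
        code_id: str
        payload: bytes
        dest: str

    class ActiveNode:
        def __init__(self, name: str):
            self.name = name
            self.code_cache: Dict[str, Callable] = {}

        def receive(self, capsule: Capsule, prev_hop: "ActiveNode") -> None:
            routine = self.code_cache.get(capsule.code_id)
            if routine is None:
                # The previous hop must already hold the code (it just used it to
                # forward the capsule), so a cache miss costs one extra hop-local
                # round trip rather than carrying code in every packet.
                routine = prev_hop.request_code(capsule.code_id)
                self.code_cache[capsule.code_id] = routine
            routine(self, capsule)  # would run under per-service resource limits

        def request_code(self, code_id: str) -> Callable:
            return self.code_cache[code_id]

    # Hypothetical user-supplied forwarding routine:
    def plain_forward(node: ActiveNode, capsule: Capsule) -> None:
        print(f"{node.name} handling capsule for {capsule.dest}")

    a, b = ActiveNode("A"), ActiveNode("B")
    a.code_cache["fwd-v1"] = plain_forward
    b.receive(Capsule("fwd-v1", b"data", "10.0.0.2"), prev_hop=a)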

Most of the paper deals with countermeasures to the security and scalability problems that arise from running external code on routers. The authors impose tight per-service limits on node resource consumption and assume software isolation between services on the active nodes. But these restrictions are not enough to prevent abuse under their scheme: malicious users can write programs that ping-pong traffic within the network to generate hundreds of times the traffic of their edge link, and possibly much more if users are allowed to implement multicast and the like in their network programs.

The authors' solution (which they acknowledge is unsatisfying) is for services to be endorsed by a central authority. Even with this restriction, it is not clear how these services are to satisfy more local constraints: how, for example, could nodes homed on both the commercial Internet and the Internet2 backbone enforce the restriction that the Internet2 backbone be used only for non-commercial traffic? Generally, the scheme takes control away from the middle of the network (especially given the difficulty of answering “policy questions” about arbitrary forwarding programs) while asking the middle to devote substantial resources to supporting it.


Resilient Overlay Networks

From the observation that BGP as deployed on the Internet does not allow particularly fast or flexible use of the many available paths, this paper develops an overlay network (RON) to take advantage of alternate paths. Each node in the overlay probes its (Internet-routed) link to each other node, estimating its loss rate and latency and detecting unavailability of the link within tens of seconds. From these measurements, RON chooses an intermediate node (if any) to optimize for an application-chosen metric.
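
A rough sketch of that path-selection idea, assuming latency is the application's metric (loss or throughput would substitute the same way); the node names and latency values below are made up.

    def best_path(src, dst, nodes, latency):
        """Pick the direct path or a one-intermediate detour with the lowest
        estimated latency.  latency[(a, b)] is the probe-derived estimate."""
        candidates = [((src, dst), latency[(src, dst)])]
        for mid in nodes:
            if mid in (src, dst):
                continue
            candidates.append(((src, mid, dst),
                               latency[(src, mid)] + latency[(mid, dst)]))
        return min(candidates, key=lambda c: c[1])

    nodes = ["A", "B", "C"]
    latency = {("A", "C"): 120.0, ("A", "B"): 20.0, ("B", "C"): 35.0}
    print(best_path("A", "C", nodes, latency))  # the detour via B wins here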

The authors deployed their system across approximately 20 nodes on the Internet (mostly US and mostly educational) and measured the packet loss, latency, and throughput provided in practice over their overlay network. They found a dramatic reduction in large (greater than 30%) packet loss rates and a small but significant improvement in latency and throughput across all links. These improvements persisted even when they ignored Internet2 links (which, for policy reasons, may not be used in practice for commercial traffic). They noted that different paths were chosen in practice when optimizing for different metrics (latency, loss, throughput) and that these paths were frequently not the ‘normal’ Internet route.

A subtext of this paper is that BGP and IP do a bad job: they are insufficiently reactive and expressive to make good use of the available Internet routes subject to policy constraints. The headline result, routing around most of the outages to which BGP takes many minutes to respond, certainly does seem like something a hypothetical new BGP could also achieve. Of course (as the authors argue), some of the advantages come from things no BGP replacement could do: the amount of state maintained by RON as-is certainly would not scale to hundreds of thousands of routers (probably not even to hundreds of nodes, as the authors leave this tricky issue for future work). Also, the authors note that it is to their advantage that RON carries only a small amount of traffic (relative to link capacity) and thus avoids inducing new failures when it reroutes traffic.

Monday, October 26, 2009

On Inferring Autonomous System Relationships in the Internet

This paper describes a technique for inferring an annotated graph of AS-to-AS links. The annotations describe three types of relationships, one of which is directional, based on the export policies of the ASes:
  • provider to customer — a provider exports any routes on these links, while a customer exports only its own and its customers' routes
  • peer to peer — only routes to customers (and the parties themselves) are exported
  • sibling to sibling — customer, peer, and provider routes may all be exported
The authors infer these relationships from the AS paths in a number of BGP-derived routing tables from various peering points on the Internet. The authors work based on two assumptions:
  • AS paths are valley free (that is, they contain zero or more customer-to-provider and sibling-to-sibling links followed by zero or one peer-to-peer links followed by zero or more provider-to-customer and sibling-to-sibling links)
  • the highest degree AS in a path is the ‘top’ of the path (that is, all customer-to-provider links occur before it appears and all provider-to-customer links occur after it)
From these assumptions, the authors can determine that every AS before the ‘top’ AS gets transit from the following AS in the AS path and every AS after the top gives transit to the following AS. These transit relationships distinguish provider-to-customer links from other types of links. Peering relationships can then be found among links where neither endpoint appears to provide transit to the other. Some extra heuristics are used to deal with misconfigured BGP sites and with exceptions where an apparent customer (probably accidentally, the authors assume) provides transit between its two providers; without them, the algorithm would infer that this was not a provider/customer relationship.
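
A simplified sketch of the basic inference phase (my reading, not the paper's full algorithm or its heuristics): take the highest-degree AS in each path as the ‘top’, record who gives transit to whom on either side of it, and then annotate links.

    from collections import defaultdict

    def infer_relationships(paths, degree):
        gives_transit = defaultdict(set)  # gives_transit[a] = ASes that a carries traffic for
        for path in paths:
            top = max(range(len(path)), key=lambda i: degree[path[i]])
            for i in range(len(path) - 1):
                a, b = path[i], path[i + 1]
                if i < top:
                    gives_transit[b].add(a)  # b acts as a provider for a
                else:
                    gives_transit[a].add(b)  # a acts as a provider for b
        relationships = {}
        for path in paths:
            for a, b in zip(path, path[1:]):
                down, up = b in gives_transit[a], a in gives_transit[b]
                if down and up:
                    relationships[frozenset((a, b))] = "sibling-to-sibling"
                elif down:
                    relationships[(a, b)] = "provider-to-customer"
                elif up:
                    relationships[(b, a)] = "provider-to-customer"
                else:
                    relationships[frozenset((a, b))] = "peer-to-peer"
        return relationships

    degree = {"Cust": 1, "Mid": 3, "Top": 10, "Other": 2}
    print(infer_relationships([["Cust", "Mid", "Top", "Other"]], degree))
    # infers Mid as provider of Cust, Top as provider of Mid and of Other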

The authors ran their inference algorithm over BGP routing tables observed from 24 sites and compared their results to AT&T's internal information about its own AS relationships. They found that the inferences were mostly accurate, approximately 99%, for AT&T. There were some notable errors; for example, they inferred that AT&T had a provider, they characterized several customers (with their own ASes) as having no relationship to AT&T at all, and most of the sibling-to-sibling relationships they identified AT&T as having were considered other types of relationships by AT&T.

Based on a manual analysis, the authors blame their errors primarily on misconfigurations, on inter-AS relationships not fitting into their simplistic model, and on the inaccuracy of their heuristics. Unfortunately, the authors do not attempt to quantify the relative effect of these issues, and it is not apparent from their single example what sort of AS relationships could be added to make a substantially more complete model. The heuristics used are generally a bit disappointing: do the authors have good evidence that AS degree really determines the top of the path? Is there a better way to ignore incorrectly advertised customer-to-provider paths (e.g. by noticing they typically won't ever be used)?

On Power-Law Relationships of the Internet Topology

This paper examines several maps of the Internet topology: three at the autonomous system level (derived from a BGP peering point) spanning just over a year in 1997-1998, and one of the router-level topology from 1995.

The authors demonstrate that these four graphs have what the authors call power laws: that is, relationships of the form y ≈ A*x^B for some constants A and B between properties x and y of parts of the graph. The power laws they find empirically are:
  • between the outdegree rank of a node (its position in the list of nodes sorted by outdegree) and its outdegree;
  • between the frequency of an outdegree in the graph and the outdegree itself;
  • between the number of pairs of nodes within a given distance of each other and the distance;
  • between the eigenvalues of the graph (presumably of its adjacency matrix, though the authors appear to forget to specify this) and their ranks
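
As an illustration (not from the paper) of how such a relationship is checked, one can fit log y = log A + B log x by least squares and read off the exponent and the correlation coefficient; the toy data below is made up.

    import numpy as np

    def fit_power_law(x, y):
        log_x, log_y = np.log(x), np.log(y)
        B, log_A = np.polyfit(log_x, log_y, 1)   # slope is the exponent
        r = np.corrcoef(log_x, log_y)[0, 1]      # how power-law-like the data is
        return np.exp(log_A), B, r

    rank = np.arange(1, 101)
    outdegree = 50.0 * rank ** -0.8              # synthetic rank/outdegree data
    A, B, r = fit_power_law(rank, outdegree)
    print(f"A={A:.1f} B={B:.2f} r={r:.3f}")      # recovers A=50, B=-0.8, r=-1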

As applications for these observations, the authors primarily propose using them to evaluate protocols and applications for the Internet more realistically. They observe that prior work that ignored these power laws made some poor estimates about properties of the Internet graph: for example, it implicitly assumed a roughly uniform distribution of outdegrees and thus assumed that most nodes had much smaller 2-, 3-, or 4-hop neighborhoods than actually occurred. Toward the goal of using these properties for evaluation, the authors try to extract properties that can be used directly to produce synthetic graphs for Internet protocol evaluation (for example, a division into a core/tree structure and a direct estimator of AS and router hop counts).

The authors also suggest using these properties to produce test environments that will continue to reflect the Internet in the future. They argue that these power laws will continue to hold over future topology graphs and so can be used to construct reasonable evaluations for future Internet protocols intended for a larger Internet.

Though the formulas for diameter, neighborhood size, etc. are good qualitative summaries of the Internet topology (and thus useful for protocol design and the like), I find it hard to believe that one would really prefer to use them as-is to evaluate protocols intended for the current topology when one could instead sample from an actual topology map.

It is disappointing that the authors did not have a very long time span of data to analyze. I found it strange that the authors did not attempt to collect an (inevitably partial) 1997 or 1998 map of the router-level topology of the Internet to allow a several-year analysis of the evolution of the topology (which seems to be one of the major goals of the work). And, for the BGP views, the authors did not attempt to quantify how complete their graphs were (since some links may not be visible from some BGP peering points).

Monday, October 19, 2009

Interactive WiFi Connectivity For Moving Vehicles

This paper describes a protocol for vehicular WiFi (ViFi) based on utilizing multiple access points. A mobile host chooses a primary base station and several auxiliary base stations. When the auxiliary base stations overhear a packet to or from the mobile host that is not ACK'd, they probabilistically resend it. The probability of resending is based on the probability that other nodes decide to resend the packet and the likelihood of reaching the ultimate destination, with the goal of usually resending the (apparently) unACK'd packet exactly once.
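
The following captures only the flavor of that computation (a hypothetical simplification, not ViFi's actual estimator): scale a node's relay probability by how likely its relay is to get through, normalized against the other auxiliaries thought to have heard the packet, so that the expected number of relayed copies is roughly one.

    import random

    def relay_probability(my_reach, others):
        """my_reach: P(my relay reaches the destination).
        others: for each other auxiliary, P(it heard the packet) times
        P(its relay reaches the destination)."""
        total = my_reach + sum(others)
        return min(1.0, my_reach / total) if total > 0 else 0.0

    p = relay_probability(0.6, [0.3, 0.2])        # -> about 0.55
    print("relay" if random.random() < p else "stay silent")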

The authors evaluated their protocol using a trace-driven simulation (based on loss rate/location traces measured from two real testbeds), which they apparently validated with measurements on the real testbed. Using their measurements, they compared ViFi to optimal handoff between base stations (with no backups to relay) and to optimal use of relaying (where the best possible base station always did the relaying), and showed that ViFi achieved substantially better utilization than using a single base station and nearly the optimal utilization for the relaying strategy. They also tested the application performance of some higher-layer protocols (TCP, VoIP), which was much better with their scheme than with simple handoff, though they ignored some TCP anomalies (e.g. apparently indefinitely stalled connections).

A disappointing artifact of the paper was that the evaluation was apparently performed with very few simulated users (though it is unclear what the DieselNet trace contained). The authors did not evaluate fairness between users, despite a dubious decision to rely on carrier sense alone rather than reimplementing the exponential backoff that their use of broadcasts had disabled.

White Space Networking with Wi-Fi like Connectivity

This paper describes a MAC protocol for Wi-Fi-like white space networking ("WhiteFi"). The primary challenge is to choose and maintain choices of channels that avoid interference with existing users of the spectrum. To achieve this goal, the hardware structure of a WhiteFi node includes both a scanner driving a software-defined radio (used for access point detection and to detect ‘incumbents’ of channels) and a Wi-Fi-like radio.

To assign channels, the end hosts measure the utilization of each 6 MHz frequency range and the number of active access points on each. Based on this, they estimate the share of bandwidth each node can expect to receive, and they send these measurements to the access point. The AP chooses a channel by aggregating these measurements (after excluding channels on which incumbents have been seen by any end host or by the AP).
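
An illustrative sketch of the aggregation step (hypothetical, not the paper's actual metric): drop candidate channels with incumbents reported by anyone, then pick the channel with the best aggregate of the clients' expected shares (here, the sum).

    def choose_channel(candidates, incumbent_channels, client_reports):
        """client_reports: {client: {channel: estimated airtime share}}."""
        usable = [ch for ch in candidates if ch not in incumbent_channels]
        def aggregate(ch):
            return sum(report.get(ch, 0.0) for report in client_reports.values())
        return max(usable, key=aggregate) if usable else None

    reports = {"c1": {21: 0.4, 22: 0.7}, "c2": {21: 0.6, 22: 0.2}}
    print(choose_channel([21, 22, 23], incumbent_channels={23},
                         client_reports=reports))  # -> 21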

To allow devices to quickly switch channels after incumbents are detected, the AP maintains a backup channel: the AP listens on it with its scanner radio, and clients fall back to it during quiet periods.

The authors evaluated their active channel detection, AP discovery, and channel-switching speed on an (apparently two-node) prototype deployment. They evaluated their channel selection algorithm in simulation and showed that it appeared to perform close to the optimum in the presence of simulated background traffic and spatial variation.

Disappointingly, the authors don't appear to evaluate the effects their channel switching will have on end users. Their only experiment on the real testbed touching this issue (Fig. 14) shows that there appears to be a serious throughput penalty when channel switching must occur, and from their evaluation it is not clear what effect that would likely have on transport protocols.

Wednesday, October 14, 2009

XORs in The Air: Practical Wireless Network Coding

This paper presents a routing and MAC scheme (COPE) that attempts to increase wireless throughput by using a coding technique to send multiple packets at once. When a router has a set of packets to forward to multiple destinations and each destination already has all but one packet in the set, the router can send the XOR of all the packets in the set and each destination can decode its own. To allow routers to use this feature, they discover which packets their neighbors have available from periodic reception reports and by guessing from overheard packets and the delivery probabilities tracked by the routing protocol.
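
A toy sketch of the coding decision (my simplification, not COPE's implementation): greedily grow a set of queued packets bound for distinct next hops such that every chosen next hop already has every packet in the set except its own, then XOR and broadcast.

    from functools import reduce

    def pick_coding_set(queued, neighbor_has):
        """queued: list of (next_hop, packet_id); neighbor_has[n]: packet ids n holds."""
        chosen = []
        for hop, pkt in queued:
            if hop in [h for h, _ in chosen]:
                continue
            decodable = all(pkt in neighbor_has[other_hop] for other_hop, _ in chosen) \
                and all(other_pkt in neighbor_has[hop] for _, other_pkt in chosen)
            if decodable:
                chosen.append((hop, pkt))
        return chosen

    def xor_payloads(payloads):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), payloads)

    neighbor_has = {"A": {"p2"}, "B": {"p1"}}
    print(pick_coding_set([("A", "p1"), ("B", "p2")], neighbor_has))
    # both packets fit in one coded broadcast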

The authors evaluated COPE in an in-building ad hoc 802.11a testbed. With UDP flows between random pairs of hosts (with flow sizes drawn from an Internet file-size sample), they observed approximately double the effective throughput at high levels of network utilization and very little improvement at low levels of network utilization.

With TCP flows in the same testbed, performance was considerably worse. The authors attributed this problem to TCP not recovering well from collision losses caused by hidden terminals, which MAC-layer retries often did not mask; they achieved an approximately 40% throughput improvement when they artificially eliminated hidden terminals (by placing the previously hidden terminals within carrier-sense range of all other nodes but artificially restricting the offered paths) and sufficiently loaded the network. (The authors went to some extra effort to make TCP flows work well, for example by avoiding packet reordering, though they did not evaluate the effect of this improvement.)

Though random pair connection patterns ordinarily seem like a good way to stress the network, in this case they are likely to make the coding scheme look better than it would be in many practical situations. COPE performs especially well when flows cross each other in the network, but many common applications are asymmetric (e.g. web-browsing-like patterns would likely have most traffic originating from a small number of gateways, with relatively little traffic to code against in the opposite direction).

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

ExOR is a routing protocol that attempts to exploit ‘lucky’ delivery of packets across multiple lossy (hopefully uncorrelated) hops from each router on the path. Each batch of packets is broadcast with an ordered list of next hops. When a router receives packets, it waits an amount of time according to its position in the priority list and the sending rate in its environment, then forwards the packets it did not hear from nodes higher in the priority list. The nodes on the ordered list are sorted by each node's ETX to the destination and pruned based on which nodes are used often enough in a simulation of ExOR. ExOR only attempts to transmit 90% of the packets in a batch, on the assumption that the last packets in a batch will be more expensive to send due to a higher likelihood of collisions (since there is insufficient information to schedule their sending well). The remaining packets are expected to be transmitted through traditional hop-by-hop routing (though the authors did not appear to do this in their evaluation).
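
A simplified sketch of the per-batch forwarding rule as I read it (not the authors' implementation): a forwarder rebroadcasts only the packets it received that no higher-priority node has already claimed in its batch map.

    def packets_to_forward(my_rank, received, batch_maps):
        """batch_maps[rank]: packet ids that forwarder `rank` reports holding;
        rank 0 is the highest priority (closest to the destination)."""
        heard_above = set()
        for rank in range(my_rank):
            heard_above |= batch_maps.get(rank, set())
        return received - heard_above

    # Forwarder 2 got packets 1-5; the destination (rank 0) already has {1, 2}
    # and forwarder 1 has {3}, so only {4, 5} are worth rebroadcasting.
    print(packets_to_forward(2, {1, 2, 3, 4, 5}, {0: {1, 2}, 1: {3}}))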

The authors evaluated ExOR on RoofNet, comparing its throughput to that of RoofNet's traditional routing (presumably with the ETX metric). They sent 10% more traffic when testing ExOR to compensate for the reduced delivery rate. Even accounting for this extra traffic, the authors found substantially better throughput than ‘traditional’ routing for multi-hop paths, and especially for the poorest routes under traditional routing.

Based on the evaluation, it appears that this protocol targets bulk transfers. The authors did not evaluate latency, and appear to expect users to run a transport protocol other than TCP, since this technique will certainly introduce jitter and/or out-of-order delivery and/or losses. The authors implemented a web proxy that uses ExOR only for transfers of 100KB or larger, but unfortunately the paper does not evaluate that proxy's actual overhead.

Wednesday, October 7, 2009

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

There are several choices to be made in constructing ad-hoc routing protocols, several of which are explored by the protocols in this paper:
  • route construction time — on demand (AODV, DSR, TORA) or continuously maintained (DSDV)
  • loop avoidance mechanism — sequence number (DSDV, AODV), source routing (DSR), topological ordering + drop packets during link reversal (temporary routing loop) (TORA)
  • route storage — next hop table (DSDV, AODV), topological ordering (“height”) (TORA), at source/on data packets (DSR)
Additionally, these algorithms need to choose various timeouts and rates and delays (to avoid accidental synchronization) for link state and route discovery messages as well as route propagation policies.

The authors of this paper implemented DSDV (with two different route propagation policies), AODV (with explicit beacons and with on-send loss detection), DSR, and TORA in a detailed simulation of an 802.11 network with RTS/CTS. They chose several simulation scenarios in which nodes move randomly in the simulated space with varying degrees of mobility (both in speed and in frequency of movement). They used several fixed-rate packet sources (between 10, 20, or 30 random pairs of hosts) with packet sizes small enough to usually avoid congestion losses. Given these scenarios, they measured the delivery rate, routing overhead (in bytes and packets), and route length versus the optimal path.

They found that routing protocol efficiency depended greatly on the degree of mobility: faster, more frequent movement resulted in substantially larger overheads for all but DSDV. Except for TORA at high packet rates (where it experienced a routing-protocol congestion collapse) and DSDV with high mobility (where its periodic updates were not fast enough), performance was good overall: more than 95% of packets were delivered and most packets took the shortest route. In terms of routing overhead, the results were not surprising (except for TORA, which seems clearly broken): on-demand routing protocols (other than TORA) did substantially better, and DSR seemed to trade increased byte overhead for decreased packet overhead.

It's good to see a careful simulation of these various routing protocols under enough circumstances that it should be difficult to claim that an advantageous situation has been missed. Of course, the downside of a big sensitivity analysis like this is that reading the paper, I can't really tell how the choice of protocol would matter for a realistic application of an ad-hoc network. Except for the obviously bad cases (e.g. most of TORA, DSDV with very high mobility), would these differences (in overheads, delivery rates) even have a substantial effect on application performance?

A High-Throughput Path Metric for Multi-Hop Wireless Routing

This paper describes and evaluates the routing metric that was ultimately used in RoofNet, based on an office-environment deployment. The metric assumes hop-by-hop acknowledgments and retransmissions for unicast traffic. For each hop, the expected number of transmissions required to successfully send a packet along that hop is computed from the success rate of broadcast probes (which are not retransmitted) in both directions. The authors believe that this metric will tend to find higher-bandwidth paths better than merely counting hops and will lower the overall load on the network. The authors acknowledge that their scheme assumes variable transmission power would not be very advantageous (which seems strange given that their radios can definitely vary their power). Their scheme also seems specialized for their half-duplex environment (common in wireless networks of this sort), where a forwarded transmission interferes with the original sender so ‘pipelining’ is not possible.
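
The per-link formula, as the paper defines it, uses the measured forward and reverse delivery ratios of the probes, and a route's metric is the sum over its links. A small sketch with made-up numbers:

    def link_etx(df, dr):
        """Expected transmissions for one hop, given forward (data) and
        reverse (ACK) probe delivery ratios."""
        return float("inf") if df * dr == 0 else 1.0 / (df * dr)

    def route_etx(links):
        return sum(link_etx(df, dr) for df, dr in links)

    print(route_etx([(0.4, 0.5)]))              # one lossy hop: 5.0 expected transmissions
    print(route_etx([(0.9, 0.9), (0.9, 0.9)]))  # two good hops: about 2.5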

The authors evaluated the metric on an 802.11b network deployed in offices in their lab. They measured the hop-by-hop loss rates and computed a baseline ‘best’ set of route throughputs (assuming that their model of interference along a multi-hop route based on loss rates was accurate). They compared these throughputs to actual throughputs obtained using DSDV and DSR (modified to avoid instabilities) with both the standard hop-count metric and the transmission-count metric.

Not surprisingly, the authors observed improved performance with their metric. They identified some non-obvious sources of the improvement which could be used independently by DSDV and DSR: avoiding asymmetric routes (which can be done by ‘handshaking’ before considering a link usable) and usually avoiding routes that experience transmission errors (which DSR does when given link-layer feedback).

The authors also evaluated the accuracy of their link quality measurements. They found that significant bias was introduced by the packet sizes of their probes — larger packets were received significantly less often than smaller ones, but their metric assumed all packets were equal.

Monday, October 5, 2009

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

This paper evaluates the performance of a mesh network deployed in Cambridge, Massachusetts. The network had access points in locations that were convenient for the ‘community’ participants to use and routed between them using a link-state protocol. Using broadcast probes, the loss rate of every link was evaluated at each of the 802.11b-supported bit rates (also taking into account the loss rate of ACKs on the reverse path). From this information, routes (including the 802.11b bit rate at each hop) are chosen to maximize the end-to-end throughput across the network. Experimental evidence showed that the bandwidth predicted from the probes slightly overestimated the throughput of multi-hop routes due to interference.
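
A rough sketch of that route metric (my simplification, not Roofnet's exact estimator): discount each hop's bit rate by its measured data and ACK delivery probabilities, and, since the hops are assumed to share the channel, combine them harmonically. The numbers below are made up.

    def hop_throughput(bitrate_mbps, p_data, p_ack):
        return bitrate_mbps * p_data * p_ack

    def route_throughput(hops):
        rates = [hop_throughput(*hop) for hop in hops]
        return 0.0 if 0.0 in rates else 1.0 / sum(1.0 / r for r in rates)

    print(route_throughput([(5.5, 0.9, 0.95), (5.5, 0.9, 0.95)]))  # two decent hops: ~2.4 Mbps
    print(route_throughput([(11.0, 0.2, 0.9)]))                    # one marginal fast hop: ~2.0 Mbps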

The authors showed that this network did, in fact, benefit in both connectivity and bandwidth from making extensive use of multi-hop paths, because using several shorter links provided much better throughput (for TCP connections) than the (typically longer) paths with a lower hop count. The vast majority of nodes in the network had more than two neighbors in the routed network. Though link quality (in terms of achievable throughput) was correlated with physical distance, there were plenty of relatively short links with low quality and some highly beneficial long links with high quality.

The authors notably evaluated 802.11b RTS/CTS in their environment. Even though they had observed self-interference when a flow spanned multiple hops, RTS/CTS did not improve throughput.

As a case for creating wireless mesh networks without planning, this paper's results are disappointing. In the multi-hop TCP test, 10% of pairs could not communicate (despite the nodes of the network being physically stationary), multi-hop routes were much slower than single-hop routes, and adding nodes at random appeared (from simulations) to have seriously diminishing returns.

Modeling Wireless Links for Transport Protocols

This paper characterizes several types of wireless links with regard to how they may affect transport protocols. The paper seems to be primarily motivated by the poor models used for wireless links in prior work. Some areas prone to serious modelling errors:
  • losses — most link layers use error correction and limited retransmission, so the loss rates seen by transport protocols aren't very high (as often assumed) except in very poor radio conditions. More common, the paper claims, are bursty losses from handovers.
  • delay variation — contrary to the default of many simulators, delay jitter is frequently high due to error recovery, and handovers and link scheduling (e.g. in cellular links) can cause delay spikes
  • duplexing — many wireless links are half-duplex rather than full-duplex
  • traffic models — evaluations should consider multiple flows and flows traversing more than just the wireless link
The paper gives suggestions on how to model these and some other effects (e.g. queue management, resource allocation, bandwidth variation, asymmetry) and provides some examples of how varying these model parameters can drastically affect the performance of transport protocols. The paper's value seems to lie not so much in its evaluation of model differences (though it shows some non-obvious results, e.g. that buffer size can have a substantial effect on transport protocol behavior) as in providing a checklist of mostly simple mistakes to avoid.
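
As a toy illustration of the kind of link model this checklist argues for (entirely made-up parameters, not the paper's models): per-packet delay varies with link-layer recovery, and a handover produces a short burst of losses plus a delay spike, rather than a constant random loss rate.

    import random

    def wireless_link_delay_ms(t_ms, handover_at_ms=5000, handover_len_ms=300):
        in_handover = handover_at_ms <= t_ms < handover_at_ms + handover_len_ms
        if in_handover and random.random() < 0.7:
            return None                               # bursty loss during the handover
        base = 60.0                                   # propagation plus link scheduling
        recovery = random.choice([0, 0, 0, 40, 80])   # occasional link-layer retries
        spike = 400.0 if in_handover else 0.0         # queue drains after the handover
        return base + recovery + spike

    print([wireless_link_delay_ms(t) for t in range(4800, 5600, 100)])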