Monday, November 23, 2009

BotGraph: Large Scale Spamming Botnet Detection

The paper describes a log analysis technique intended to detect botnet-generated e-mail spamming through a commercial e-mail service. It assumes that botnets can be detected either when they are generating accounts, because they generate accounts at an unusually high rate, or through the unusual relationships between the multiple sending accounts a botnet maintains.

To detect the unusually high rate of account generation, the authors propose tracking the exponentially weighted moving average of the signup rate at each IP. Because real signup rates are typically very low, the authors argue that they can easily detect mass signups. It does seem that a clever botnet could ramp up its account generation rate slowly to evade these long-term averaging techniques (the bots would try to appear to be a popular proxy, etc.). The authors argue that such evasion would be ineffective due to the natural churn of the botnet's compromised hosts. But at least in office environments, many desktop machines are certainly not turned off routinely, and legitimate signup patterns can be expected to show the same diurnal and weekly variations as machines going on and off, so I do not find this argument convincing.
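As a rough illustration of the kind of per-IP check described above, here is a minimal sketch, assuming a fixed smoothing factor and a made-up anomaly threshold (neither value is taken from the paper):

```python
# Sketch of EWMA-based signup-rate anomaly detection per source IP.
# The smoothing factor and threshold below are illustrative, not the
# values used in the BotGraph paper.
ALPHA = 0.1          # weight given to the newest observation
THRESHOLD = 10.0     # flag if the current rate exceeds this multiple of the average

class SignupRateMonitor:
    def __init__(self):
        self.ewma = {}   # ip -> smoothed signups per window

    def observe(self, ip: str, signups_this_window: int) -> bool:
        """Record one window's signup count; return True if the IP looks suspicious."""
        avg = self.ewma.get(ip, 0.0)
        suspicious = avg > 0 and signups_this_window > THRESHOLD * avg
        # update the moving average regardless, so history keeps accumulating
        self.ewma[ip] = ALPHA * signups_this_window + (1 - ALPHA) * avg
        return suspicious

monitor = SignupRateMonitor()
print(monitor.observe("192.0.2.7", 2))    # normal background rate -> False
print(monitor.observe("192.0.2.7", 500))  # sudden mass signup -> True
```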

Most of this paper is devoted to defending against attacks which do not use aggressive creation of new users. The authors exploit the correlation between accounts and the hosts which use them. The botnets the authors observe apparently load-balance their accounts across many of their hosts over time and use many accounts from any particular compromised host. The authors therefore relate any two users based on the ASes they are both logged in from. Using these shared-AS counts as edge weights and choosing a threshold induces a graph over the users, which can be used to cluster the users into candidate bot groups. These groups can then be analyzed collectively for spammy behavior (because effective spamming requires the accounts as a whole to have a high send rate per account, given the effectiveness of signup rate limiting).
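To make the construction concrete, here is a minimal sketch of the user graph, assuming an invented edge threshold and a toy login dataset (the paper's actual implementation is a large-scale distributed computation, not this in-memory version):

```python
from collections import defaultdict
from itertools import combinations

# login records: user -> set of AS numbers the user logged in from (toy data)
logins = {
    "u1": {7018, 3356, 1299},
    "u2": {7018, 3356, 6939},
    "u3": {7018, 3356, 1299},
    "u4": {20115},
}
EDGE_THRESHOLD = 2   # illustrative; the paper tunes this threshold

# edge weight = number of shared ASes; keep edges at or above the threshold
graph = defaultdict(set)
for a, b in combinations(logins, 2):
    if len(logins[a] & logins[b]) >= EDGE_THRESHOLD:
        graph[a].add(b)
        graph[b].add(a)

# connected components of the thresholded graph are candidate bot groups
def components(nodes, graph):
    seen, groups = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, group = [n], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            group.add(cur)
            stack.extend(graph[cur] - seen)
        groups.append(group)
    return groups

print(components(logins, graph))   # e.g. [{'u1', 'u2', 'u3'}, {'u4'}]
```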

An obvious attack on this scheme, mentioned in the discussion of this paper, is for botnets to not reuse accounts between ASes. The authors argue for a countermeasure of grouping by IPs or some other finer granularity to continue to catch those botnets. Clearly, as the authors argue, spammers lose something when they effectively cannot reuse their accounts between arbitrary pairs of machines, which makes this a worthwhile technique in spite of the relative ease of evading detection. The proposed future work of also correlating users by e-mail metadata and content would likely be similarly worthwhile, because an inability to reuse domains or to distribute destinations arbitrarily across accounts would also make spamming effectively more expensive.

Not-A-Bot: Improving Service Availability in the Face of Botnet Attacks

This paper proposes distinguishing between human-initiated requests and machine-initiated requests by having a trusted attester generate signatures tying an application-specified action to recent input activity. The attester is isolated from the host operating system by a trusted virtual machine. The integrity of the attester and the virtual machine (and, assuming the VM's correctness, the confidentiality of the attester's signing key) is assured by a TPM (Trusted Platform Module).

For services for which botnet abuse is a concern, applications would be able to ask the attester to sign some specified data along with a statement that keyboard or mouse activity had occurred at a time close to the requested action. The attester would avoid generating multiple attestations for the same input activity and would thereby effectively rate-limit the sensitive applications to the rate of real user activity on the machine. In the "normal" use case, the application would send these attestations to the service. The service would verify the attestation and adjust its behavior; for example, it could prioritize attested requests over non-attested requests.
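A toy sketch of the attester's core rule (sign only when there is recent, not-yet-consumed input activity) might look like the following; the HMAC key, the one-second activity window, and all names here are my own stand-ins, since the real design keeps an asymmetric signing key protected by the TPM:

```python
import hmac, hashlib, time, os

# Toy sketch of a Not-A-Bot-style attester. The key handling, timing window,
# and one-attestation-per-input rule are simplifications of the paper's design;
# the function and field names are my own.
ATTESTER_KEY = os.urandom(32)       # in the real design the signing key never leaves the attester
ACTIVITY_WINDOW = 1.0               # seconds of "recent enough" input activity (illustrative)

class Attester:
    def __init__(self):
        self.last_input = None       # timestamp of the last keyboard/mouse event
        self.input_consumed = False  # has the last event already backed an attestation?

    def on_user_input(self):
        self.last_input = time.time()
        self.input_consumed = False

    def attest(self, app_data: bytes):
        """Return a signed attestation only if recent, unconsumed input activity exists."""
        now = time.time()
        if self.last_input is None or now - self.last_input > ACTIVITY_WINDOW:
            return None                      # no recent human activity
        if self.input_consumed:
            return None                      # one attestation per input event
        self.input_consumed = True
        nonce = os.urandom(16)
        tag = hmac.new(ATTESTER_KEY, nonce + app_data, hashlib.sha256).hexdigest()
        return {"nonce": nonce.hex(), "data": app_data, "sig": tag}

a = Attester()
a.on_user_input()
print(a.attest(b"Date: Mon, 23 Nov 2009 ..."))  # attested
print(a.attest(b"second message"))               # None: no fresh input activity
```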

Botnets attempting to create false attestations would be able to do so on non-idle machines, of course, but the authors argue the rate at which they could do so effectively would be seriously limited. One danger is that attestations in this system are storable indefinitely: they contain a nonce (generated by the attester to prevent trivial replays), but no timestamp unless one is added. The authors assume that users of attestations will encode the time in the signed material; for example, signing the Date header of any sent e-mail, or using URLs encoding a service-generated nonce or timestamp. Services must also maintain a nonce cache to prevent replay, which may be quite large in the case of high-latency services like e-mail. The authors predict that even for large botnets, the victim machines on which these botnets are running will not have user activity often enough to produce attested spam, DDoS, or click-fraud traffic at nearly the rate they are capable of today.
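For the service side, a minimal sketch of replay checking with a nonce cache and a freshness bound might look like this; the expiry policy and field names are assumptions, since the paper leaves these choices to each service:

```python
import time

# Sketch of service-side attestation checking with a nonce cache and a freshness
# bound. The specific expiry, cache structure, and signature check are assumptions;
# the paper only requires that services reject replays and stale attestations.
MAX_AGE = 24 * 3600          # how old an attested timestamp may be (illustrative)
seen_nonces = {}             # nonce -> time first seen

def accept(attestation, signed_timestamp, verify_signature) -> bool:
    if not verify_signature(attestation):
        return False
    if time.time() - signed_timestamp > MAX_AGE:
        return False                       # stale: the timestamp was part of the signed data
    if attestation["nonce"] in seen_nonces:
        return False                       # replay of an already-used attestation
    seen_nonces[attestation["nonce"]] = time.time()
    # entries older than MAX_AGE can be evicted, bounding the cache size
    for n, t in list(seen_nonces.items()):
        if time.time() - t > MAX_AGE:
            del seen_nonces[n]
    return True

demo = {"nonce": "abc123"}
print(accept(demo, time.time() - 60, lambda a: True))   # True: fresh, unseen nonce
print(accept(demo, time.time() - 60, lambda a: True))   # False: nonce replayed
```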

Wednesday, November 18, 2009

Cutting the Electric Bill for Internet-Scale Systems

Based on the observation that energy prices differ around the world — and in different ways at different times — this paper proposes redirecting traffic to utilize energy at the cheapest (sufficiently close) site. The price differences the paper proposes exploiting are not just diurnal and seasonal ones, but hourly ones from the electricity spot market, which show substantially larger fluctuations and cannot be definitively planned for even hours in advance.

The paper uses traces of both these electricity spot prices and Akamai's workload to evaluate the potential savings from moving computation where energy is cheaper. Some practical concerns with moving workload were modeled: keeping work close to users due to latency concerns, not dramatically increasing bandwidth costs, and the realities of not having a power-proportional datacenter. Under somewhat optimistic scenarios, electricity cost reductions of more than 30% seemed achievable with this type of optimization.
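The core optimization can be sketched very simply: for each client region, send load to the cheapest site among those within some distance bound. The prices, distances, and the 1000-mile cutoff below are invented for illustration; the paper's evaluation also models bandwidth costs and non-power-proportional servers rather than this greedy rule alone.

```python
# Toy sketch of price-aware request routing: pick the cheapest site that is
# still close enough to the client. All numbers here are invented.
MAX_MILES = 1000   # latency proxy: only consider sufficiently close sites

sites = {                      # site -> current spot price ($/MWh), illustrative
    "virginia": 55.0,
    "texas": 38.0,
    "california": 70.0,
}
distance = {                   # (client_region, site) -> miles, illustrative
    ("east", "virginia"): 200, ("east", "texas"): 1300, ("east", "california"): 2500,
    ("south", "virginia"): 900, ("south", "texas"): 300, ("south", "california"): 1400,
}

def choose_site(client_region: str) -> str:
    candidates = [s for s in sites if distance[(client_region, s)] <= MAX_MILES]
    return min(candidates, key=lambda s: sites[s])

print(choose_site("east"))    # virginia: texas is cheaper but too far away
print(choose_site("south"))   # texas: cheapest among the close-enough sites
```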

The simulation evaluation was very thorough and used a real workload (which are quite hard to come by). The real workload used here does not, however, sound like the type one would want to prove this system on: Akamai's customers are presumably paying in large part for Akamai's closeness to end users, and increased distance is just about the only compromise this scheme cannot avoid. Surprisingly, the US electricity market seems dynamic enough that even modest compromises (e.g. moving work less than 1000 miles) can provide substantial benefit.

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems

This paper evaluates the viability of putting computers to sleep while preventing disruption from the machine not being on the network. Current platforms have ‘wake-on-LAN’ functionality, enabling machines to be woken up from a low-power state by a ‘magic packet’. A simple proxy could examine traffic destined for a sleeping machine and send the magic packet when it sees traffic to which the machine needs to respond. A more sophisticated version of this proxy could respond on the machine's behalf to simple but important queries. Using network traces from desktop machines in both home and work environments, this paper evaluates the likely power savings achievable with various levels of proxy sophistication.

The traces reveal that there is a lot of background noise on networks. Some of this traffic is broadcast at the Ethernet level but not actually logically directed to the end host (for example, IPX traffic, routing backup traffic, or ARP requests for other hosts' IPs). Some is for protocols that require periodic maintenance traffic to maintain liveness, e.g. ARP (when querying the host), IGMP (if the host belongs to IP multicast groups), DHCP, TCP (for keepalives), and SSDP. Such traffic was frequent enough in the authors' traces that a proxy which ignored it entirely would not be able to let end hosts sleep for a substantial amount of time without disrupting service. However, making simple fill-in-the-blank replies for a handful of such protocols, and conservatively waking machines up for other services, is all that would be required to allow machines to stay asleep for most of their idle periods without sacrificing availability. Problematically, all it takes to disrupt this result is one misconfigured service (for example, the machine-update service the authors observed in the enterprise environment).
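A sketch of the decision logic such a proxy might apply is below; the protocol groupings are my own illustrative reading of the paper's traffic classes, not an exhaustive or authoritative classification:

```python
# Sketch of the packet-handling policy a sleep proxy might apply for a dozing host.
# The protocol lists are illustrative groupings, invented for this example.
IGNORABLE = {"ipx", "arp_other_host", "routing_broadcast"}   # noise not meant for the host
ANSWERABLE = {"arp_for_host", "icmp_echo", "igmp_query"}     # simple fill-in-the-blank replies
# everything else (e.g. a new TCP connection to a served port) wakes the machine

def handle_packet(proto: str, wake_host, reply_for_host):
    if proto in IGNORABLE:
        return "dropped"                 # host keeps sleeping
    if proto in ANSWERABLE:
        reply_for_host(proto)            # proxy answers; host keeps sleeping
        return "proxied"
    wake_host()                          # send the wake-on-LAN magic packet
    return "woken"

print(handle_packet("arp_other_host", print, print))   # dropped
print(handle_packet("arp_for_host", print, print))     # proxied
print(handle_packet("tcp_syn_port_22", print, print))  # woken
```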

Scalable Application Layer Multicast

In contrast to the '95 paper, this paper, written in the days of a considerably larger Internet, is very concerned about limiting the state stored on each node as the network grows. Probably also motivated by the realities of the Internet, this paper does not assume the infrastructure — the routers — will provide multicast services. Instead, end nodes are drafted into service as internal forwarding nodes in the multicast tree. The paper is devoted primarily to selecting these internal nodes and their neighbors, even in the face of failures and newly joining nodes.

The multicast tree maintained in this protocol has a strictly layered structure. Within each layer, nodes are partitioned into ‘clusters’. Each member of a cluster keeps track of all the other members of its cluster and forwards messages it generates to all of them. For forwarding messages into and out of a cluster, each cluster elects a ‘leader’. This leader is required to be a member of some cluster in the next higher layer in order to forward messages outside of the original cluster. To allow recovery from loss of the leader, each member of a cluster also tracks the membership of its leader's cluster in the next layer.

To add a node to this tree, the joining node determines its distance (presumably in terms of latency) to all members of the cluster at the highest layer, then to the members of the cluster in the next layer down led by the closest of those nodes, and so on. When it reaches an appropriate cluster in the lowest layer, it joins it. When the leader of a cluster later detects that the cluster has grown beyond the size threshold, the cluster is split (potentially causing this procedure to repeat at a higher layer). Similar delayed detection by cluster leaders is used to initiate the merging of clusters.
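Here is a minimal sketch of that join walk, assuming an invented topology and a random stand-in for the latency probe (the real protocol's probing, joining, and splitting are considerably more involved):

```python
import random

# Sketch of the layered join walk: starting from the single top-layer cluster,
# the joining node probes each member, descends into the cluster led (one layer
# down) by the closest member, and repeats until it joins a bottom-layer cluster.
# The topology and the "latency" probe are invented for illustration.

def probe(joiner, member):
    return random.random()                 # stand-in for a real RTT measurement

def join(joiner, top_cluster, leads, num_layers):
    """Descend from the top cluster, one layer at a time, then join at the bottom."""
    cluster = top_cluster
    for _ in range(num_layers - 1):
        closest = min(cluster, key=lambda m: probe(joiner, m))
        cluster = leads[closest]           # the closest member's cluster one layer down
    cluster.add(joiner)
    return cluster

# two layers: the top cluster {A, B}, whose members each lead a bottom-layer cluster
top = {"A", "B"}
leads = {"A": {"A", "a1", "a2"}, "B": {"B", "b1"}}
print(join("newnode", top, leads, num_layers=2))
```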

To maintain closeness in the network, nodes are responsible for periodically probing the alternate cluster leaders (in the same cluster as their current cluster leader) and switching clusters if they have chosen inappropriately (or, as was surprisingly not mentioned in the paper, if network conditions have changed).

Tuesday, November 17, 2009

A Reliable Multicast Framework for Light-weight Sessions and Application-Level Framing

This paper describes a technique for providing reliable delivery on top of unreliable multicast. The principal problem dealt with in the paper is avoiding the explosion of messages that would result from adapting TCP-like loss recovery to this many-receiver world. The first change necessary is switching to negative acknowledgments. To make sure receivers are aware of which messages have been sent, periodic reports of the extent of messages (expressed as sequence numbers for each sender) are multicast to the group by each participant. To keep the size of these reception reports from exploding, they refer only to a particular application-level entity being operated upon (a whiteboard "page" in the application in the paper).

From this reception information, participants can decide when to send a negative acknowledgment (which the paper calls a "repair request"). To prevent many repair requests from being sent at once (due to the failure of a link relatively "early" in the multicast tree), receivers wait a random amount of time proportional to their distance from the original sender before sending the repair request. If, before this timer expires, they see another repair request for the same data, they back off their timer exponentially (and use the same backed-off timer strategy if the request later has to be re-sent). Any host which has the data (all that received it are assumed to buffer it) can send a retransmission, but uses a similar timer mechanism to lower the likelihood of duplicate retransmissions.
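A sketch of the request-timer idea, with illustrative constants rather than the paper's, is below; a receiver draws its delay from a window proportional to its distance from the sender and doubles the window on each backoff:

```python
import random

# Sketch of the repair-request timer described above: a receiver that detects a
# loss waits a delay drawn from a window proportional to its distance from the
# original sender, suppresses its request if it hears someone else's first, and
# backs off exponentially otherwise. The constants are illustrative only.
C1, C2 = 2.0, 2.0      # timer window parameters (in units of one-way delay)

def request_timer(distance_to_sender, backoff_rounds=0):
    """Delay before sending a repair request for a detected loss."""
    lo = C1 * distance_to_sender
    hi = (C1 + C2) * distance_to_sender
    return (2 ** backoff_rounds) * random.uniform(lo, hi)

# a receiver near the point of loss tends to request sooner than one far away,
# so one request usually suppresses the rest
print(request_timer(distance_to_sender=0.01))
print(request_timer(distance_to_sender=0.20))
print(request_timer(distance_to_sender=0.20, backoff_rounds=1))  # after hearing a duplicate
```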

To select the random backoff values, the authors propose a heuristic scheme using feedback from the number of seen repair requests and retransmissions, together with some manually set clamps. The authors evaluated this scheme in simulations over random multicast trees with up to 5000 nodes.

Presumably because it was devised with all participants being "first-class" — equally able to send messages, not distinguished by the structure of the multicast tree, and able to leave or join the group at will — this scheme requires nodes to track and exchange a disappointing amount of state. Each node needs to keep estimates of the round-trip time to every other node in the multicast group. To make these estimates, each node must send periodic reception reports. Without a means to summarize these reports (besides switching "pages"), the size of each report is proportional to the number of senders. (The authors do propose "local recovery" to limit the scope of retransmissions and associated messages, but this does not reduce the per-node state needed.) Though this amount of state and messaging is probably manageable for a low-bandwidth stream with thousands of recipients, it does not sound like scaling this scheme to larger data sizes and more recipients will be comfortable.

Thursday, November 12, 2009

X-Trace: A Pervasive Network Tracing Framework

Applications that run on networks inherently rely on the correct functioning of many components. Understanding failures or performance problems in such systems requires understanding how a logical task is performed across these components. X-Trace is a system that tracks each task's intercomponent dependencies and thereby allows the components to produce logs that can be reassembled to show the logical execution of a task.

This dependency tracking is done by piggybacking metadata about the previous subtask on each intercomponent request, including the ‘root’ task ID and whether the new subtask is the next subtask or a subsubtask (which may be performed concurrently with other subsubtasks). When a component acts on a request, this metadata is attached to the log entries it creates, and these logs are aggregated using the task ID.
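A toy sketch of this propagation might look like the following; the field names and the in-memory log are my own shorthand, not X-Trace's actual metadata format or reporting infrastructure:

```python
import uuid

# Sketch of task-ID propagation: each request carries the task ID plus the ID of
# the operation that caused it, and every component logs against that metadata.
# Field names ("task_id", "parent_op") are invented shorthand for illustration.
log = []

def new_task():
    return {"task_id": uuid.uuid4().hex, "parent_op": None}

def report(metadata, component, message):
    """What a component would log; aggregation later groups entries by task_id."""
    op_id = uuid.uuid4().hex[:8]
    log.append({**metadata, "op": op_id, "component": component, "msg": message})
    return op_id

def child(metadata, parent_op):
    """Metadata to piggyback on a request to the next component (a 'next' or 'down' edge)."""
    return {"task_id": metadata["task_id"], "parent_op": parent_op}

# an HTTP request passing through a proxy, a cache, and an origin server
md = new_task()
op1 = report(md, "proxy", "received GET /")
md2 = child(md, op1)
op2 = report(md2, "cache", "miss, forwarding")
md3 = child(md2, op2)
report(md3, "origin", "served 200")

for entry in log:
    print(entry)
```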

Propagating this metadata and aggregating the logs is relatively cheap, but not free (the authors observed a 15% decrease in throughput in an instrumented Apache). The main cost, however, is clearly in the modification of each component of the system. The system's pervasiveness requires its metadata hooks to infect all the components. Ideally, much of this could be automated, since most systems already ‘know’ which request they are acting on at any point, which presumably was what the authors were exploring in their modifications to libasync. Such knowledge does not easily let one automatically choose between ‘next’ versus ‘down’, and certainly won't easily detect the non-tree structures the authors argue (describing as future work) would be more appropriate for some systems.

After proposing tracing all the way down to the link layer, the authors sensibly proposed, as future work, limiting what is traced through a combination of choosing which layers to trace and which incoming requests to mark. Of course, we already have tools for diagnosing network difficulties at the network-path level, so it might be nice (and would certainly require less switch modification) to use X-Trace to find the path through our maze of an overlay network and then hand off to those existing tools. Unfortunately, based on the figures in the paper, X-Trace does not really record failures in so straightforward a manner. What humans interpret as a (service-stopping) failure in their typical traces is an omitted node, apparently without a hint as to what the missing next hop might be, save (non-automated) comparison with the correct case.