Monday, November 23, 2009

BotGraph: Large Scale Spamming Botnet Detection

The paper describes a log analysis technique intended to detect botnet-generated e-mail spam sent through a commercial e-mail service. It assumes that botnets can be detected either when they are generating accounts, because they generate accounts at an unusually high rate, or based on unusual relationships between the multiple sending accounts that botnets maintain.

To detect the unusually high rate of account generation, the authors propose tracking the exponentially weighted moving average of the signup rate at each IP. Because real signup rates are typically very low, the authors argue that they can easily detect mass signups. It does seem that a clever botnet would ramp up its account generation rate slowly to evade these long-term averaging techniques. (The bots would try to appear to be a popular proxy, etc.) The authors argue that such a slow ramp-up would be ineffective due to the natural churn of the botnet's compromised hosts. I do not find this convincing: at least in office environments, many desktop machines are not routinely turned off, and legitimate signup patterns can be expected to show the same sorts of diurnal and weekly variations that machines being turned on and off produce.
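To make the evasion concern concrete, here is a minimal sketch of EWMA-based signup monitoring for a single IP. It is not the authors' implementation: the function name `ewma_anomalies`, the smoothing factor `alpha`, the fixed `threshold`, and the sample counts are all assumptions chosen for illustration.

```python
def ewma_anomalies(counts, alpha=0.2, threshold=5.0):
    """Return indices of periods whose signup count exceeds the EWMA
    prediction by more than `threshold`.

    `counts` is a list of per-period signup counts for one IP.
    `alpha` and `threshold` are hypothetical values, not taken from the paper.
    """
    flagged = []
    avg = float(counts[0])  # seed the average with the first observation
    for i, count in enumerate(counts[1:], start=1):
        # Compare the new observation against the prediction made so far.
        if count - avg > threshold:
            flagged.append(i)
        # Fold the new observation into the moving average.
        avg = alpha * count + (1 - alpha) * avg
    return flagged


burst = [1, 0, 1, 2, 1, 40, 1]        # sudden mass signup in period 5
slow_ramp = [1, 2, 3, 4, 5, 6, 7, 8]  # gradual increase

print(ewma_anomalies(burst))      # [5]
print(ewma_anomalies(slow_ramp))  # []
```

With these made-up parameters the burst is flagged immediately, while the slow ramp drags the average up with it and is never flagged, which is the evasion strategy described above.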

Most of this paper is devoted to defending against attacks which do not use aggressive creation of new users. The authors use the correlation between accounts and the hosts which use them. The botnets the authors observe apparently load balance these accounts across many of their hosts over time and use many accounts from any particular compromised host. The authors therefore relate any two users based on the number of ASs from which both have logged in. Using these counts as edge weights and choosing a threshold induces a graph on the users, which can be used to cluster the users into candidate bot groups. These groups can be analyzed collectively for spammy behavior (because, given the effectiveness of signup rate limiting, effective spamming requires the user accounts as a whole to have a high send rate per account).
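The graph construction can be sketched roughly as follows. This is a simplified illustration rather than the paper's algorithm: it uses a single threshold, takes connected components as the candidate groups, and compares every pair of users directly, which would not scale to a real e-mail service. The function name `candidate_groups` and the sample login data are hypothetical.

```python
from itertools import combinations


def candidate_groups(user_ases, threshold=2):
    """Group users that share at least `threshold` ASs with a neighbour.

    `user_ases` maps each user to the set of AS numbers it logged in from.
    The edge weight between two users is the number of ASs they share;
    edges below `threshold` are dropped, and the connected components of
    what remains are returned as candidate bot groups.
    """
    parent = {user: user for user in user_ases}

    def find(user):
        # Union-find lookup with path halving.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for a, b in combinations(user_ases, 2):
        shared = len(user_ases[a] & user_ases[b])  # edge weight
        if shared >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for user in user_ases:
        groups.setdefault(find(user), set()).add(user)
    # Singletons are not interesting as bot groups.
    return [group for group in groups.values() if len(group) > 1]


logins = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {3, 4, 5},
    "d": {9},
    "e": {1},
}
print(candidate_groups(logins))  # one group containing a, b and c
```

Note that "a" and "c" share only one AS but still land in the same group through "b". Whether that transitivity is desirable depends on how the threshold is chosen.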

An obvious attack on this scheme, mentioned in the discussion of this paper, is for botnets to not reuse accounts between ASs. The authors argue for a countermeasure of grouping by IPs or some other smaller granularity to continue to catch those botnets. Clearly, as the authors argue, spammers have something to lose when they effectively cannot reuse their accounts between arbitrary pairs of machines, which makes this a worthwhile technique in spite of the relative ease of evading detection. The proposed future work, also correlating users by e-mail metadata and content, would likely be similarly worthwhile, because an inability to reuse domains or to arbitrarily distribute destinations across accounts would also make spamming more expensive.