Botnet Identification Via Universal Anomaly Detection
Shachar Siboni and Asaf Cohen, Member, IEEE

Abstract—The problem of identifying and detecting Botnet Command and Control (C&C) channels is considered. A Botnet is a logical network of compromised machines (Bots) that are remotely controlled by an attacker (Botmaster) through a C&C infrastructure in order to perform malicious activities. Accordingly, a key objective is to identify and block the C&C before any real harm is caused. We propose an anomaly detection algorithm and apply it to timing data, which can be collected without deep inspection, from open as well as encrypted flows. The suggested algorithm uses the Lempel-Ziv universal compression algorithm to optimally assign probabilities to normal traffic (during learning), then estimates the likelihood of new sequences (during operation) and classifies them accordingly. Furthermore, the algorithm is generic and can be applied to any sequence of events, not necessarily traffic-related. We evaluate the detection algorithm on real-world network traces, showing how a universal, low-complexity C&C identification system can be built with high detection rates for a given false-alarm probability.

Index Terms—Anomaly Detection; Botnets; Command and Control Channels; Universal Compression; Lempel-Ziv Algorithm; Probability Assignment; Individual Sequences.

I. INTRODUCTION

Cyber-attacks are a serious security threat to today's communication- and computer-based systems. They affect a wide range of domains, including electricity and water infrastructures, financial and capital markets, medicine and healthcare, the military, businesses, enterprises and universities around the world. Most large-scale cyber-attacks today, including Distributed Denial-of-Service (DDoS) attacks, spamming, fraud and identity theft, are conducted by Botnets.
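The abstract outlines the detection pipeline: learn a Lempel-Ziv probability assignment from normal timing data, score new sequences by their likelihood, and flag unlikely ones against a threshold chosen for a target false-alarm probability. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. All names (`quantize_gaps`, `LZ78Model`, `fit_threshold`, `is_anomalous`), the binning of inter-arrival times, and the add-one (Laplace) smoothing at each tree node are hypothetical choices made for illustration.

```python
import bisect
import math
from typing import Dict, List, Sequence


def quantize_gaps(timestamps: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Hypothetical discretization: map inter-arrival times to bin indices 0..len(edges)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [bisect.bisect_right(edges, g) for g in gaps]


class Node:
    """A node of the LZ78 parsing tree."""

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}
        self.counts: Dict[int, int] = {}  # how often each symbol was read at this node


class LZ78Model:
    """Illustrative LZ78-style probability assignment (not the paper's code)."""

    def __init__(self, alphabet_size: int) -> None:
        self.alphabet_size = alphabet_size  # e.g. len(edges) + 1 for quantize_gaps
        self.root = Node()

    def train(self, seq: Sequence[int]) -> None:
        """Grow the tree by LZ78 incremental parsing of a normal sequence."""
        node = self.root
        for s in seq:
            node.counts[s] = node.counts.get(s, 0) + 1
            if s in node.children:
                node = node.children[s]      # extend the current phrase
            else:
                node.children[s] = Node()    # a new phrase ends here
                node = self.root             # the next phrase starts at the root

    def prob(self, node: Node, s: int) -> float:
        """Add-one smoothed P(s | node); the smoothing rule is an assumption."""
        total = sum(node.counts.values())
        return (node.counts.get(s, 0) + 1) / (total + self.alphabet_size)

    def score(self, seq: Sequence[int]) -> float:
        """Average code length in bits/symbol; higher means less likely (more anomalous)."""
        if not seq:
            return 0.0
        node, bits = self.root, 0.0
        for s in seq:
            bits -= math.log2(self.prob(node, s))
            node = node.children.get(s, self.root)  # back to the root when the phrase ends
        return bits / len(seq)


def fit_threshold(model: LZ78Model, normal_seqs: Sequence[Sequence[int]],
                  false_alarm: float) -> float:
    """Empirical quantile: at most a `false_alarm` fraction of normal scores exceed it."""
    scores = sorted(model.score(q) for q in normal_seqs)
    idx = math.ceil((1.0 - false_alarm) * len(scores)) - 1
    return scores[min(len(scores) - 1, max(0, idx))]


def is_anomalous(model: LZ78Model, seq: Sequence[int], threshold: float) -> bool:
    """Flag a sequence whose per-symbol code length exceeds the threshold."""
    return model.score(seq) > threshold
```

For instance, after training on the periodic sequence `[0, 1, 2] * 200` with `alphabet_size=3`, the reversed pattern `[2, 1, 0] * 20` receives a clearly higher bits-per-symbol score than `[0, 1, 2] * 20`. Because the LZ78 tree adapts to whatever structure appears in the training data, no Markov order or parametric model has to be fixed in advance; only the threshold is tuned, empirically, on normal data.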
S. Siboni and A. Cohen are with the Department of Communication System Engineering, Ben-Gurion University, Beer-Sheva, 84105, Israel. E-mails: sibonish@bgu.ac.il; coasaf@bgu.ac.il. Partially supported by the Israeli Chief Scientist under the Kabarnit consortium.

A Botnet is a logical network of compromised machines, Bots, which are remotely controlled by a Botmaster using a Command and Control (C&C) infrastructure. The compromised machines can be any collection of vulnerable hosts, e.g., computers, mobile phones or tablets. Infection occurs via infected websites, file-sharing networks, email attachments and more (see the infection tree analysis in [1]). Once a host is infected and becomes a Bot, it is programmed to use a C&C channel for further downloads and updates, and to await instructions from the Botmaster. It updates its data and acts upon receiving commands from the Botmaster (e.g., to launch a DDoS attack).

The C&C channel plays a key role in a Botnet as the communication means within the network. The Botmaster manages and controls its Bots through these C&C channels in order to perform malicious activities on selected targets. This way, the Bots act as an on-demand distributed attack platform, coordinated by the Botmaster. However, since the C&C channels are the only way the Botmaster can communicate with its Bots, they can be considered the weakest link of a Botnet: blocking them renders the Botnet useless. Accordingly, a main objective is to identify and block C&C activities before any real harm is caused.

In order to mask their activities and bypass defense mechanisms such as firewalls, Botnets use common communication protocols for their C&C, including IRC [2], [3], HTTP [3], Peer-to-Peer (P2P) [4], [5], [6] and DNS [7]. Recently, Botnets have also adopted social networks as the underlying C&C [8].
However, while it is tempting to develop protocol-specific detection methods, attackers constantly improve their C&C infrastructures and develop new evasion capabilities, including changing the signatures of the C&C traffic, employing encryption and obfuscation, and using domain generation [9] in order to deceive detection systems.

Current techniques for Botnet study and detection are based on honeynets, signature-based detection and anomaly detection models [10]. Honeynets act as traps that collect information about Bots so that their behavior can be studied [11]. Once the mechanism of the monitored Bots is exposed, a dedicated detection and blocking mechanism can be designed. Signature-based approaches rely on a database of signatures of known Botnets that were previously learned. However, signature-based techniques cannot detect zero-day attacks and require constant updates of the signature database [10].

Anomaly-based detection techniques, on the other hand, aim to detect anomalies in network traffic or system behavior that may indicate the presence of malicious activities. A basic assumption of anomaly detection is that attacks differ from normal behavior. Thus, traffic analysis is performed at both the packet and flow levels, considering metrics such as rate, volume, latency, response time and timestamps, in order to identify anomalous data. Indeed, anomaly detection seems a promising approach for Botnet detection, since it may detect new attack structures (zero-day attacks). However, this may come at the cost of high false-alarm rates. Moreover, good performance may require prior knowledge, e.g., statistical assumptions on the normal data, such as a Markov model [5] or ARMA modeling [12]. Gu et al. proposed two anomaly-based detection systems, BotSniffer [3] and BotMiner [13], based on traffic analysis.
The former was designed for IRC- and HTTP-based Botnets, while the latter was designed for protocol-independent detection and requires no prior knowledge. However, both systems rely on Deep Packet Inspection techniques, hence are