Malware Detection by Analysing Encrypted Network Traffic with Neural Networks

Paul Prasse¹, Lukáš Machlica², Tomáš Pevný², Jiří Havelka², and Tobias Scheffer¹

¹ University of Potsdam, Department of Computer Science, Potsdam, Germany
{prasse, scheffer}@cs.uni-potsdam.de
² Cisco R&D, Prague, Czech Republic
{lumachli, tpevny, jhavelka}@cisco.com

Abstract. We study the problem of detecting malware on client computers based on the analysis of HTTPS traffic. Here, malware has to be detected based on the host address, timestamps, and data volume information of the computer’s network traffic. We develop a scalable protocol that allows us to collect network flows of known malicious and benign applications as training data and derive a malware-detection method based on a neural embedding of domain names and a long short-term memory network that processes network flows. We study the method’s ability to detect new malware in a large-scale empirical study.

1 Introduction

Malware violates users’ privacy, harvests passwords, can encrypt users’ files for ransom, is used to commit click-fraud, and to promote political agendas by popularizing specific content in social media [1]. Several different types of analysis are being used to detect malware.

The analysis of an organization’s network traffic complements decentralized antivirus software that runs on client computers. It allows organizations to enforce a security policy consistently throughout an entire network and to minimize the management overhead. This approach makes it possible to encapsulate malware detection into network devices or cloud services. Network-traffic analysis can help to detect polymorphic malware [2] as well as new and as-yet unknown malware based on network-traffic patterns [3, 4].

When the URL string of HTTP requests is not encrypted, one can extract a wide range of features from it on which the detection of malicious traffic can be based [5].
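
As a rough illustration of the kind of features that a plaintext URL string exposes, the following Python sketch parses a URL with the standard library. The function name `url_features` and the particular features chosen (URL length, path depth, number of query parameters, fraction of digits) are assumptions made for this example; they are not the feature set used in [5].

```python
from urllib.parse import urlsplit, parse_qsl


def url_features(url: str) -> dict:
    """Compute a few illustrative (hypothetical) features of a URL string."""
    parts = urlsplit(url)
    # Non-empty path segments, e.g. "/a/b/page.php" -> ["a", "b", "page.php"]
    segments = [seg for seg in parts.path.split("/") if seg]
    # Query parameters as (key, value) pairs; keep parameters with empty values
    query_pairs = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "host": parts.hostname or "",
        "url_length": len(url),
        "path_depth": len(segments),
        "num_query_params": len(query_pairs),
        "digit_fraction": sum(c.isdigit() for c in url) / len(url) if url else 0.0,
    }


example_url = "http://example.com/a/b/page.php?id=42&ref=x"
features = url_features(example_url)
print(features)
```

For the example URL, the sketch reports a path depth of 3 and two query parameters.
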
However, the analysis of the HTTP payload can easily be prevented by using the encrypted HTTPS protocol. Google, Facebook, LinkedIn, and many other popular sites encrypt their network traffic by default. In June 2016, an estimated 45% of all browser page loads used HTTPS, and this share was growing [6]. In order to continue to have an impact, traffic analysis has to work with HTTPS traffic. On the application layer, HTTPS uses the HTTP protocol, but all messages are encrypted via the Transport Layer Security (TLS) protocol or its predecessor,