Journal of Computer Virology and Hacking Techniques
https://doi.org/10.1007/s11416-023-00476-z

ORIGINAL PAPER

Recognition of tor malware and onion services

Jesper Bergman 1 · Oliver B. Popov 1

Received: 11 October 2022 / Accepted: 16 March 2023
© The Author(s) 2023

Abstract
The transformation of contemporary societies through digital technologies has had a profound effect on all human activities, including those in the realm of illegal, unlawful, and criminal deeds. Moreover, the affordances provided by anonymity-creating techniques such as the Tor protocol, which are beneficial for preserving civil liberties, appear to be highly profitable for various types of miscreants whose crimes range from human trafficking, arms trading, and child pornography to selling controlled substances and racketeering. Tor and similar technologies are the foundation of a vast, often mysterious, sometimes anecdotal, and occasionally dangerous space termed the Dark Web. Using the features that make the Internet a uniquely generative knowledge agglomeration, with no borders and permeating different jurisdictions, the Dark Web is a source of perpetual challenges for both national and international law enforcement agencies. The anonymity granted to the wrong people increases the complexity and the cost of identifying both the crimes and the criminals, which is often exacerbated by a lack of proper human resources. Technologies such as machine learning and artificial intelligence come to the rescue through automation, intensive data harvesting, and analysis built into various types of web crawlers to explore and identify dark markets and the people behind them. Effective and efficient crawling requires a pool of dark sites or onion URLs.
This study presents a way to build a crawling mechanism by extracting onion URLs from malicious executables: the executables are run in a sandbox environment, and the resulting log files are analysed using machine learning algorithms. By discerning between malware that uses the Tor network and malware that does not, we were able to classify the Tor-using malware with an accuracy of 91% using a logistic regression algorithm. The initial results suggest that it is possible to use this machine learning approach to diagnose new malicious servers on the Tor network. Embedding this kind of mechanism into the crawler may also induce predictability, and thus efficiency, in recognising dark market activities and, consequently, in their closure.

Keywords Tor · Malware · Machine learning · Forensics

Jesper Bergman
jesperbe@dsv.su.se
Oliver B. Popov
popov@dsv.su.se
1 Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden

1 Introduction

The generative nature of digital technologies has transformed parts of society from kinetic or analogue into non-kinetic or digital, creating a hybrid social, economic, and cultural space termed Cyber-Physical Systems (CPS). Digital transformation has, among other things, had a profound effect on all human activities, including those in the realm of illegal and criminal deeds. For instance, the Tor protocol, almost a synonym for Anonymous Communication Networks (ACNs), affords the preservation of civil liberties through anonymity-granting techniques. Nevertheless, the same protocol appears to be highly profitable for miscreants whose crimes range from human trafficking, arms trading, and child pornography to selling controlled substances and racketeering. Tor and similar technologies are the foundation of a vast and occasionally dangerous space termed the Dark Web.
With no borders and permeating different jurisdictions, the Dark Web is a source of perpetual challenges for national and international law enforcement agencies. The anonymity increases the complexity and the cost of identifying both crimes and criminals, which is often exacerbated by a lack of proper human resources [15]. However, digital technologies have also created a multitude of techniques and tools, for instance machine learning, artificial intelligence, intensive data har-