Journal of Computer Virology and Hacking Techniques
https://doi.org/10.1007/s11416-023-00476-z
ORIGINAL PAPER
Recognition of Tor malware and onion services
Jesper Bergman¹ · Oliver B. Popov¹
Received: 11 October 2022 / Accepted: 16 March 2023
© The Author(s) 2023
Abstract
The transformation of contemporary societies through digital technologies has had a profound effect on all human
activities, including those in the realm of illegal and criminal deeds. Moreover, the affordances provided
by anonymity-creating techniques such as the Tor protocol, which are beneficial for preserving civil liberties, appear to
be highly profitable for various types of miscreants, whose crimes range from human trafficking, arms trading, and child
pornography to selling controlled substances and racketeering. Tor-like technologies are the foundation of a vast,
often mysterious, sometimes anecdotal, and occasionally dangerous space termed the Dark Web. Using the features that
make the Internet a uniquely generative knowledge agglomeration, with no borders, and permeating different jurisdictions,
the Dark Web is a source of perpetual challenges for both national and international law enforcement agencies. The anonymity
granted to the wrong people increases the complexity and the cost of identifying both the crimes and the criminals, a problem that
is often exacerbated by a lack of proper human resources. Technologies such as machine learning and artificial intelligence
come to the rescue through automation, intensive data harvesting, and analysis built into various types of web crawlers that
explore and identify dark markets and the people behind them. Effective and efficient crawling requires a
pool of dark sites or onion URLs. This study presents a way to build a crawling mechanism by extracting onion URLs
from malicious executables: the executables are run in a sandbox environment and the resulting log file is analysed using machine learning
algorithms. By distinguishing malware that uses the Tor network from malware that does not, we were able to classify
Tor-using malware with an accuracy of 91% using a logistic regression algorithm. The initial results suggest that it is
possible to use this machine learning approach to identify new malicious servers on the Tor network. Embedding this kind
of mechanism into a crawler may also introduce predictability, and thus efficiency, in recognising dark market activities and,
consequently, in their closure.
Keywords Tor · Malware · Machine learning · Forensics
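The URL-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes that the sandbox log is available as plain text and that onion hostnames appear in it verbatim. The function name `extract_onion_urls`, the sample log lines, and the example addresses are hypothetical and are not taken from the study. The pattern relies only on the fact that onion addresses are base32 strings (a-z, 2-7) of 16 characters (version 2) or 56 characters (version 3).

```python
import re
from typing import List

# Onion addresses use the base32 alphabet (a-z, 2-7).
# Version 2 addresses have 16 characters, version 3 addresses have 56.
ONION_RE = re.compile(
    r"\b((?:[a-z2-7]{56}|[a-z2-7]{16})\.onion)\b",
    re.IGNORECASE,
)


def extract_onion_urls(log_text: str) -> List[str]:
    """Return unique onion hostnames in log_text, in order of first appearance."""
    seen = {}
    for match in ONION_RE.finditer(log_text):
        # Normalise to lower case so duplicates differing only in case collapse.
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)


# Hypothetical addresses and log lines, for illustration only.
V2_EXAMPLE = "abcdefghijklmnop.onion"
V3_EXAMPLE = "a2" * 28 + ".onion"
SAMPLE_LOG = "\n".join([
    "12:00:01 dns_query host=example.com",
    f"12:00:02 http_request url=http://{V2_EXAMPLE}/index.html",
    f"12:00:03 connect host={V3_EXAMPLE} port=80",
    f"12:00:04 connect host={V2_EXAMPLE} port=80",
])

if __name__ == "__main__":
    for host in extract_onion_urls(SAMPLE_LOG):
        print(host)
```

Hostnames collected this way could seed a crawler's URL pool. A real sandbox log format would likely need its own parsing before a pattern like this is applied.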
1 Introduction
The generative nature of digital technologies has transformed
parts of society from kinetic or analogue into non-kinetic
or digital, creating a hybrid social, economic, and cultural
space termed Cyber-Physical Systems (CPS). Digital
transformation has, among other things, had a profound effect on
all human activities, including those in the realm
of illegal and criminal deeds. For instance, the Tor protocol,
almost a synonym for Anonymous Communication
Networks (ACNs), helps preserve civil liberties through
anonymity-granting techniques. Nevertheless, the same
protocol appears to be highly profitable for miscreants whose
crimes range from human trafficking, arms trading, and child
pornography to selling controlled substances and racketeering.
✉ Jesper Bergman
jesperbe@dsv.su.se
Oliver B. Popov
popov@dsv.su.se
¹ Department of Computer and Systems Sciences, Stockholm
University, Stockholm, Sweden
Tor-like technologies are the foundation of a vast and
occasionally dangerous space termed the Dark Web. With
no borders, and permeating different jurisdictions, the Dark
Web is a source of perpetual challenges for national and
international law enforcement agencies. The anonymity increases
the complexity and the cost of identifying both crimes and
criminals, which is often exacerbated by a lack of proper
human resources [15]. However, digital technologies have
also created a multitude of techniques and tools, for instance
machine learning, artificial intelligence, intensive data har-