Supporting Law-Enforcement to Cope with Blacklisted Websites: Framework and Case Study

Mir Mehedi Ahsan Pritom* and Shouhuai Xu†
*Department of Computer Science, University of Texas at San Antonio
†Department of Computer Science, University of Colorado Colorado Springs
Abstract—Cyber attackers have long abused web domains and URLs to carry out attacks such as phishing, web scamming, and malware distribution. URL blacklisting is a widely used defense against these attacks, but it has significant weaknesses, especially from a law-enforcement point of view: given a blacklist, law enforcement does not know what action to take (e.g., shutting down a host or domain) because of the subtleties associated with the problem. To help law enforcement deal with blacklisted URLs, we propose a novel framework based on Machine Learning (ML) that provides probabilistic classification together with interpretations of the model's predictions. These probabilistic and interpretability measures give law enforcement a basis for trustworthy decision-making and remove the black-box nature of traditional ML-based approaches. Experimental results show that the framework is practical and has further potential to tackle website maliciousness.
I. INTRODUCTION
Websites have been widely abused as a medium for propa-
gating cyberattacks [1]–[3]. One simple defense against these
threats is to use URL and domain blacklists, which are client-
side interventions and often provided by third-party vendors
(e.g., Phishtank, Google Safe Browsing, URLhaus). However,
this does not completely eliminate the threat because some
users may not use such services and the malicious or com-
promised domains or hosts are still on the loose. Moreover,
these blacklists are far from perfect [4] because they are neither
complete, meaning that they do not contain all of the malicious
websites [5]–[7], nor accurate, meaning that they contain many
false-positive websites (including the compromised ones that
were malicious in the past but have already been cleaned up)
[2]. Another defense is to use Machine Learning (ML) models
to proactively detect malicious websites (see, e.g., [1], [2],
[8]). Moreover, some third-party vendors (e.g., Netcraft [9])
provide takedown services upon user request to protect against
cybercrimes (e.g., cybersquatting) in which attackers abuse
domains imitating a user's brand.
However, there is one important perspective that has not
been investigated in the literature, namely that of law enforcement.
We envision that law enforcement will be, if not already, authorized
to take actions against malicious websites, much like it does
in the case of botnets [10]. This introduces a new dimension to
the problem because law enforcement must treat detected
malicious websites carefully. For example, law enforcement
may be authorized to shut down a malicious website owned or
operated by a malicious party, but only authorized to
notify the owner or operator of a website that is itself
compromised and abused by an attacker to wage further
attacks. Moreover, we often observe that attackers reuse
the same domains and hosts for new URL-based attacks, causing
the same domain or hostname to appear repeatedly in a URL blacklist [11].
This phenomenon further encourages us to consider the law-enforcement
perspective, as higher-level intervention, such as at the domain
or host level, is more effective than client-side interventions.
This calls for studies that help law enforcement distinguish
between malicious (i.e., attacker-owned) and compromised
(i.e., legitimate-party-owned) websites so that appropriate
actions can be taken.
Our Contributions. In this paper, we make three contributions.
First, we initiate the study of the law-enforcement perspective
when coping with malicious websites. This turns out to be
a challenge because of the dynamic nature of web domains
and complexity of web hosting infrastructures. This prompts
us to introduce a novel framework that helps law enforcement
cope with malicious websites. The framework highlights
the importance of using interpretable (i.e., explainable) ML,
while considering the probabilistic uncertainty associated with
the prediction outcomes. The framework integrates an ML
interpretability system, such as InterpretML [12], to provide
explanations and probabilistic predictions to law enforcement
(e.g., why is a website predicted as malicious, and what is the
likelihood it is indeed malicious?).
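As a minimal sketch of the kind of output such a framework surfaces, the snippet below pairs a class probability with per-feature contributions for one blacklisted website. It is illustrative only: the features, the data, and the use of a plain logistic-regression model (standing in for an interpretability system such as InterpretML) are all assumptions, not our actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features for blacklisted websites (not the paper's feature set):
# [domain_age_days, num_blacklist_hits, uses_free_hosting]
X = np.array([
    [30,   5, 1],   # young domain, many hits, free hosting
    [3000, 1, 0],   # old domain, a single hit
    [15,   8, 1],
    [2500, 2, 0],
    [60,   6, 1],
    [1800, 1, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = attacker-owned, 0 = compromised

model = LogisticRegression().fit(X, y)

site = np.array([[45, 7, 1]])                  # a new blacklisted website
p_malicious = model.predict_proba(site)[0, 1]  # probabilistic classification

# Local explanation: each feature's contribution to the log-odds for this site
contributions = model.coef_[0] * site[0]
for name, c in zip(["domain_age_days", "blacklist_hits", "free_hosting"],
                   contributions):
    print(f"{name}: {c:+.3f}")
print(f"P(attacker-owned) = {p_malicious:.2f}")
```

The point is the shape of the output, not the model: an officer sees not only a probability that the site is attacker-owned, but also which features drove that prediction, which is the basis for a trustworthy decision.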
Second, we investigate how to choose the entity for taking
action: domain vs. hostname. To our knowledge, this is the first
principled method proposed for making such decisions.
Third, we conduct a case study on evaluating an instance of
the framework with a real-world URL blacklist. Experimental
results show that we achieve an 86% accuracy with a 0.92 F-1
score, while providing a local explanation (i.e., interpretation)
of the prediction outcome for each input blacklisted website.
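For context, accuracy and F1 follow from the confusion matrix in the standard way. The counts below are hypothetical, not our actual confusion matrix; they are chosen only to show that an 86% accuracy can coexist with a 0.92 F1 when the positive class dominates.

```python
# Hypothetical confusion-matrix counts (illustrative only)
tp, fp, fn, tn = 80, 10, 4, 6

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, round(f1, 2))  # → 0.86 0.92
```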
Paper Outline. Section II presents the problem statement.
Section III describes our framework. Section IV presents our
case study and results. Section V discusses limitations of this
study. Section VI discusses related prior studies. Section VII
concludes the paper.
2022 IEEE Conference on Communications and Network Security (CNS) | 978-1-6654-6255-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/CNS56114.2022.9947260