SN Computer Science (2023) 4:757
https://doi.org/10.1007/s42979-023-02243-9

ORIGINAL RESEARCH

Domain-Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering

Abhay Pratap Singh Bhadauria 1 · Mahendra Singh 1

Received: 14 May 2023 / Accepted: 9 August 2023
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2023

Abstract
The loopholes of the Internet are exploited by cyber-attackers to forward spam, commit fiscal fraud, execute phishing, engage in command-and-control, spread malware, and carry out other malevolent activities. These cyber-attacks are often conducted through malignant domain names, which are auto-generated by domain generation algorithms (DGAs), usually in huge numbers, for the purpose of setting up a malicious command-and-control communication channel. The essential purpose of DGAs is to evade domain filtering technologies and hide the location of command-and-control servers. Differentiating between malignant and benign domains is therefore significant for securing the network. Because blacklisting malignant domains is not a feasible solution given the fast pace of today's Internet, other ways of identifying malignant domains are needed. Researchers have investigated attributes from DNS data and lexical analysis of domain names, but more effective methods are needed to address the challenges posed by fluctuations in domain names. This paper proposes an innovative and proactive approach to protect against malignant domain attacks and data exfiltration using a web-based multi-tier filter. Results of an experiment carried out on a real-world dataset show that the proposed approach can automatically and efficiently block unknown malignant domains used in malicious activities.
Keywords Domain name · Data exfiltration · Malware · Botnet · DNS · DGA

Introduction

Cyber-attackers across the globe use malignant domain names for illegitimate activities in the domain name system (DNS). The DNS is an essential communication protocol that translates domain names (e.g., xyz.com) into IP addresses and vice versa [1]. Over the years, cyber-attackers have abused DNS communication because there is no security check at the resolver level: when queried, the DNS resolves a known malicious domain just as it resolves a benign one [2]. According to some reports [3, 4], the number of malicious domains has grown to a level where it can no longer be overlooked. Hence, the identification of malignant domain names plays a vital role in safeguarding network security. Malicious domains are usually used to execute malicious activities, control malware-infected end-points, and steal personal or vital information. Detecting malicious domains through DNS data is therefore a practical approach compared with other approaches. However, with the recent development of DNS over HTTPS (DoH) [5], analyzing DNS data has become challenging because the network traffic is encrypted. Some previous studies were able to detect threats only under the assumption that DNS traffic is not encrypted. In addition, identifying and blacklisting known malignant domain names is a simple approach commonly used to safeguard legitimate users from cyber-threats [6]. Nevertheless, attackers became aware of these countermeasures and turned to well-known DGAs to elude blacklisting strategies. The key feature of DGA-based methods is that they methodically produce a considerable number of different domain names.
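To illustrate why blacklisting struggles against such algorithms, the following is a minimal toy sketch. It is not the algorithm of any real malware family and is not part of the approach proposed in this paper. It assumes that a shared seed string and the calendar date are the only inputs; the function name `toy_dga`, the seed value, the label length, and the reserved `.example` suffix are hypothetical choices made purely for illustration.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5,
            length: int = 12, tld: str = ".example") -> List[str]:
    """Derive `count` pseudo-random domain names from a seed and a date.

    Illustrative assumption: names depend only on (seed, day, index),
    so anyone holding the seed can recompute the same list for that day.
    `length` must not exceed 32, the size of a SHA-256 digest in bytes.
    """
    domains = []
    for index in range(count):
        material = f"{seed}|{day.isoformat()}|{index}".encode("utf-8")
        digest = hashlib.sha256(material).digest()
        # Map each of the first `length` digest bytes onto a lowercase letter.
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Hypothetical seed; the date is fixed so the run is reproducible.
    for name in toy_dga("demo-seed", date(2023, 5, 14)):
        print(name)
```

Because the list can be recomputed for any given day without any communication, the set of candidate names changes daily, and a static blacklist of previously observed names lags behind it.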
This article is part of the topical collection “Industrial IoT and Cyber-Physical Systems” guest edited by Arun K Somani, Seeram Ramakrishnan, Anil Chaudhary and Mehul Mahrishi.

* Abhay Pratap Singh Bhadauria
  rs.abhaypratapsingh@gkv.ac.in
  Mahendra Singh
  msa@gkv.ac.in
1 Department of Computer Science, Faculty of Science, Gurukula Kangri (Deemed to be University), Haridwar 249404, India

With the help of these techniques, they also utilized the seed and time-based element to dynamically