Electronics 2022, 11, 3142. https://doi.org/10.3390/electronics11193142 www.mdpi.com/journal/electronics

Article

PDF Malware Detection Based on Optimizable Decision Trees

Qasem Abu Al-Haija 1, *, Ammar Odeh 2 and Hazem Qattous 3

1 Department of Cybersecurity, Princess Sumaya University for Technology (PSUT), Amman 11941, Jordan
2 Department of Computer Science, Princess Sumaya University for Technology (PSUT), Amman 11941, Jordan
3 Department of Software Engineering, Princess Sumaya University for Technology (PSUT), Amman 11941, Jordan
* Correspondence: q.abualhaija@psut.edu.jo

Abstract: Portable document format (PDF) files are among the most widely used file types. This has incentivized attackers to turn these normally innocuous files into infection vectors, usually by hiding embedded malicious code in the victims' PDF documents to infect their machines. The resulting PDF malware calls for techniques that distinguish benign files from malicious ones. Research studies have indicated that machine learning methods provide efficient detection techniques against such malware. In this paper, we present a new detection system that analyzes PDF documents to distinguish benign PDF files from malicious ones. The proposed system uses an AdaBoost decision tree with optimal hyperparameters, trained and evaluated on a modern, inclusive dataset, viz. Evasive-PDFMal2022. The experimental evaluation demonstrates a lightweight and accurate PDF malware detection system, achieving 98.84% prediction accuracy with a short prediction interval of 2.174 μs. The proposed model thus outperforms other state-of-the-art models in the same study area. Hence, the proposed system can be effectively used to uncover PDF malware with high detection performance and low detection overhead.
Keywords: portable document format (PDF); machine learning; detection; optimizable decision tree; AdaBoost; PDF malware; evasion attacks; cybersecurity

1. Introduction

Malware is harmful code that has the potential to damage a computer or network. Recent years have seen a significant increase in malware, while conventional signature-based detection technologies have become increasingly ineffective. Malware developers and cybercriminals have adopted code obfuscation techniques, which reduce the efficiency of defensive mechanisms against malware [1,2]. Malware classification and identification therefore remain a challenge in this decade, largely because advanced malware is more sophisticated and can remain hidden or change its code or behavior to evade detection. As a result, outdated detection and classification methods are less useful today, and the focus has shifted to machine learning for better malware identification and categorization [3,4].

Malicious PDF files are one of the common attack methods [5]. Forensic research is hampered by the difficulty of separating harmful PDFs from large PDF files. Machine learning has advanced to the point where it can now be used to detect malicious PDF documents, either to assist forensic investigators or to shield a system from attack [6]. However, adversarial techniques have been developed against malicious document classifiers. Carefully crafted adversarial examples based on precision manipulation could be misclassified, which poses a danger to numerous machine-learning-based detectors [7,8]. Various analysis and detection methods have been proposed for particular attacks, but the threat posed by adversarial attacks has not yet been fully overcome. Figure 1 depicts a PDF document's header, body, cross-reference table (xref), and trailer components [9].

Citation: Al-Haija, Q.A.; Odeh, A.; Qattous, H.
PDF Malware Detection Based on Optimizable Decision Trees. Electronics 2022, 11, 3142. https://doi.org/10.3390/electronics11193142

Academic Editors: Jungong Han and Ahmed Abu-Siada

Received: 6 September 2022
Accepted: 28 September 2022
Published: 30 September 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
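
As a rough illustration of the four components named in the Introduction (header, body, cross-reference table, and trailer), the following Python sketch builds a toy PDF-like byte string and splits it into those parts. It is not part of the proposed detection system. The function names `build_toy_pdf` and `split_sections` are hypothetical, and the sketch assumes a classic, well-formed file with a plain `xref` table located through the `startxref` offset; real PDFs, and malicious ones in particular, may use cross-reference streams or be deliberately malformed.

```python
import re


def build_toy_pdf() -> bytes:
    """Assemble a tiny PDF-like byte string (illustrative only)."""
    header = b"%PDF-1.4\n"
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    body = b""
    offsets = []
    for obj in objects:
        # Byte offset of each object from the start of the file.
        offsets.append(len(header) + len(body))
        body += obj
    xref_offset = len(header) + len(body)
    # Entry 0 is the conventional free entry; the rest point at objects.
    xref = b"xref\n0 3\n0000000000 65535 f \n"
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    trailer = (
        b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
        b"startxref\n%d\n%%%%EOF\n" % xref_offset
    )
    return header + body + xref + trailer


def split_sections(data: bytes) -> dict:
    """Split a classic PDF into header, body, xref table, and trailer.

    Assumes a single, uncompressed xref table; raises ValueError otherwise.
    """
    header_match = re.match(rb"%PDF-(\d\.\d)", data)
    if header_match is None:
        raise ValueError("missing %PDF- header")
    # Readers locate the xref table via the offset stored after 'startxref'.
    sx = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if sx is None:
        raise ValueError("missing startxref / %%EOF marker")
    xref_pos = int(sx.group(1))
    if data[xref_pos:xref_pos + 4] != b"xref":
        raise ValueError("startxref does not point at a plain xref table")
    trailer_pos = data.find(b"trailer", xref_pos)
    if trailer_pos == -1:
        raise ValueError("missing trailer")
    header_end = data.index(b"\n") + 1
    return {
        "version": header_match.group(1).decode("ascii"),
        "header": data[:header_end],
        "body": data[header_end:xref_pos],
        "xref": data[xref_pos:trailer_pos],
        "trailer": data[trailer_pos:],
    }


if __name__ == "__main__":
    sections = split_sections(build_toy_pdf())
    for name in ("header", "body", "xref", "trailer"):
        print(name, len(sections[name]), "bytes")
```

A structural split of this kind is only a starting point: a detector would still need to derive features from the parts, and the error branches above mark exactly the places where evasive files tend to deviate from the expected layout.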