Research Article

Mining Network Traffic with the k-Means Clustering Algorithm for Stepping-Stone Intrusion Detection

Lixin Wang,1 Jianhua Yang,1 Xiaohua Xu,2 and Peng-Jun Wan3

1 TSYS School of Computer Science, Columbus State University, Columbus, Georgia, USA
2 College of Computing and Software Engineering, Kennesaw State University, Kennesaw, Georgia, USA
3 Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA

Correspondence should be addressed to Lixin Wang; wang_lixin@columbusstate.edu

Received 15 December 2020; Revised 19 January 2021; Accepted 6 February 2021; Published 3 March 2021

Academic Editor: Wenzhong Li

Copyright © 2021 Lixin Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Intruders on the Internet usually launch network attacks through compromised hosts, called stepping stones, in order to reduce the chance of being detected. In a stepping-stone intrusion, an attacker uses tools such as SSH to log in to several compromised hosts remotely, creates an interactive connection chain, and then sends attacking packets to a target system. An effective method to detect such an intrusion is to estimate the length of the connection chain. In this paper, we develop an efficient algorithm to detect stepping-stone intrusion by mining network traffic using k-means clustering. Existing approaches for connection-chain-based stepping-stone intrusion detection either are not effective or require a large number of TCP packets to be captured and processed and, thus, are not efficient. Our proposed detection algorithm can accurately determine the length of a connection chain without requiring a large number of TCP packets to be captured and processed, so it is more efficient.
Our proposed detection algorithm is also easier to implement than all existing approaches for stepping-stone intrusion detection. The effectiveness, correctness, and efficiency of our proposed detection algorithm are verified through well-designed network experiments.

1. Introduction

This paper is an extension of our work originally presented at the 39th IEEE International Performance Computing and Communications Conference (IEEE IPCCC 2020) [1].

Many attackers send attacking packets to remote target systems through compromised machines in order to decrease the chance of being discovered [2–11]. The compromised machines employed by the attackers are referred to as stepping stones. In a stepping-stone intrusion (SSI), an intruder uses a chain of compromised machines on the Internet as relay hosts and remotely logs in to these machines by using software tools such as SSH, rlogin, or telnet. The attacker sits in front of his local host and types attacking commands that are relayed via the stepping-stone hosts in the connection chain until the attacking packets arrive at the remote target system that is under attack.

Since every TCP connection between a source node and a destination node is independent of other connections, even though the connections might be relayed, accessing a remote host via several relayed TCP connections makes it very difficult to determine the attacker's actual geographical location. Because of this property of TCP, the final target machine can only see the packets from the last connection of the chain. Therefore, it is extremely hard for a target host to learn any information about the actual location of the intruder. A benefit of launching attacks using stepping stones is that attackers can hide behind a long interactive connection chain. If an SSI can be detected while the attack is still active, then the session can be cut off and the target system can be protected.
Although some researchers have worked on the back-tracing of SSI and studied upstream detection, most researchers have focused on downstream SSI detection.

Hindawi, Wireless Communications and Mobile Computing, Volume 2021, Article ID 6632671, 9 pages, https://doi.org/10.1155/2021/6632671

Intruders using SSI could build a connection chain like the one given in Figure 1, using software tools such as SSH, to launch their attacks. In Figure 1, we assume that Host 0 is used by the