Research Article
Mining Network Traffic with the k-Means Clustering
Algorithm for Stepping-Stone Intrusion Detection
Lixin Wang,1 Jianhua Yang,1 Xiaohua Xu,2 and Peng-Jun Wan3
1 TSYS School of Computer Science, Columbus State University, Columbus, Georgia, USA
2 College of Computing and Software Engineering, Kennesaw State University, Kennesaw, Georgia, USA
3 Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
Correspondence should be addressed to Lixin Wang; wang_lixin@columbusstate.edu
Received 15 December 2020; Revised 19 January 2021; Accepted 6 February 2021; Published 3 March 2021
Academic Editor: Wenzhong Li
Copyright © 2021 Lixin Wang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Intruders on the Internet usually launch network attacks through compromised hosts, called stepping stones, in order to reduce the
chance of being detected. In a stepping-stone intrusion, an attacker uses tools such as SSH to log in to several compromised hosts
remotely, creates an interactive connection chain, and then sends attacking packets to a target system. An effective method to
detect such an intrusion is to estimate the length of the connection chain. In this paper, we develop an efficient algorithm to detect
stepping-stone intrusion by mining network traffic using k-means clustering. Existing approaches for connection-chain-based
stepping-stone intrusion detection either are not effective or require a large number of TCP packets to be captured and
processed and, thus, are not efficient. Our proposed detection algorithm can accurately determine the length of a connection
chain without requiring a large number of TCP packets to be captured and processed, so it is more efficient. Our proposed
detection algorithm is also easier to implement than all existing approaches for stepping-stone intrusion detection. The
effectiveness, correctness, and efficiency of our proposed detection algorithm are verified through well-designed network
experiments.
1. Introduction
This paper is an extension of our work originally presented at
the 39th IEEE International Performance Computing and
Communications Conference (IEEE IPCCC 2020) [1]. Many
attackers send attacking packets to remote target systems
through compromised machines in order to decrease
the chance of being discovered [2–11]. The compromised
machines employed by the attackers are referred to as stepping
stones. In a stepping-stone intrusion (SSI), an intruder
uses a chain of compromised machines on the Internet as
relay hosts and remotely logs in to these machines by using
software tools such as SSH, rlogin, or telnet. The attacker sits at
a local host and types attacking commands that
are relayed via the stepping-stone hosts in the connection
chain until the attacking packets arrive at the remote target
system that is under attack.
Since every TCP connection between a source node and a
destination node is independent of other connections, even
when the connections are relayed, accessing a remote
host via several relayed TCP connections makes it very difficult
to determine the attacker’s actual geographical location.
Because of this property of TCP, the final target
machine can see only the packets from the last connection
of the chain. Therefore, it is extremely hard for a target
host to learn any information about the actual location of
the intruder.
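The following minimal Python sketch illustrates this property under simplifying assumptions. It is a toy model rather than a packet-level simulation or part of the proposed detection algorithm, and the host names and the helpers `build_chain` and `visible_peer` are hypothetical names introduced only for this illustration. Each hop is modeled as an independent connection, and the target can identify only the source of the connection that terminates at it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One independent TCP connection between two adjacent hosts (toy model)."""
    src: str
    dst: str


def build_chain(hosts):
    """Pair adjacent hosts into the per-hop connections of a relay chain."""
    return [Connection(a, b) for a, b in zip(hosts, hosts[1:])]


def visible_peer(chain, host):
    """Return the only upstream peer that `host` can observe.

    This is the source of the connection that ends at `host`, or None
    if no connection in the chain terminates there.
    """
    for conn in chain:
        if conn.dst == host:
            return conn.src
    return None


# Hypothetical host names, chosen only for illustration.
hosts = ["attacker", "stone1", "stone2", "stone3", "target"]
chain = build_chain(hosts)

print(len(chain))                     # number of connections in the chain
print(visible_peer(chain, "target"))  # the last stepping stone, not the attacker
```

In this model the target observes only `stone3`; nothing in the last connection carries the identity of `attacker`, which is why the length of the chain has to be inferred indirectly.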
A benefit of launching attacks using stepping stones is
that attackers can hide behind a long interactive
connection chain. If an SSI can be detected while the attack
is still active, then the session can be cut off and
the target system can be protected. Although some
researchers have worked on the back-tracing of SSI and studied
upstream detection, most researchers have focused on
downstream SSI detection.
Intruders using SSI can build a connection chain such as the one shown
in Figure 1 using software tools such as SSH to launch their
attacks. In Figure 1, we assume that Host 0 is used by the
Hindawi
Wireless Communications and Mobile Computing
Volume 2021, Article ID 6632671, 9 pages
https://doi.org/10.1155/2021/6632671