The Feature Selection and Intrusion Detection Problems

Andrew H. Sung & Srinivas Mukkamala
Department of Computer Science, New Mexico Tech, Socorro, NM 87801, U.S.A.
{sung,srinivas}@cs.nmt.edu

Abstract. Cyber security is a serious global concern. The potential for cyber terrorism poses a threat to national security; meanwhile, the increasing prevalence of malware and cyber attacks hinders the utilization of the Internet to its greatest benefit and incurs significant economic losses for individuals, enterprises, and public organizations. This paper presents some recent advances in intrusion detection, feature selection, and malware detection. In intrusion detection, stealthy, low-profile attacks that use only a few carefully crafted packets over an extended period of time to elude firewalls and the intrusion detection system (IDS) have been difficult to detect. In protection against malware (trojans, worms, viruses, etc.), detecting polymorphic and metamorphic versions of recognized malware using static scanners is a great challenge. We present an agent-based IDS architecture that is capable of detecting probe attacks at the originating host and denial of service (DoS) attacks at the boundary controllers. We investigate and compare the performance of different classifiers implemented for intrusion detection. Further, we study the performance of the classifiers in real-time detection of probes and DoS attacks, using intrusion data collected on a real operating network that includes a variety of simulated attacks. Feature selection is as important for IDS as it is for many other modeling problems. We present several techniques for feature selection and compare their performance in the IDS application. It is demonstrated that, with appropriately chosen features, both probes and DoS attacks can be detected in real time or near real time at the originating host or at the boundary controllers.
We also briefly present some encouraging recent results in detecting polymorphic and metamorphic malware with advanced static, signature-based scanning techniques.

1 Introduction

Intrusion detection is of great importance to protecting the security of information systems, especially in view of the increasing incidence of cyber attacks worldwide. Since the ability of an IDS to identify a large variety of intrusions accurately and in real time is of primary concern, in this paper we consider the performance of learning-machine-based IDSs with respect to the critical aspects of classification accuracy, training time, testing time, and scalability. One of the main problems with IDSs is the overhead, which can become prohibitively high. To analyze system logs, the operating system must keep information regarding all the actions performed, which invariably results in huge amounts of data, requiring disk space and CPU resources. Next, the logs must be