International Journal of Inventions in Computer Science and Engineering, Volume 1 Issue 2 2014
ISSN (Online): 2348-3539

A NEW CLASSIFICATION STATISTICS TECHNIQUE FOR MALWARE DETECTION AND RISK ASSESSMENT IN A MODERN COMPUTER AND NETWORK SYSTEMS

1 V.G Indulakshmi, 2 Dr. T. Nalini
1 PG Scholar, Department of Computer Science Engineering, Bharath University, Tamilnadu, India.
2 Professor, Department of Computer Science Engineering, Bharath University, Tamilnadu, India.

Abstract: Malware, short for malicious software, refers to a variety of forms of hostile, intrusive, or annoying software or program code. Malware is a pervasive problem in distributed computer and network systems. Malware variants often have distinct byte-level representations while in principle belonging to the same family of malware. The byte-level content differs because small changes to the malware source code can result in significantly different compiled object code. In this project we describe malware variants with the umbrella term of polymorphism. We are the first to use the approach of structuring and decompilation to generate malware signatures. We employ both dynamic and static analysis to classify malware. Entropy analysis initially determines whether the binary has undergone a code packing transformation. If it is packed, dynamic analysis employing application-level emulation reveals the hidden code, using entropy analysis to detect when unpacking is complete. Static analysis then identifies characteristics, building signatures for the control flow graph of each procedure. The similarities between this set of control flow graphs and those in a malware database are accumulated to establish a measure of similarity. A similarity search is performed on the malware database to find objects similar to the query.
Additionally, a more effective approximate flow graph matching algorithm is proposed that uses the decompilation technique of structuring to generate string-based signatures amenable to the string edit distance. We use real and synthetic malware to demonstrate the effectiveness and efficiency of the proposed approach.

Keywords: Bayes classifier, Computer Security, Random forest, Spyware

Reference to this paper should be made as follows: V.G Indulakshmi, Dr. T. Nalini (2014) 'A New Classification Statistics Technique For Malware Detection And Risk Assessment In A Modern Computer And Network Systems', International Journal of Inventions in Computer Science and Engineering, Volume 1 Issue 2 2014.

1 Introduction

Our approach employs both dynamic and static analysis to classify malware. Entropy analysis initially determines whether the binary has undergone a code packing transformation. If it is packed, dynamic analysis employing application-level emulation reveals the hidden code, using entropy analysis to detect when unpacking is complete. Static analysis is then applied, either to the unpacked code or directly to a binary that was not packed, and identifies characteristics, building signatures for the control flow graph of each procedure. The similarities between this set of control flow graphs and those in a malware database are accumulated to establish a measure of similarity. A similarity search is performed on the malware database to find objects similar to the query.

2 Background

In existing systems, static analysis incorporating n-grams, edit distances, API call sequences, and control flow has been proposed to detect malware and its polymorphic variants. A malware's control flow information provides a characteristic that is identifiable across strains of malware variants. Approximate matching of flow-graph-based characteristics can be used to identify a greater number of malware variants.
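The idea of comparing string-based signatures with the string edit distance can be illustrated with a minimal sketch. It is not the authors' implementation: the signature strings, the function names, and the normalisation of the distance into a similarity score between 0 and 1 are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(sig_a: str, sig_b: str) -> float:
    """Assumed normalisation: 1.0 for identical signatures, 0.0 for fully different ones."""
    longest = max(len(sig_a), len(sig_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(sig_a, sig_b) / longest


def best_match(query: str, database: Dict[str, str]) -> Optional[Tuple[str, float]]:
    """Return the (name, score) of the stored signature closest to the query."""
    if not database:
        return None
    name = max(database, key=lambda k: similarity(query, database[k]))
    return name, similarity(query, database[name])


# Hypothetical signatures; a real system would derive them from structured control flow graphs.
known = {"family_a": "BW{BI{B}R}", "family_b": "BI{BR}E{B}"}
print(best_match("BW{BI{BB}R}", known))
```

A variant whose signature differs by a single character still scores close to 1.0 against its own family, which is the property approximate matching relies on. Any decision threshold applied to the score would have to be chosen empirically.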
To hinder the static analysis necessary for control flow analysis, the malware's real content is frequently hidden using a code transformation known as packing. Packing is also used in software protection schemes and file compression for legitimate software, yet the majority of malware also uses the code packing transformation. As a result, static analysis alone cannot recover the hidden code of a packed binary. Approximate matching of program structure has been shown to be expensive in runtime cost. This poor execution speed has resulted in the absence of approximate matching in end-host malware detection. The main disadvantage of these approaches is that minor changes to the malware source code can result in significant changes to the resulting byte stream after compilation. This change can significantly impact the classification.

3 Proposed Methodology

Our approach employs both dynamic and static analysis to classify malware. Entropy analysis initially determines whether the binary has undergone a code packing transformation. If it is packed, dynamic analysis employing application-level emulation reveals the hidden code, using entropy analysis to detect when unpacking is complete. Static analysis is then applied, either to the unpacked code or directly to a binary that was not packed, and identifies characteristics, building signatures for the control flow graph of each procedure. The similarities between this set of control flow graphs and those in a malware database are accumulated to establish a measure of similarity. A similarity search is performed on the malware database to find objects similar to the query. Two