Malware Detection using Classification of Variable-Length Sequences

Hosseini, Fatemeh; Mirzarezaee, Mitra; Sharifi, Arash

doi:10.29252/jsdp.16.2.137

Volume 16, Issue 2 (9-2019) JSDP 2019, 16(2): 137-146

Hosseini F, Mirzarezaee M, Sharifi A. Malware Detection using Classification of Variable-Length Sequences. JSDP 2019; 16(2): 137-146
URL: http://jsdp.rcisp.ac.ir/article-1-666-en.html

Fatemeh Hosseini, Mitra Mirzarezaee*, Arash Sharifi

Islamic Azad University, Science and Research Branch of Tehran

Abstract:

In this paper, a novel graph-based feature-extraction method is proposed for classifying variable-length sequences. Without fixing the length of the sequences, the method overcomes the problems that traditional graph approaches have with variable-length data: it determines the most frequent instructions and places all remaining instructions into a single set labelled “other”, which saves both time and memory. Based on these features and the similarities between them, each sample is assigned a score that is then used for classification. To improve the results, the method is not used alone; instead, it is combined with other existing techniques in two approaches. In the first approach, which can be regarded as feature extraction, three scoring techniques are applied to op-code sequences, hexadecimal sequences and system calls: the Hidden Markov Model (HMM), simple substitution distance (SSD) and the similarity graph. The features they produce are combined at the classifier input. The second approach consists of two steps. In the first step, the scores obtained from each scoring technique are fed to three support vector machines; their outputs are combined according to the weight assigned to each technique, and the final decision is made by majority vote. Among these support vector machine components, giving a higher weight to the similarity graph method (the proposed method) produces better results, because the similarity graph method is more accurate than the other two methods. In the second step, the outputs of several classifiers are combined, taking the strengths of each classifier into account, and a majority vote is again used. Three ensemble methods were tested for this combination: Ensemble Averaging, Bagging, and Boosting.
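The abstract does not specify how the graph is built or how similarity is measured, so the following is only a minimal Python sketch of the idea under stated assumptions: the graph is taken to be a directed op-code transition graph whose edge weights are normalized bigram frequencies, the vocabulary is the top-k most frequent op-codes with every other op-code mapped to a single "other" node, and the score is a histogram-intersection overlap between a sample graph and a class graph. The function names, the value of k and the similarity measure are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

OTHER = "other"  # single node that absorbs every op-code outside the top-k vocabulary
Edge = Tuple[str, str]
Graph = Dict[Edge, float]


def build_vocabulary(corpus: Sequence[Sequence[str]], k: int) -> Set[str]:
    """Return the k most frequent op-codes across all training sequences."""
    counts = Counter(op for seq in corpus for op in seq)
    return {op for op, _ in counts.most_common(k)}


def to_graph(seq: Sequence[str], vocab: Set[str]) -> Graph:
    """Build a transition graph with normalized edge weights (assumed construction).

    Rare op-codes are mapped to OTHER first, so the graph has at most
    (len(vocab) + 1) ** 2 edges however long the sequence is; no truncation
    or padding is needed.
    """
    mapped = [op if op in vocab else OTHER for op in seq]
    edges = Counter(zip(mapped, mapped[1:]))
    total = sum(edges.values())
    return {e: c / total for e, c in edges.items()} if total else {}


def merge_graphs(graphs: List[Graph]) -> Graph:
    """Average several sample graphs into one class (e.g. malware family) graph."""
    merged: Counter = Counter()
    for g in graphs:
        merged.update(g)
    return {e: w / len(graphs) for e, w in merged.items()}


def similarity(a: Graph, b: Graph) -> float:
    """Score in [0, 1]: total edge weight shared by a and b (assumed measure)."""
    return sum(min(w, b.get(e, 0.0)) for e, w in a.items())


if __name__ == "__main__":
    # Toy op-code sequences for illustration only, not data from the paper.
    family = [["push", "mov", "call", "pop", "ret"], ["push", "mov", "xor", "call", "ret"]]
    vocab = build_vocabulary(family, k=3)
    family_graph = merge_graphs([to_graph(s, vocab) for s in family])
    sample = ["push", "mov", "call", "nop", "nop", "nop", "ret"]
    print(round(similarity(to_graph(sample, vocab), family_graph), 3))
```

Because every graph in this sketch has at most (k + 1)² edges regardless of input length, a long sample and a short sample can be scored against the same family graph; the resulting similarity value plays the role of the per-sample score that the abstract describes passing to the classifiers.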
Ensemble Averaging, which combines four classifiers (random forest, the support vector machine obtained in the previous step, k-nearest neighbors and naive Bayes) and takes the final decision by majority vote, was adopted as the proposed method. The proposed approach detects metamorphic malware from the Vxheaven dataset and also determines malware categories with an accuracy of 97%, while the SSD and HMM methods under the same conditions detect malware with accuracies of 84% and 80%, respectively.
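Both combination steps in the abstract rely on a (weighted) majority vote over classifier outputs. A minimal, dependency-free Python sketch of that voting rule is given below as an illustration, not as the authors' implementation. The classifier names and the weight values (for example 2.5 for the similarity-graph SVM) are hypothetical, since the abstract states only that the similarity graph method receives a higher weight; with the weights omitted, the function reduces to the plain majority vote used to combine the four classifiers.

```python
from collections import defaultdict
from typing import Dict, Hashable, Mapping, Optional


def weighted_majority_vote(
    predictions: Mapping[str, Hashable],
    weights: Optional[Mapping[str, float]] = None,
) -> Hashable:
    """Return the label with the largest total vote weight for one sample.

    `predictions` maps each classifier's name to the label it predicted;
    `weights` maps the same names to vote weights. With weights=None every
    classifier counts 1.0, i.e. a plain majority vote. Ties go to the label
    that appears first in `predictions` (an assumption; the abstract does not
    say how ties are resolved).
    """
    totals: Dict[Hashable, float] = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += 1.0 if weights is None else weights[name]
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    # Step 1 (hypothetical weights): one SVM per scoring technique, with the
    # similarity-graph SVM weighted above the HMM- and SSD-based SVMs.
    svm_votes = {"svm_hmm": "benign", "svm_ssd": "benign", "svm_graph": "malware"}
    svm_weights = {"svm_hmm": 1.0, "svm_ssd": 1.0, "svm_graph": 2.5}
    print(weighted_majority_vote(svm_votes, svm_weights))  # malware (2.5 > 2.0)

    # Step 2: unweighted majority vote of the four ensemble members.
    members = {"random_forest": "malware", "svm": "malware",
               "knn": "benign", "naive_bayes": "malware"}
    print(weighted_majority_vote(members))  # malware (3 votes to 1)
```

With an even number of voters, such as the four ensemble members, 2-2 ties are possible; the sketch resolves them by order of appearance, which is a design choice of this illustration rather than something stated in the paper.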

Keywords: Malware Detection, Graph Techniques, Combining Classifiers, Variable Length Classification, Support vector machine

Type of Study: Research | Subject: Paper
Received: 2017/04/28 | Accepted: 2019/06/19 | Published: 2019/09/17 | ePublished: 2019/09/17

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
