A New Framework for Distributed Multivariate Feature Selection

Sharifnezhad, Mona; Rahmani, Mohsen; Ghafarian, Hosein

doi:10.61186/jsdp.19.4.19

Volume 19, Issue 4 (3-2023) JSDP 2023, 19(4): 19-32

Sharifnezhad M, Rahmani M, Ghafarian H. A New Framework for Distributed Multivariate Feature Selection. JSDP 2023; 19 (4) : 2
URL: http://jsdp.rcisp.ac.ir/article-1-1156-en.html

Mona Sharifnezhad, Mohsen Rahmani*, Hosein Ghafarian

Arak University

Abstract:

Feature selection is an important problem in classification. Selecting features that are maximally relevant to the class label and minimally redundant with one another improves classification accuracy. However, most current feature selection algorithms are centralized.
In this paper, we propose a distributed version of the mRMR (minimum redundancy, maximum relevance) feature selection approach. In mRMR, features are selected based on maximum relevance to the class and minimum redundancy among the features. The proposed method includes six stages. In the first stage, after the training and test data are determined, the training data are distributed horizontally, so that all subsets have the same number of features. In the second stage, the features in each subset are scored using mRMR feature selection. In the third stage, features with higher ranks are selected and the others are eliminated. In the fourth stage, the eliminated features are put to a vote. In the fifth stage, the selected features are merged to determine the final set. In the final stage, classification accuracy is evaluated using the final training data and the test data.
The quality of our method has been evaluated on six datasets. The results show that the proposed method improves classification accuracy compared to methods based only on maximum relevance to the class label, while also reducing runtime.
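
The middle stages described in the abstract (horizontal partitioning, per-subset mRMR scoring, elimination, voting, and merging) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete features, the common "difference" form of mRMR (relevance minus mean redundancy, both measured by mutual information), and a simple rule in which a feature is dropped once it has been eliminated in more than a chosen number of subsets. The function names and the parameters `n_parts`, `k`, and `max_removal_votes` are hypothetical. The train/test split and the final classification stage are omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mrmr_rank(rows, labels, k):
    """Greedy mRMR (difference form): indices of up to k selected features."""
    n_features = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_features)]
    relevance = [mutual_information(col, labels) for col in cols]
    selected = []
    remaining = set(range(n_features))

    def score(j):
        # Relevance to the class minus mean redundancy with chosen features.
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(cols[j], cols[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # Ties are broken in favour of the lowest feature index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def distributed_mrmr(rows, labels, n_parts, k, max_removal_votes):
    """Partition samples, run mRMR per subset, then vote on eliminations.

    Assumption: a feature is kept if it was eliminated in at most
    max_removal_votes subsets. The abstract does not state the voting rule.
    """
    n_features = len(rows[0])
    removal_votes = Counter()
    for p in range(n_parts):
        # Horizontal split: every subset keeps all features, fewer samples.
        part_rows = rows[p::n_parts]
        part_labels = labels[p::n_parts]
        kept = set(mrmr_rank(part_rows, part_labels, k))
        for j in range(n_features):
            if j not in kept:
                removal_votes[j] += 1
    return [j for j in range(n_features)
            if removal_votes[j] <= max_removal_votes]
```

On a toy dataset in which one feature is an exact copy of another, the redundancy term makes the greedy step skip the copy in favour of a feature that carries new information about the class, which is the behaviour that separates mRMR from a relevance-only ranking.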

Article number: 2

Keywords: Multivariate filter feature selection, Embedded feature selection, Classification, Distribution

Type of Study: Research | Subject: Paper
Received: 2020/07/23 | Accepted: 2022/05/11 | Published: 2023/03/20 | ePublished: 2023/03/20

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
