A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification on Telegram

Zare Chahooki, Mohammad Ali; Khalifeh zadeh, Zahra

doi:10.52547/jsdp.19.2.175

Volume 19, Issue 2 (9-2022) JSDP 2022, 19(2): 175-196


Zare Chahooki M A, Khalifeh zadeh Z. A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification on Telegram. JSDP 2022; 19 (2) : 12
URL: http://jsdp.rcisp.ac.ir/article-1-1110-en.html

Mohammad Ali Zare Chahooki*, Zahra Khalifeh zadeh

Yazd University

Abstract:

With the rapid development of Internet technologies, the use of messaging services is expanding worldwide. Telegram is a cloud-based, open-source text messaging service. According to the US Securities and Exchange Commission, and based on statistics reported from October 2019 onward, 300 million people worldwide use Telegram each month. Telegram users are concentrated in countries such as Iran, Venezuela, Nigeria, Kenya, Russia, and Ukraine. The messenger has become popular and widely used because it supports various languages and offers diverse services, such as groups and channels with large numbers of members. Telegram groups hold a large amount of contextual data containing hidden knowledge, and extracting this knowledge can be beneficial. The requests in Telegram users' messages are one example of such data. Identifying these requests makes it possible to respond to users' needs promptly, which in turn supports users' business development. The authors identified these requests in a Telegram search engine, the Idekav system of Yazd University, and then created opportunities to earn money by forwarding the requests to business owners who were able to respond to them. Given the high dimensionality of the feature space in contextual data, the number of attributes must be reduced using feature selection.
In the present study, appropriate features were selected for Persian text classification and request identification. Among feature selection (FS) methods, two kinds of filter-based methods, local and global, were chosen. By broadly investigating and combining the most widely used filter-based FS methods, an optimal subset of important features was obtained. This hybrid feature selection method increased request identification accuracy, improved the efficiency of Persian text classification, and reduced training time and computation by optimizing the feature reduction. Classification accuracy does decrease for some methods; however, the decrease is negligible compared with the amount of feature reduction. Incorporating the concept of opinion mining into the analysis of emotions and questions can be a way to identify positive or negative demand in social networks. Therefore, requests in Persian Telegram messages can be identified using opinion mining research. The experiments in this article use a dataset called Persian, which was extracted from the Idekav system. Selecting suitable features to increase model accuracy in request identification is an important part of this research. A support vector machine (SVM) was employed to calculate accuracy. Given the acceptable results of the SVM, its various kernels were also evaluated. Micro-averaging and macro-averaging criteria were also used for evaluation. Model inputs include many optimal feature subsets. Furthermore, feature selection methods have been proposed to produce suitable features for each model in order to increase its accuracy. Afterward, from all the features investigated, appropriate features were selected for each of the applied feature selection models. The main innovations of the present study are as follows:

Use of the most common local and global filter-based feature selection methods to find the optimal feature set.
Use of hybrid methods to produce suitable features for accurate predictive models in Persian text classification, and their application to identifying requests in Persian messages on Telegram.
Selection of suitable features to increase accuracy and reduce computational time for each of the models under consideration. In addition to picking an efficient algorithm, the study attempts to provide a method for making more appropriate choices.
Evaluation and testing of the proposed models on a large set of Persian data with many different features.
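
To make the idea of combining local and global filter-based selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-sided chi-square statistic as the local (per-class) score, a class-prior weighted average as the global score, and a simple union of the top-ranked terms as the hybrid rule. The abstract does not specify any of these choices. The function names, the toy English tokens, and the labels `request` and `other` are hypothetical, and the classification step (the SVM) is omitted.

```python
from collections import Counter, defaultdict


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def local_scores(docs, labels):
    """Return {class: {term: score}} from document frequencies.

    Assumption: the score is one-sided, so a term only scores for a class
    it is positively associated with; otherwise its score is 0.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()                      # number of documents containing each term
    df_in_class = defaultdict(Counter)  # the same count, split by class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[label][term] += 1
    scores = {}
    for c, size in class_sizes.items():
        scores[c] = {}
        for term, total in df.items():
            n11 = df_in_class[c][term]  # term present, class c
            n10 = total - n11           # term present, other classes
            n01 = size - n11            # term absent, class c
            n00 = n_docs - total - n01  # term absent, other classes
            positive = n11 * n00 > n10 * n01
            scores[c][term] = chi_square(n11, n10, n01, n00) if positive else 0.0
    return scores


def global_scores(local, labels):
    """Collapse local scores into one score per term (prior-weighted average)."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    terms = next(iter(local.values())).keys()
    return {t: sum(priors[c] * local[c][t] for c in local) for t in terms}


def top_k(scores, k):
    """The k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def hybrid_select(docs, labels, k_local, k_global):
    """Union of the top global terms and the top local terms of every class."""
    local = local_scores(docs, labels)
    selected = set(top_k(global_scores(local, labels), k_global))
    for c in local:
        selected.update(top_k(local[c], k_local))
    return selected


# Hypothetical toy corpus: tokenized messages with made-up labels.
docs = [
    ["need", "buy", "phone"],
    ["looking", "for", "phone", "seller"],
    ["need", "apartment", "rent"],
    ["good", "morning", "everyone"],
    ["meeting", "tomorrow", "morning"],
    ["thanks", "everyone"],
]
labels = ["request", "request", "request", "other", "other", "other"]

selected = hybrid_select(docs, labels, k_local=2, k_global=2)
print(sorted(selected))  # ['everyone', 'morning', 'need', 'phone']
```

In this toy run the global ranking alone, cut at two terms, keeps only terms tied to the `other` class; the per-class local ranking adds `need` and `phone`, so every class stays represented in the reduced feature set.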

Article number: 12

Keywords: Feature Selection, Text mining, Classification Accuracy, Machine Learning

Type of Study: Applicable | Subject: Paper
Received: 2020/01/12 | Accepted: 2021/03/02 | Published: 2022/09/30 | ePublished: 2022/09/30

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing