Volume 21, Issue 2 (10-2024) | JSDP 2024, 21(2): 3-14

Mir M, Noferesti S. Using Data Augmentation Techniques for Sentiment Analysis of Users’ Opinions on Reopening of Schools During the Covid-19 Epidemic. JSDP 2024; 21 (2) : 1
URL: http://jsdp.rcisp.ac.ir/article-1-1385-en.html
University of Sistan and Baluchestan
Abstract:
Sentiment analysis, also called opinion mining, is a subfield of natural language processing that aims to classify texts according to the sentiments, beliefs and attitudes expressed in them. Most current research divides texts into two categories, "positive" and "negative". However, other category pairs, such as "good/bad" and "agree/disagree", are also used, each with its own applications.
The purpose of this paper is to analyze opinions expressed by users on social media about the reopening of schools during the Covid-19 outbreak using supervised machine learning techniques, and to classify them into two categories, "agree" and "disagree". The opinions studied in this paper are in Persian. The lack of sufficient datasets and the low accuracy of natural language processing tools are the most important problems in Persian text processing. Because of these limitations, applying supervised machine learning algorithms to Persian, and extracting effective features for training the classifiers, face serious challenges.
In this paper, a small dataset of users' opinions about the reopening of schools was first collected and manually labeled. A combined method was then used to augment the dataset. In the proposed method, Persian sentences were first translated into English. Next, the nouns, verbs and adjectives of the English sentences were replaced with their synonyms, and the resulting English sentences were translated back into Persian. Each new sentence was added to the training set with the class label of its source sentence. This increased the size of the training set by 97 percent.

The pre-processing steps and feature sets commonly used in sentiment analysis of English texts were then evaluated for Persian, and the most effective ones were selected. Given the low accuracy of Persian natural language processing tools, features that depend less on these tools were preferred. Finally, machine learning classifiers were used to assign each user opinion in the test set to the agree or disagree class. The experimental results show that, with the proposed data augmentation method and the selected features, the SVM and CNN algorithms achieved a precision of 81 and 79 percent, respectively, for the polarity classification of opinions.
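The back-translation augmentation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two translators are hypothetical toy dictionaries standing in for a real machine translation service, and a small hand-written synonym table replaces the part-of-speech tagging and synonym lookup (for example, in a lexical database such as WordNet) that a full pipeline would need. The sketch also assumes that a generated sentence duplicating one already in the set is discarded, so the growth it reports can be below 100 percent.

```python
from typing import Callable, Dict, List, Tuple

# A labeled opinion: (sentence, label), with label "agree" or "disagree".
Example = Tuple[str, str]


def replace_synonyms(sentence: str, synonyms: Dict[str, str]) -> str:
    """Swap every whitespace-separated word that has an entry in `synonyms`.

    Simplification: a real pipeline would POS-tag the sentence and replace
    only nouns, verbs and adjectives; this lookup table is hypothetical.
    """
    return " ".join(synonyms.get(word, word) for word in sentence.split())


def augment(
    train: List[Example],
    fa_to_en: Callable[[str], str],
    en_to_fa: Callable[[str], str],
    synonyms: Dict[str, str],
) -> List[Example]:
    """Return the training set plus at most one back-translated variant per example."""
    seen = {sentence for sentence, _ in train}
    augmented = list(train)
    for sentence, label in train:
        english = fa_to_en(sentence)                      # Persian -> English
        paraphrase = replace_synonyms(english, synonyms)  # synonym substitution
        new_sentence = en_to_fa(paraphrase)               # English -> Persian
        if new_sentence not in seen:                      # assumption: drop duplicates
            seen.add(new_sentence)
            augmented.append((new_sentence, label))       # inherit the source label
    return augmented


def growth_percent(original_size: int, augmented_size: int) -> float:
    """Percentage increase in training-set size."""
    return 100.0 * (augmented_size - original_size) / original_size


# Toy translators (hypothetical stand-ins for a machine translation service).
FA_EN = {
    "مدارس باید باز شوند": "schools must reopen",
    "بازگشایی مدارس خطرناک است": "reopening schools is dangerous",
}
EN_FA = {
    "schools should reopen": "مدارس باید بازگشایی شوند",
    "reopening schools is risky": "بازگشایی مدارس پرخطر است",
}
SYNONYMS = {"must": "should", "dangerous": "risky"}

train = [
    ("مدارس باید باز شوند", "agree"),
    ("بازگشایی مدارس خطرناک است", "disagree"),
]
augmented = augment(
    train,
    lambda s: FA_EN.get(s, s),
    lambda s: EN_FA.get(s, s),
    SYNONYMS,
)
for sentence, label in augmented:
    print(label, sentence)
print(f"growth: {growth_percent(len(train), len(augmented)):.0f}%")
```

Giving each paraphrase the label of its source sentence, as the method does, relies on the assumption that synonym substitution and round-trip translation do not flip the stance of the opinion. With identity translators and an empty synonym table, every generated sentence duplicates its source and is dropped, so the sketch adds nothing.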
Article number: 1
Type of Study: Research | Subject: Paper
Received: 2023/06/09 | Accepted: 2024/02/25 | Published: 2024/11/04 | ePublished: 2024/11/04

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing