Volume 20, Issue 4 (3-2024) | JSDP 2024, 20(4): 67-88


Vahedipoor M, Shamsi M, Rasouli Kenari A. Improving Persian Opinion Mining based on Polarity and Balancing Positive and Negative Keywords (case study: Digikala reviews for mobile). JSDP 2024; 20 (4) : 5
URL: http://jsdp.rcisp.ac.ir/article-1-1202-en.html
Abstract:
In recent years, the massive growth of user-generated content on social networks and online marketing sites has allowed people to share their feelings and opinions about a wide range of products and services. Sentiment analysis, an important aid to decision-making, uses natural language processing (NLP), computational methods, and text analysis to extract the polarity of unstructured documents. The complexity of human language makes sentiment analysis a challenging research area in computer science and computational linguistics. Many researchers have used supervised machine learning algorithms such as Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), as well as deep learning algorithms such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Others have used dictionary-based methods. Despite the existence of effective text mining techniques, unresolved challenges remain. User comments are unstructured texts; therefore, to structure the textual input, the text is usually parsed, some features and linguistic interpretations are added, superfluous items are removed, and the resulting terms are inserted into a database. Patterns are then extracted from the structured data, and finally the outputs are evaluated and interpreted. Data imbalance, that is, a difference in the number of samples in each class of a dataset, is an important challenge in the learning phase. It degrades classifier performance because the model does not learn the features of the underrepresented classes well. In this paper, words are weighted according to a predefined dictionary, so that the most important words receive higher weights and have more influence on the opinion mining result. In addition, combining adjacent words using n-gram methods improves the outcome.
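The combination of dictionary weighting and n-grams described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, their weights, the English placeholder words, and the function name `polarity_score` are all hypothetical, and the rule that a matching bigram takes precedence over the unigrams it contains is an assumption made for the example.

```python
# Hypothetical weighted lexicon (illustrative entries, not taken from the paper).
# The sign encodes polarity; the magnitude encodes how important the entry is.
LEXICON = {
    "good": 1.0,
    "excellent": 2.0,
    "bad": -1.0,
    "weak": -1.5,
    "not good": -1.5,      # bigram entry: negation flips the polarity of "good"
    "battery weak": -2.0,  # bigram entry: a domain-specific complaint
}


def polarity_score(text, lexicon=LEXICON):
    """Sum the lexicon weights of matching bigrams and unigrams in a comment.

    Assumption for this sketch: tokens that are part of a matched bigram
    are not counted again as unigrams.
    """
    tokens = text.lower().split()
    score = 0.0
    covered = set()  # token positions already consumed by a bigram match

    # First pass: adjacent word pairs (bigrams).
    for i in range(len(tokens) - 1):
        bigram = tokens[i] + " " + tokens[i + 1]
        if bigram in lexicon:
            score += lexicon[bigram]
            covered.update((i, i + 1))

    # Second pass: single words not already covered by a bigram.
    for i, token in enumerate(tokens):
        if i not in covered and token in lexicon:
            score += lexicon[token]

    return score
```

With this lexicon, "screen not good" scores negatively because the bigram "not good" is matched before the unigram "good" can contribute, which is the kind of effect that unigram-only weighting would miss.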
Dictionaries are highly dependent on the application domain: words that are important in one application may carry little weight in mobile phone comments. Another challenge is unbalanced training data, in which the number of positive sentences is not equal to the number of negative sentences. In this paper, two ideas are applied to build an efficient opinion mining algorithm: first, building a precise dictionary for Persian mobile phone comments, and second, balancing the positive and negative comments in the training data. In summary, the main contributions of this research are: creating a comprehensive weighted dictionary for the domain of mobile phone opinions to increase the accuracy of opinion analysis; balancing positive and negative opinions to improve the accuracy of opinion analysis; eliminating the negative effect of overfitting; and providing a precise approach to determining the polarity of users' opinions about mobile phones using machine learning and recurrent deep learning algorithms. The method is applied to mobile phone product reviews from the Digikala site and to the Senti-Pers data. It is evaluated with Naïve Bayes, Support Vector Machine, Stochastic Gradient Descent, Logistic Regression, Random Forest, and deep learning methods such as Convolutional Neural Network and Long Short-Term Memory, using Accuracy, Precision, Recall, and F-Measure. On Digikala, the proposed method increases accuracy by 10% to 34% with NB, 5% to 24% with SVM, 7% to 38% with SGD, 5% to 38% with LR, 4% to 22% with RF, and 4% with CNN. On Senti-Pers, accuracy increases by 12% to 46% with NB, 5% to 46% with SVM, 5% to 35% with SGD, 6% to 46% with LR, and 4% to 46% with RF.
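As a rough illustration of the balancing and evaluation steps, the sketch below equalizes class sizes by random oversampling and computes accuracy, precision, recall, and F-measure for a binary task. The abstract does not say which balancing technique the authors used, so random oversampling is an assumption here, and the function names `oversample` and `binary_metrics` and the label values are hypothetical.

```python
import random


def oversample(samples, labels, seed=0):
    """Balance classes by randomly duplicating minority-class samples.

    Assumption: random oversampling stands in for whatever balancing
    technique the paper actually applies.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((item, label) for item in items + extra)

    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


def binary_metrics(y_true, y_pred, positive="pos"):
    """Return accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

Oversampling should be applied to the training split only; balancing the test split as well would distort the reported metrics.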
 
Article number: 5
Type of Study: Research | Subject: Paper
Received: 2020/12/31 | Accepted: 2023/12/11 | Published: 2024/04/25 | ePublished: 2024/04/25




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing