Volume 19, Issue 4 (12-1401 / March 2023), Pages 137-148





Rahimi Z, Homayounpour M M. A New Document Embedding Method for News Classification. JSDP 2023; 19 (4) : 10
URL: http://jsdp.rcisp.ac.ir/article-1-1159-fa.html
Rahimi Z., Homayounpour M. M. "A New Document Embedding Method for News Classification" [in Persian]. Signal and Data Processing. 1401 [2023]; 19(4): 137-148.



Amirkabir University of Technology
Abstract:   (1363 views)
Text classification is one of the important applications of natural language processing. To classify news texts, they must first be represented in a suitable way. Various methods exist for representing text, but most of them are general-purpose and use only local, first-order word co-occurrence information. This paper presents an unsupervised method for representing news texts that uses global co-occurrence information together with topical information to represent documents. Topical information not only provides a more abstract representation of the text but also carries higher-order co-occurrence information. Global co-occurrence information and topical information are complementary, so both are employed here to produce a richer representation for text classification. The proposed method was evaluated on the R8 and 20-Newsgroups corpora, both well-known text classification benchmarks, and compared against various methods. Relative to the other methods, the proposed method improved accuracy by 3%.
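The abstract does not give implementation details. As a loose illustration only, and not the authors' actual method, the idea of concatenating two complementary views of a document (a vector derived from global word co-occurrence, and topic-style features) might be sketched as follows; the toy corpus, the SVD-based word vectors, and the use of LSA as a stand-in for a real topic model are all assumptions:

```python
import numpy as np

# Toy corpus: two "news" topics (finance, sports). Illustrative only.
docs = [
    "stocks rise as markets rally",
    "markets fall on bank fears",
    "team wins final football match",
    "coach praises team after match",
]

vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), len(docs)

# Term-document count matrix.
td = np.zeros((V, D))
for j, d in enumerate(docs):
    for w in d.split():
        td[idx[w], j] += 1.0

# Global word-word co-occurrence matrix (document-level window).
co = td @ td.T
np.fill_diagonal(co, 0.0)

# Word vectors from a truncated SVD of the co-occurrence matrix.
k = 2
U, s, _ = np.linalg.svd(co)
word_vecs = U[:, :k] * s[:k]

# View 1: document vector = average of its word vectors
# (captures global co-occurrence statistics).
doc_global = (td.T @ word_vecs) / td.sum(axis=0)[:, None]

# View 2: topic-style features via LSA on the term-document matrix,
# used here as a simple stand-in for a real topic model such as LDA.
_, st, Vt = np.linalg.svd(td, full_matrices=False)
doc_topic = Vt[:k].T * st[:k]

# Concatenate the complementary views into one document embedding,
# which a downstream classifier would consume.
doc_emb = np.hstack([doc_global, doc_topic])
print(doc_emb.shape)  # (4, 4)
```

Any off-the-shelf classifier could then be trained on `doc_emb`; the point of the sketch is only that the two feature blocks are built from different co-occurrence information and joined.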
Article number: 10
Full text [PDF 577 kb]   (540 downloads)
Study type: Applied | Article topic: Text processing
Received: 1399/5/11 | Accepted: 1399/12/18 | Published: 1401/12/29 | Published online: 1401/12/29



Republishing
This article may be republished under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.

All rights of this website belong to the scientific quarterly Signal and Data Processing (JSDP).