Volume 19, Issue 4 (12-1401 / March 2023), Pages 137-148





Rahimi Z, Homayounpour M M. A New Document Embedding Method for News Classification. JSDP 2023; 19 (4) : 10
URL: http://jsdp.rcisp.ac.ir/article-1-1159-fa.html
Rahimi Z., Homayounpour M. M. "A New Document Embedding Method for News Classification" [in Persian]. Signal and Data Processing. 1401 [2023]; 19(4): 137-148.



Amirkabir University of Technology
Abstract:   (1363 views)
Text classification is one of the important applications of natural language processing. To classify news texts, they must first be represented in a suitable way. Various methods exist for representing text, but most of them are general-purpose and use only local, first-order word co-occurrence information. This paper presents an unsupervised method for representing news texts that uses global co-occurrence information together with topical information to represent documents. Topical information not only provides a more abstract representation of the text but also carries higher-order co-occurrence information. Global co-occurrence information and topical information are complementary, so both are employed here to produce a richer representation for text classification. The proposed method was evaluated on the R8 and 20-Newsgroups corpora, both well-known text classification benchmarks, and compared against various methods. Relative to the other methods, the proposed method improved accuracy by 3%.
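The abstract does not give implementation details. As a loose illustration only, and not the authors' actual method, the idea of concatenating two complementary views of a document (a vector derived from global word co-occurrence, and topic-style features) might be sketched as follows; the toy corpus, the SVD-based word vectors, and the use of LSA as a stand-in for a real topic model are all assumptions:

```python
import numpy as np

# Toy corpus: two "news" topics (finance, sports). Illustrative only.
docs = [
    "stocks rise as markets rally",
    "markets fall on bank fears",
    "team wins final football match",
    "coach praises team after match",
]

vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), len(docs)

# Term-document count matrix.
td = np.zeros((V, D))
for j, d in enumerate(docs):
    for w in d.split():
        td[idx[w], j] += 1.0

# Global word-word co-occurrence matrix (document-level window).
co = td @ td.T
np.fill_diagonal(co, 0.0)

# Word vectors from a truncated SVD of the co-occurrence matrix.
k = 2
U, s, _ = np.linalg.svd(co)
word_vecs = U[:, :k] * s[:k]

# View 1: document vector = average of its word vectors
# (captures global co-occurrence statistics).
doc_global = (td.T @ word_vecs) / td.sum(axis=0)[:, None]

# View 2: topic-style features via LSA on the term-document matrix,
# used here as a simple stand-in for a real topic model such as LDA.
_, st, Vt = np.linalg.svd(td, full_matrices=False)
doc_topic = Vt[:k].T * st[:k]

# Concatenate the complementary views into one document embedding,
# which a downstream classifier would consume.
doc_emb = np.hstack([doc_global, doc_topic])
print(doc_emb.shape)  # (4, 4)
```

Any off-the-shelf classifier could then be trained on `doc_emb`; the point of the sketch is only that the two feature blocks are built from different co-occurrence information and joined.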
Article number: 10
Full text [PDF 577 kb]   (540 downloads)
Study type: Applied | Article topic: Text processing
Received: 1399/5/11 | Accepted: 1399/12/18 | Published: 1401/12/29 | Published online: 1401/12/29



Republishing
This article may be republished under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.

All rights of this website belong to the scientific quarterly Signal and Data Processing (JSDP).