A Study of Methods Affecting the Performance of a Persian Probabilistic Context-Free Grammar Parser

Sadeghzadeh, Mohammad Bagher; Razzazi, Mohammad Reza; Ghayoomi, Masoud

doi:10.29252/jsdp.16.3.36

Signal and Data Processing Journal: a scientific journal officially licensed by the Commission for Scientific Publications of the Ministry of Science, Research and Technology (MSRT). Publisher: Research Center for Development of Advanced Technologies

Volume 16, Issue 3 (1398-10, Solar Hijri calendar), pages 23-36

Sadeghzadeh M, Razzazi M, Ghayoomi M. Studying impressive parameters on the performance of Persian probabilistic context free grammar parser. JSDP 2019; 16 (3): 23-36
URL: http://jsdp.rcisp.ac.ir/article-1-385-fa.html

Mohammad Bagher Sadeghzadeh^*, Mohammad Reza Razzazi, Masoud Ghayoomi

Amirkabir University of Technology

Abstract:

Imprecise design of context-free grammars, and the use of unsuitable structures such as Chomsky normal form, can by itself weaken the performance of probabilistic context-free parsers. In this study we examined the structure of coordinate constructions in the Persian treebank. The results show that adding structural dependencies to context-free grammars and revising the initial rules makes it possible to disambiguate coordinate constructions and to increase the accuracy of the probabilistic context-free grammar parser. The weak independence assumption is one of the problems of context-free grammars; we attempted to resolve it by injecting structural dependencies through annotation of parent and child nodes. The study also examines how fine-grained and coarse-grained part-of-speech tags, as well as the merging of nonterminals, affect the Persian probabilistic context-free grammar parser.
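The parent annotation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two toy trees, the labels (`S`, `NP`, `VP`, `N`, `PRO`, `V`), the `^` separator, and the helper names `annotate_parent`, `rules`, and `estimate_pcfg` are all assumptions made for illustration, and the choice to leave part-of-speech (preterminal) labels unannotated is likewise an assumption. Rule probabilities are estimated by plain relative frequency.

```python
from collections import Counter

# Toy trees as nested tuples: (label, child, child, ...); leaves are words.
# Both trees and all labels are hypothetical, not taken from the Persian treebank.
TREES = [
    ("S", ("NP", ("N", "Ali")), ("VP", ("NP", ("N", "book")), ("V", "read"))),
    ("S", ("NP", ("PRO", "she")), ("VP", ("NP", ("N", "book")), ("V", "read"))),
]

def is_preterminal(tree):
    """A preterminal (POS tag) node dominates only words."""
    return all(isinstance(child, str) for child in tree[1:])

def annotate_parent(tree, parent=None):
    """Return a copy of `tree` whose phrasal labels carry their parent's label."""
    if isinstance(tree, str):  # a word: leave unchanged
        return tree
    label, *children = tree
    if parent is None or is_preterminal(tree):
        new_label = label  # root and POS tags stay as they are (an assumption)
    else:
        new_label = f"{label}^{parent}"
    # Children are annotated with the ORIGINAL label of this node.
    return (new_label, *(annotate_parent(c, label) for c in children))

def rules(tree):
    """Yield every (lhs, rhs) production used in `tree`."""
    if isinstance(tree, str):
        return
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(r for t in trees for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

PLAIN = estimate_pcfg(TREES)
ANNOTATED = estimate_pcfg([annotate_parent(t) for t in TREES])

if __name__ == "__main__":
    for name, grammar in (("plain", PLAIN), ("parent-annotated", ANNOTATED)):
        print(name)
        for (lhs, rhs), p in sorted(grammar.items()):
            print(f"  {lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

In the plain grammar every `NP` shares one distribution, so the pronoun expansion is equally likely in subject and object position. After annotation, `NP^S` and `NP^VP` get separate distributions, which is how the extra label weakens the independence assumption. The cost is a larger, sparser rule set, so on real data the trade-off has to be measured rather than assumed.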

Keywords: probabilistic context-free grammar, parser, coordinate constructions, rule annotation, part-of-speech tag

Type of study: Research | Subject: Text processing papers
Received: 1397/2/5 | Accepted: 1398/4/19 | Published: 1398/10/17 | Published online: 1398/10/17 (Solar Hijri dates)

Rights and permissions
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.