A Novel Model for Phrase Searching Based on Minimum Weighted Relocation Model

Paksima, Javad

doi:10.29252/jsdp.15.4.71

Volume 15, Issue 4 (12-1397), pp. 71-84

Paksima J. A novel model for phrase searching based on Minimum Weighted Relocation Model. JSDP 2019; 15 (4): 71-84
URL: http://jsdp.rcisp.ac.ir/article-1-670-fa.html


Javad Paksima^*

Payame Noor University, Yazd

Abstract:

Studies of search engines show that most user queries contain more than one term. Queries with more than one term can be modeled in two ways. The first model assumes that the query terms are independent of one another; the second assumes that the position and order of the terms are dependent. Experiments show that in most queries the terms are in fact dependent. One parameter that can capture the dependency between query terms is the distance between those terms in the document. This paper presents a new definition of distance based on the minimum weighted relocation[1] of document terms needed to match the query. In addition, most ranking algorithms use the frequency of a term's occurrence in a document[2] to score documents, yet there is no clear definition of this parameter for queries with more than one term. The paper therefore defines phrase frequency[3] and inverse document frequency[4] in terms of the new distance concept and presents algorithms for computing them. The proposed algorithm is also compared with several other algorithms and shows a good improvement in average precision.

[1] MWRM

[2] Term Frequency

[3] Phrase Frequency

[4] Inverse Document Frequency
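The abstract contrasts scoring that treats query terms independently with scoring that depends on where the terms sit in the document. The sketch below is only an illustration of that contrast and is not the paper's MWRM distance, which the abstract does not define. It shows the standard term frequency and inverse document frequency, plus a common proximity feature (the shortest window that covers all query terms) as a stand-in for a distance between query terms. All function names are hypothetical.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """Number of times `term` occurs in the tokenized document."""
    return Counter(doc_tokens)[term]


def inverse_document_frequency(term, corpus):
    """Classic log(N / df); returns 0.0 when no document contains the term."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def min_span(query_terms, doc_tokens):
    """Length in tokens of the shortest window containing every query term.

    Returns None when some query term never occurs. This is a generic
    proximity feature, used here as an assumption in place of MWRM.
    """
    needed = set(query_terms)
    counts = Counter()
    have = 0          # distinct query terms currently inside the window
    best = None
    left = 0
    for right, tok in enumerate(doc_tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                have += 1
        # Shrink from the left while the window still covers the query.
        while have == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            ltok = doc_tokens[left]
            if ltok in needed:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    have -= 1
            left += 1
    return best


corpus = [
    "new model for phrase search in web search engines".split(),
    "search engines rank each phrase".split(),
    "ranking documents by distance".split(),
]
query = ["phrase", "search"]
spans = [min_span(query, doc) for doc in corpus]
```

In this toy corpus the first two documents contain both query terms, so a term-independent score cannot separate an exact phrase match from a loose co-occurrence, while the window length can.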

Keywords: search engine, ranking, distance, term dependency, phrase frequency (PF)


Type of study: Fundamental | Subject: Text processing papers
Received: 1396/8/25 | Accepted: 1397/10/19 | Published: 1397/12/17 | Published online: 1397/12/17 (dates in the Solar Hijri calendar)


This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
