A Novel Text Mining Method for User Context Extraction to Improve Search Engine Results Ranking

Davoudi Moghaddam, Javad; Ahmadi, Ali

doi:10.29252/jsdp.14.3.65

Signal and Data Processing (JSDP), a scientific journal officially licensed by the Commission for Scientific Publications of the Ministry of Science, Research and Technology (MSRT). Publisher: Research Center for Development of Advanced Technologies.

Volume 14, Issue 3 (9-1396), pages 65-82

Davoudi Moghaddam J, Ahmadi A. A Novel Text Mining Method for User Context Extraction to Improve Search Engine Results Ranking. JSDP 2017; 14(3): 65-82
URL: http://jsdp.rcisp.ac.ir/article-1-473-fa.html

Javad Davoudi Moghaddam (corresponding author), Ali Ahmadi

K. N. Toosi University of Technology

Abstract:

One of the biggest problems facing search engines is resolving the ambiguity in users' queries. This ambiguity can have several causes, including the multiple meanings and concepts associated with a query or the different uses of that query. If a search engine cannot resolve the ambiguity correctly, the results it presents will be disrupted and erroneous and will not meet the user's need; this plays an important role in determining a search engine's effectiveness. The aim of this paper is to collect the user's context information over time, use it to help interpret the user's query, and thereby improve the ranking of search engine results. User context is any information that helps characterize the user's features and attributes. In this paper, the text of the web pages the user visits is processed to extract their main, key concepts. This extraction of concepts (the user context) takes place on the client side, on the user's own system, and is performed by a browser extension developed for this purpose and installed in the browser. The user context is then stored in a dedicated structure on the client side, privately for each user. When a search is performed, the similarity between each search engine result (based on the snippet the search engine displays for each link) and the user context is computed, and the results that best match the user's context are recommended to the user (highlighted in the browser). As the experiments at the end of the paper show, using user context has a considerable effect on the ranking of search engine results. Over the top 10 results for 30 ambiguous queries, on average 43% of the results returned by the proposed method, versus 16% of those returned by Google, were related to the intended main concept of the query.
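The client-side pipeline summarized in the abstract (extract key concepts from visited pages, keep them as a private user-context profile, score each result snippet against that profile, and surface the best matches) can be sketched as follows. This is a minimal sketch, not the authors' method: the abstract does not say how concepts are extracted or how similarity is measured, so TF-IDF term weighting, cosine similarity, the English-only tokenizer, the stopword list, and the names `build_user_context`, `rerank`, `precision_at_k`, and `mean_precision_at_k` are all hypothetical. The last two functions assume the reported 43% and 16% figures are precision over the top 10 results, averaged across queries.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: the abstract does not say how terms are filtered.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase text, keep runs of ASCII letters, and drop stopwords (English-only simplification)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_user_context(visited_pages, top_k=20):
    """Accumulate the top_k TF-IDF terms of each visited page into one profile.

    Assumption: TF-IDF stands in for the paper's (unspecified) concept extraction.
    """
    docs = [Counter(tokenize(page)) for page in visited_pages]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency per term
    profile = Counter()
    for doc in docs:
        total = sum(doc.values())
        if total == 0:
            continue  # page had no usable terms
        weights = {
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in doc.items()
        }
        best = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for term, weight in best:
            profile[term] += weight
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, profile):
    """Order (url, snippet) pairs by snippet similarity to the user context.

    Python's sort is stable, so ties keep the search engine's original order.
    """
    scored = [(cosine(Counter(tokenize(snippet)), profile), url) for url, snippet in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


def precision_at_k(relevance, k=10):
    """Fraction of the top-k results judged relevant (relevance: list of 0/1)."""
    return sum(relevance[:k]) / k


def mean_precision_at_k(runs, k=10):
    """Average precision@k over several queries."""
    return sum(precision_at_k(r, k) for r in runs) / len(runs)


if __name__ == "__main__":
    pages = [
        "Python programming tutorial: functions, classes and modules in Python",
        "Installing Python packages with pip and virtual environments",
    ]
    profile = build_user_context(pages)
    results = [
        ("https://example.org/snake", "The python is a large snake found in Africa and Asia"),
        ("https://example.org/lang", "Python is a programming language with modules and classes"),
    ]
    print(rerank(results, profile))
    # → ['https://example.org/lang', 'https://example.org/snake']
```

In the usage example, a user whose browsing history is about programming sees the programming-language result for the ambiguous query "python" promoted above the one about the snake. Everything runs locally and only snippets the engine has already returned are scored, which is consistent with the client-side privacy property the abstract describes; how the profile is persisted in the extension is not modelled here.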

Keywords: text mining, information retrieval, user context, search engine results ranking

Article type: Research | Subject: Text processing
Received: 1394/10/4 | Accepted: 1395/5/3 | Published: 1396/11/9 | Published online: 1396/11/9

This article may be republished under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.