Volume 18, Issue 4 (3-2022)                   JSDP 2022, 18(4): 69-80



Abdi Z, Mazoochi M, Pourmina M. Representing a method to identify and contrast with the fraud which is created by robots for developing websites’ traffic ranking. JSDP 2022; 18 (4) : 5
URL: http://jsdp.rcisp.ac.ir/article-1-1032-en.html
ICT Research Institute
Abstract:
With the expansion of the Internet and the Web, communication and information gathering between individuals have shifted from their traditional forms to websites. The World Wide Web also offers businesses a great opportunity to improve their relationships with clients and expand their market online. Businesses use a criterion called traffic ranking to determine their site's popularity and visibility. Traffic ranking measures the number of visitors to a site and assigns the site a rank based on these statistics. One of the most important challenges in ranking is fake traffic generated by applications called robots. Robots are malicious software components that are used to generate spam, mount distributed denial-of-service attacks, and carry out phishing, identity theft, removal of information, and other illegal activities. Several ways to identify and discover robots already exist. According to Doran et al., identification methods are divided into two categories: offline and real-time. Offline detection methods are divided into three categories: syntactical log analysis, traffic pattern analysis, and analytical learning techniques. Real-time detection is performed by a Turing test system. In this research, robots are identified offline by analyzing and processing web server access logs and applying data mining techniques. In this method, the features of each session are first extracted; the sessions are then labeled, using three conditions, as either human or robot. Finally, web robots are detected using a data mining tool. In all previous studies, features were extracted from individual sessions: in the earliest work, Tan and Kumar extracted 25 session features; Bomhardt et al. later used 34 features to identify robots; and in 2009 Stassopoulou et al. used 6 features extracted from sessions.
In this research, by contrast, features are extracted from the sessions of a unique user. Experimental results show that the proposed method, by discovering new features and introducing a new condition in session labeling, improves the accuracy of robot identification and, moreover, improves web traffic ranking compared with previous work.
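The offline pipeline described in the abstract (read access logs, group requests into sessions, compute features per unique user) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Combined Log Format input, the identification of a user by host and user agent, the 30-minute session timeout, and the handful of features shown. The paper's actual feature set and its labeling conditions are not given in the abstract and are not reproduced here.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Combined Log Format lines (not necessarily the paper's log format).
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def parse(lines):
    """Yield one dict per log line that matches the assumed format."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            rec = m.groupdict()
            rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
            yield rec


def sessionize(records):
    """Group requests by user, then split each user's requests into sessions."""
    by_user = defaultdict(list)
    for rec in records:
        # Assumption: a "unique user" is a (host, user agent) pair.
        by_user[(rec["host"], rec["agent"])].append(rec)
    sessions = {}
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r["time"])
        user_sessions = [[recs[0]]]
        for prev, cur in zip(recs, recs[1:]):
            if cur["time"] - prev["time"] > SESSION_TIMEOUT:
                user_sessions.append([])  # long gap: start a new session
            user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions


def user_features(user_sessions):
    """Compute illustrative features over all sessions of one user."""
    reqs = [r for s in user_sessions for r in s]
    n = len(reqs)
    return {
        "sessions": len(user_sessions),
        "requests": n,
        "robots_txt": any(r["path"] == "/robots.txt" for r in reqs),
        "head_ratio": sum(r["method"] == "HEAD" for r in reqs) / n,
        "error_ratio": sum(r["status"].startswith("4") for r in reqs) / n,
        "empty_referer_ratio": sum(r["referer"] in ("", "-") for r in reqs) / n,
    }


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    '10.0.0.1 - - [10/Mar/2012:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [10/Mar/2012:10:00:05 +0000] "HEAD /a.html HTTP/1.1" 200 0 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [10/Mar/2012:11:00:00 +0000] "GET /missing HTTP/1.1" 404 0 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [10/Mar/2012:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0"',
]

features = {user: user_features(s) for user, s in sessionize(parse(SAMPLE)).items()}
for user, feats in features.items():
    print(user, feats)
```

The resulting per-user feature dictionaries could then be labeled and passed to a classifier in a data mining tool; that step depends on the paper's labeling conditions and is left out of this sketch.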
Article number: 5
Type of Study: Research | Subject: Paper
Received: 2019/06/26 | Accepted: 2020/01/11 | Published: 2022/03/21 | ePublished: 2022/03/21

References
[1] J. Rajab Nia, M. Zabihi, M. VafaeiJahan, "Web robot detection with fuzzy inference system based on decision trees," The Seventh Iran Data Mining Conference, 2013 (in Persian).
[2] B.W.N. Lo, R. Sharma Sedhain, "How Reliable Are Website Rankings? Implications For E-Business Advertising And Internet Search," Issues in Information Systems, Vol. VII, No. 2, pp. 233-238, 2006.
[3] What is fake traffic?, [Online], https://sedo-us1.custhelp.com/app/answers/detail/a_id/678/~/what-is-fake-traffic, February 2017.
[4] D.S. Sisodia, Sh. Verma, O.P. Vyas, "A Comparative Analysis of Browsing Behavior of Human Visitors and Automatic Software Agents," American Journal of Systems and Software, Vol. 3, No. 2, pp. 31-35, 2015.
[5] A. Stassopoulou, M.D. Dikaiakos, "Web robot detection: A probabilistic reasoning approach," Computer Networks, Vol. 53, pp. 265-278, 2009. [DOI:10.1016/j.comnet.2008.09.021]
[6] D. Doran, S.S. Gokhale, "Web Robot Detection Techniques: Overview And Limitations," Springer, Data Mining and Knowledge Discovery, Vol. 22, pp. 183-210, 2010. [DOI:10.1007/s10618-010-0180-z]
[7] P.N. Tan, V. Kumar, "Discovery of Web Robot Sessions Based on their Navigational Patterns," Data Mining and Knowledge Discovery, Vol. 6, pp. 9-35, 2002.
[8] Ch. Bomhardt, W. Gaul, L. Schmidt-Thieme, "Web Robot Detection - Preprocessing Web Logfiles for Robot Detection," In Proceedings of SISCLADAG, Bologna, Italy, pp. 113-124, 2005. [DOI:10.1007/3-540-27373-5_14]
[9] D. Stevanovic, A. An, N. Vlajic, "Feature evaluation for web crawler detection with data mining techniques," Elsevier, Expert Systems with Applications, Vol. 39, pp. 8707-8717, 2012. [DOI:10.1016/j.eswa.2012.01.210]
[10] D. Stevanovic, N. Vlajic, A. An, "Detection of malicious and non-malicious website visitors using unsupervised neural network learning," Elsevier, Applied Soft Computing, Vol. 13, pp. 698-708, 2012. [DOI:10.1016/j.asoc.2012.08.028]
[11] M. Zabihimayvan, M. VafaeiJahan, J. Hamidzadeh, "A Density Based Clustering Approach for Web Robot Detection," IEEE, 4th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 23-28, 2014. [DOI:10.1109/ICCKE.2014.6993362]
[12] D.S. Sisodia, Sh. Verma, O.P. Vyas, "Agglomerative Approach for Identification and Elimination of Web Robots from Web Server Logs to Extract Knowledge about Actual Visitors," Journal of Data Analysis and Information Processing, Vol. 3, pp. 1-10, 2015. [DOI:10.4236/jdaip.2015.31001]
[13] J. Hamidzadeh, M. Zabihimayvan, R. Sadeghi, "Detection of Web site visitors based on fuzzy rough sets," Springer, pp. 2175-2188, 2017. [DOI:10.1007/s00500-016-2476-4]
[14] user-agent-string. [Online], http://user-agent-string.info/list-of-ua/bots-ip, December 2017; Bot vs. Browsers. [Online], http://www.botsvs-browsers.com, December 2017.
[15] User-Agents. [Online], http://www.user-agents.org, December 2017.
[16] S.S. Aksenova, "Machine Learning with WEKA: WEKA Explorer Tutorial for WEKA Version 3.4.3," 2004.
[17] http://www.secrepo.com/maccdc2012/http.log.gz
[18] http://www.cs.waikato.ac.nz/ml/weka/



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing