Volume 17, Issue 3 (11-2020)                   JSDP 2020, 17(3): 55-70 | Back to browse issues page


XML Persian Abstract Print


Azarbaijan Shahid Madani University
Abstract:   (2839 Views)
In recent years, the sampling problem in massive graphs of social networks has attracted much attention for fast analyzing a small and good sample instead of a huge network. Many algorithms have been proposed for sampling of social network’ graph. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as degree distribution, clustering coefficient, internal density and community structures, etc. There are various sampling methods such as random walk-based methods, methods based on the shortest path, graph partitioning-based algorithms, and etc. Each group of methods has its own pros and cones. The main drawback of these methods is the lack of attention to the high time complexity in making the sample graph and the quality of the obtained sample graph. In this paper, we propose a new sampling method by proposing a new equation based on the structural properties of social networks and combining it with bee colony algorithm. This sampling method uses an informed and non-random approach so that the generated samples are similar to the original network in terms of features such as network topological properties, degree distribution, internal density, and preserving the clustering coefficient and community structures. Due to the random nature of initial population generation in meta-heuristic sampling methods such as genetic algorithms and other evolutionary algorithms, in our proposed method, the idea of ​​consciously selecting nodes in producing the initial solutions is presented. In this method, based on the finding hub and semi-hub nodes as well as other important nodes such as core nodes, it is tried to maintain the presence of these important nodes in producing the initial solutions and the obtained samples as much as possible. This leads to obtain a high-quality final sample which is close to the quality of the main network. In this method, the obtained sample graph is well compatible with the main network and can preserve the main characteristics of the original network such as topology, the number of communities, and the large component of the original graph as much as possible in sample network. Non-random and conscious selection of nodes and their involvement in the initial steps of sample extraction have two important advantages in the proposed method. The first advantage is the stability of the new method in extracting high quality samples in each time. In other words, despite the random behavior of the bee algorithm, the obtained samples in the final phase mostly have close quality to each other. Another advantage of the proposed method is the satisfactory running time of the proposed algorithm in finding a new sample. In fact, perhaps the first question for asking is about time complexity and relatively slow convergence of the bee colony algorithm. In response, due to the conscious selection of important nodes and using them in the initial solutions, it generates high quality solutions for the bee colony algorithm in terms of fitness function calculation. The experimental results on real world networks show that the proposed method is the best to preserve the degree distribution parameters, clustering coefficient, and community structure in comparison to other method.
Full-Text [PDF 4319 kb]   (622 Downloads)    
Type of Study: بنیادی | Subject: Paper
Received: 2019/05/2 | Accepted: 2020/01/22 | Published: 2020/12/5 | ePublished: 2020/12/5

References
1. [1] E. Katz, P. F. Lazarsfeld, and E. Roper, Personal influence: The part played by people in the flow of mass communications. Routledge, 2017. [DOI:10.4324/9781315126234]
2. [2] N. B. Ellison, J. Vitak, R. Gray, and C. Lampe, "Cultivating social resources on social network sites: Facebook relationship maintenance behaviors and their role in social capital processes," Journal of Computer-Mediated Communication, vol. 19, no. 4, pp. 855-870, 2014. [DOI:10.1111/jcc4.12078]
3. [3] M. Irani and M. Haghighi, "The Impact of Social Networks on the Internet Business Sustainability (With Emphasis on the Intermediary Role of Entrepreneurial Purpose of Online Branches of Mellat Bank's Portal)," Journal of Information Technology Management, vol. 5, no. 4, pp. 23-46, 2013.
4. [4] M. Emirbayer and J. Goodwin, "Network analysis, culture, and the problem of agency," American journal of sociology, vol. 99, no. 6, pp. 1411-1454, 1994. [DOI:10.1086/230450]
5. [5] J. Tang, J. Sun, C. Wang, and Z. Yang, "Social influence analysis in large-scale networks," in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009, pp. 807-816: ACM . [DOI:10.1145/1557019.1557108]
6. [6] J. Török, Y. Murase, H.-H. Jo, J. Kertész, and K. Kaski, "What big data tells: sampling the social network by communication channels," Physical Review E, vol. 94, no. 5, pp. 052319, 2016. [DOI:10.1103/PhysRevE.94.052319] [PMID]
7. [7] M. Papagelis, G. Das, and N. Koudas, "Sampling online social networks," IEEE Transactions on knowledge and data engineering, vol. 25, no. 3, pp. 662-676, 2013. [DOI:10.1109/TKDE.2011.254]
8. [8] K. Dempsey, K. Duraisamy, H. Ali, and S. Bhowmick, "A parallel graph sampling algorithm for analyzing gene correlation networks," Procedia Computer Science, vol. 4, pp. 136-145, 2011. [DOI:10.1016/j.procs.2011.04.015]
9. [9] J. Leskovec and C. Faloutsos, "Sampling from large graphs," in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006, pp. 631-636: ACM. [DOI:10.1145/1150402.1150479]
10. [10] S. Yoon, S. Lee, S.-H. Yook, and Y. Kim, "Statistical properties of sampled networks by random walks," Physical Review E, vol. 75, no. 4, pp. 046114, 2007. [DOI:10.1103/PhysRevE.75.046114] [PMID]
11. [11] S.-H. Yoon, K.-N. Kim, J. Hong, S.-W. Kim, and S. Park, "A community-based sampling method using DPL for online social networks," Information Sciences, vol. 306, pp. 53-69, 2015. [DOI:10.1016/j.ins.2015.02.014]
12. [12] E. M. Airoldi, X. Bai, and K. M. Carley, "Network sampling and classification: An investigation of network model representations," Decision support systems, vol. 51, no. 3, pp. 506-518, 2011. [DOI:10.1016/j.dss.2011.02.014] [PMID] [PMCID]
13. [13] P. Krömer and J. Platoš, "Genetic algorithm for sampling from scale-free data and networks," in Proceedings of the 2014 annual conference on genetic and evolutionary computation, 2014, pp. 793-800: ACM. [DOI:10.1145/2576768.2598391]
14. [14] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," The journal of chemical physics, vol. 21, no. 6, pp. 1087-1092, 1953. [DOI:10.1063/1.1699114]
15. [15] L. A. Goodman, "Snowball sampling," The annals of mathematical statistics, pp. 148-170, 1961. [DOI:10.1214/aoms/1177705148]
16. [16] L. Lovász, "Random walks on graphs: A survey," Combinatorics, Paul erdos is eighty, vol. 2, no. 1, pp. 1-46, 1993.
17. [17] D. D. Heckathorn, "Respondent-driven sampling: a new approach to the study of hidden populations," Social problems, vol. 44, no. 2, pp. 174-199, 1997. [DOI:10.2307/3096941]
18. [18] S. Ye, J. Lang, and F. Wu, "Crawling online social graphs," in 2010 12th International Asia-Pacific Web Conference:IEEEP, pp. 236-242, 2010. [DOI:10.1109/APWeb.2010.10]
19. [19] A. Rezvanian and M. R. Meybodi, "Sampling social networks using shortest paths," Physica A: Statistical Mechanics and its Applications, vol. 424, pp. 254-268, 2015. [DOI:10.1016/j.physa.2015.01.030]
20. [20] A. Sevilla, A. Mozo, and A. F. Anta, "Node sampling using random centrifugal walks," Journal of Computational Science, vol. 11, pp. 34-45, 2015. [DOI:10.1016/j.jocs.2015.09.001]
21. [21] C. Tong, Y. Lian, J. Niu, Z. Xie, and Y. Zhang, "A novel green algorithm for sampling complex networks," Journal of Network and Computer Applications, vol. 59, pp. 55-62, 2016. [DOI:10.1016/j.jnca.2015.05.021]
22. [22] N. Ahmed, J. Neville, and R. R. Kompella, "Network sampling via edge-based node selection with graph induction," 2011.
23. [23] K. Brádler, P.-L. Dallaire-Demers, P. Rebentrost, D. Su, and C. Weedbrook, "Gaussian boson sampling for perfect matchings of arbitrary graphs," Physical Review A, vol. 98, no. 3, pp. 032310, 09/10/ 2018. [DOI:10.1103/PhysRevA.98.032310]
24. [24] J. Zhao, P. Wang, J. C. S. Lui, D. Towsley, and X. Guan, "Sampling online social networks by random walk with indirect jumps," Data Mining and Knowledge Discovery, journal article vol. 33, no. 1, pp. 24-57, January 01. 2019. [DOI:10.1007/s10618-018-0587-5]
25. [25] Y. Xie, S. Chang, Z. Zhang, M. Zhang, and L. Yang, "Efficient sampling of complex network with modified random walk strategies," Physica A: Statistical Mechanics and its Applications, vol. 492, pp. 57-64, 2018. [DOI:10.1016/j.physa.2017.09.032]
26. [26] K. Berahmand and A. Bouyer, "LP-LPA: A link influence-based label propagation algorithm for discovering community structures in networks," International Journal of Modern Physics B, vol. 32, no. 06, pp. 1850062, 2018. [DOI:10.1142/S0217979218500625]

Rights and permissions
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.