In recent years, as online social networks have multiplied, they have become one of the best markets for advertising and commerce, which makes studying them very important. Most online social networks grow and change continuously as new connections (new edges) form. Predicting these new edges gives a better understanding of how such networks grow. Link prediction has many important applications, including predicting future interactions in social networks, managing and designing effective organizational communication, and predicting and preventing relationships in terrorist groups.
Link prediction has been studied extensively in both engineering and the humanities. Researchers attribute the formation of a new relationship between two individuals to two factors: 1) proximity in the graph (structure), and 2) similar properties of the two individuals (homophily). Many studies have built on these two approaches, and researchers have proposed different similarity metrics for each category. However, how the two approaches act together to create new edges remains an open problem.
Similarity metrics can also be divided into two categories: neighborhood-based and path-based. Neighborhood-based metrics have the advantage that they can be computed without access to the whole graph, whereas path-based metrics require the whole graph to be available at once.
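To illustrate why neighborhood-based metrics need only local information, the following minimal sketch computes two widely used examples, common neighbors and the Jaccard coefficient, from nothing more than the neighbor sets of the two nodes. The function names and the toy graph are assumptions made for illustration; the abstract does not say which neighborhood-based metrics the paper evaluates.

```python
def common_neighbors(neighbors_u, neighbors_v):
    """Number of neighbors shared by two nodes."""
    return len(neighbors_u & neighbors_v)


def jaccard(neighbors_u, neighbors_v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0  # two isolated nodes: define the similarity as zero
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical toy graph stored as adjacency sets (illustration only).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

# Scoring the candidate edge (a, d) touches only graph["a"] and graph["d"].
cn_score = common_neighbors(graph["a"], graph["d"])
jaccard_score = jaccard(graph["a"], graph["d"])
```

A path-based metric such as shortest-path distance would instead need to traverse the graph beyond these two adjacency sets.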
So far, the two theoretical approaches above (proximity and homophily) have not been combined in neighborhood-based metrics. In this paper, we first propose a method to determine the relative importance of graph proximity and feature similarity in the formation of connections. The resulting weights are then assigned to proximity and homophily. Next, the best similarity metric within each approach is identified. Finally, the selected homophily metric and structural metric are combined using the obtained weights.
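The final combination step can be sketched roughly as a weighted average of a structural score and a homophily score. This is only an illustrative sketch under stated assumptions: the abstract does not give the paper's formula, the way the weights are obtained, or the homophily metric, so `attribute_similarity`, `combined_score`, and the example weights below are all hypothetical.

```python
def attribute_similarity(profile_u, profile_v):
    """Fraction of shared profile attributes with equal values.

    Assumption: a simple stand-in for a homophily metric, not the paper's.
    """
    shared = profile_u.keys() & profile_v.keys()
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if profile_u[key] == profile_v[key])
    return matches / len(shared)


def combined_score(structural, homophily, w_structural, w_homophily):
    """Weighted average of a structural score and a homophily score."""
    total = w_structural + w_homophily
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_structural * structural + w_homophily * homophily) / total


# Hypothetical inputs: both scores are assumed to lie in [0, 1].
homophily = attribute_similarity(
    {"age": 25, "city": "X"},
    {"age": 25, "city": "Y"},
)
score = combined_score(structural=1.0, homophily=homophily,
                       w_structural=3, w_homophily=1)
```

Normalizing by the sum of the weights keeps the combined score on the same scale as its inputs, which makes scores comparable across candidate edges.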
The proposed approach was evaluated on two datasets: the Zanjan University Graduate School of Social Sciences and the Pokec online social network. The first dataset was collected specifically for this study using questionnaires. Because it is one of the few Iranian datasets compiled together with its users' attributes, it can be of great value. In this paper, we increase the accuracy of neighborhood-based similarity metrics by combining the graph-proximity and homophily approaches.
Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.