Volume 14, Issue 4 (3-2018) | JSDP 2018, 14(4): 117-128


Mortazavi S M, Nadimi Shahraki M H, Mosakhani M. Improving the accuracy of the author name disambiguation by using clustering ensemble. JSDP. 2018; 14(4): 117-128
URL: http://jsdp.rcisp.ac.ir/article-1-524-en.html
Abstract:

Today, digital libraries are important academic resources that hold millions of citations together with essential bibliographic information such as titles, author names, and places of publication. From the perspective of knowledge management, the ability to search for the desired content quickly and accurately is of great importance. The complexity and similarity of the records in these resources cause many challenges and ambiguities. One of the most important of these challenges is author name disambiguation, which has become an extensive area of research. Although many effective methods based on clustering techniques have been developed for author name disambiguation, their accuracy is still not acceptable, and their results suffer from problems such as fragmentation and errors, because citations follow no uniform standard and names appear in various combinations and in numerous written and spoken forms. In fact, experience has shown that, given these issues, a single method does not disambiguate names with high accuracy. In this paper, a new method is proposed that disambiguates author names in different formats and combinations more accurately. The proposed solution carries out the disambiguation in two steps. In the first step, an agglomerative hierarchical clustering algorithm produces clusters using similarity functions and different thresholds. In the second step, the clusters produced in the first step are combined by a clustering ensemble technique to provide more accurate clusters with less fragmentation. The proposed method is evaluated experimentally on DBLP datasets using the K criterion. The evaluation results show that the proposed method improves the accuracy of author name disambiguation across different name formats.
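The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity functions, the linkage rule, the thresholds, or the ensemble rule, so the Jaccard similarities on coauthors and title words, the average linkage, the thresholds 0.3 and 0.5, and the majority-vote co-association consensus below are all assumptions, and the four records are hypothetical toy data. The `k_metric` function assumes that the "K criterion" refers to the K metric commonly used in author name disambiguation, the geometric mean of average cluster purity (ACP) and average author purity (AAP).

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity function)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def agglomerative(records, sim, threshold):
    """Step 1: average-link agglomerative clustering that stops merging
    once the best cluster-to-cluster similarity drops below threshold."""
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(sim(records[i], records[j])
                        for i in clusters[x] for j in clusters[y])
            avg = total / (len(clusters[x]) * len(clusters[y]))
            if avg > best:
                best, pair = avg, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting index y keeps x valid
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters

def ensemble(partitions, n, min_agreement=0.5):
    """Step 2: co-association consensus. Two records are linked when they
    share a cluster in at least min_agreement of the base partitions;
    the consensus clusters are the connected components of those links."""
    votes = {}
    for part in partitions:
        for cluster in part:
            for i, j in combinations(sorted(cluster), 2):
                votes[(i, j)] = votes.get((i, j), 0) + 1
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for (i, j), v in votes.items():
        if v / len(partitions) >= min_agreement:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def k_metric(clusters, truth):
    """K = sqrt(ACP * AAP), assuming the usual definition of the K metric."""
    n = len(truth)
    author_size = {}
    for label in truth:
        author_size[label] = author_size.get(label, 0) + 1
    acp = aap = 0.0
    for cluster in clusters:
        counts = {}
        for i in cluster:
            counts[truth[i]] = counts.get(truth[i], 0) + 1
        for label, c in counts.items():
            acp += c * c / len(cluster)
            aap += c * c / author_size[label]
    return sqrt((acp / n) * (aap / n))

# Hypothetical citation records that all carry the same ambiguous author name.
records = [
    {"coauthors": {"j. smith", "l. chen"}, "title": {"graph", "clustering", "entity"}},
    {"coauthors": {"l. chen", "r. kumar"}, "title": {"graph", "partition", "entity"}},
    {"coauthors": {"p. novak"}, "title": {"protein", "folding", "simulation"}},
    {"coauthors": {"p. novak", "m. rossi"}, "title": {"protein", "structure", "simulation"}},
]
truth = ["a", "a", "b", "b"]  # hypothetical ground-truth author labels

def coauthor_sim(r, s):
    return jaccard(r["coauthors"], s["coauthors"])

def title_sim(r, s):
    return jaccard(r["title"], s["title"])

# One base partition per (similarity function, threshold) combination.
partitions = [agglomerative(records, sim, t)
              for sim in (coauthor_sim, title_sim)
              for t in (0.3, 0.5)]
consensus = ensemble(partitions, len(records))

for part in partitions:
    print(part, round(k_metric(part, truth), 3))
print("consensus:", consensus, k_metric(consensus, truth))
```

In this toy run, the base partition built from coauthor similarity at the stricter threshold leaves records 0 and 1 in separate clusters, which is the kind of fragmentation the abstract mentions. The other three base partitions group them together, so the majority-vote consensus restores the merged cluster.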

Type of Study: Research | Subject: Paper
Received: 2016/05/23 | Accepted: 2017/12/31 | Published: 2018/03/13 | ePublished: 2018/03/13



© 2015 All Rights Reserved | Signal and Data Processing