Author Name Disambiguation by Exploiting Graph Structural Clustering and Hybrid Similarity

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Author name ambiguity arises when multiple authors share the same name (homonyms) or when a single author's name is written in several ways (synonyms). It degrades information retrieval and the correct attribution of work to authors in bibliographic databases. Despite much research in the past decade, the problem remains largely unsolved. Existing methods have limited capabilities (they resolve only homonyms or only synonyms), require extra information (Web data or user feedback), need the actual number of authors K in advance, or do not scale. This paper proposes a method called GCLUSIM, which uses graph structural clustering and a proposed similarity measure to resolve ambiguous authors. GCLUSIM preprocesses the citation data set and constructs a co-author graph. Graph-based structural clustering is then applied to this graph to identify hub nodes, outliers, and clusters of nodes. Homonyms are resolved by splitting these clusters if the feature vector similarity between them is less than a predefined threshold, and synonyms are resolved by exploiting the proposed similarity measure. Finally, sole authors are disambiguated by comparing their name and feature vector similarities with the disambiguated clusters. Experiments on the Arnetminer and BDBComp data sets validate the performance of GCLUSIM. The results show that GCLUSIM is scalable, performs better overall than the baselines, and finds a number of clusters close to the ground truth.
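
The first two steps of the pipeline (building a co-author graph and scoring node pairs structurally) can be illustrated with a minimal Python sketch. It is not the GCLUSIM implementation: the abstract does not give the clustering formulas, so the sketch assumes the closed-neighbourhood similarity commonly used in SCAN-style structural clustering, and the function names, the `eps` and `mu` parameters, and the toy author labels are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_coauthor_graph(papers):
    """Map each author name to the set of names it has co-authored with."""
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a]  # touch the key so sole authors appear as isolated nodes
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def structural_similarity(graph, u, v):
    """Assumed SCAN-style score: shared closed neighbourhood, normalised."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def core_nodes(graph, eps, mu):
    """Nodes whose closed neighbourhood holds at least mu eps-similar nodes."""
    cores = set()
    for u in graph:
        similar = [
            v for v in graph[u] | {u}
            if structural_similarity(graph, u, v) >= eps
        ]
        if len(similar) >= mu:
            cores.add(u)
    return cores


# Toy citation records: each inner list is the author list of one paper.
papers = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["E"]]
graph = build_coauthor_graph(papers)
cores = core_nodes(graph, eps=0.8, mu=3)  # hypothetical thresholds
```

In this toy graph the densely connected authors A, B, and C become core nodes, while D is only weakly attached and E, a sole author, stays isolated. Sole authors are the case the abstract handles in its final step, by comparing name and feature vector similarities against the clusters that have already been disambiguated.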

Author information

Corresponding author

Correspondence to Ijaz Hussain.

About this article

Cite this article

Hussain, I., Asghar, S. Author Name Disambiguation by Exploiting Graph Structural Clustering and Hybrid Similarity. Arab J Sci Eng 43, 7421–7437 (2018). https://doi.org/10.1007/s13369-018-3099-0

  • DOI: https://doi.org/10.1007/s13369-018-3099-0
