Effect of class imbalance in heterogeneous network embedding: An empirical study |
| |
Affiliation: | 1. College of Liberal Arts and Sciences, National University of Defense Technology, Changsha, China; 2. Department of Mathematics, University of California, Los Angeles, USA; 1. Department of Information Management, Nanjing University of Science and Technology, Nanjing, China; 2. Department of Network and New Media, Nanjing Normal University, Nanjing, China; 1. Center for Modern Korean Studies, Yonsei University, Wonju, Republic of Korea; 2. Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea; 3. College of Computing and Informatics, Drexel University, Philadelphia, USA; 1. Laboratory for Studies in Research Evaluation, Institute for System Analysis and Computer Science (IASI-CNR), National Research Council, Rome, Italy; 2. Nordic Institute for Studies in Innovation, Research and Education, Oslo, Norway; 3. University of Rome “Tor Vergata”, Dept of Engineering and Management, Rome, Italy; 1. University of Hasselt, Belgium; 2. University of Antwerp, Faculty of Social Sciences, B-2020, Antwerpen, Belgium; 3. Centre for R&D Monitoring (ECOOM) and Dept. MSI, KU Leuven, Leuven, Belgium |
| |
Abstract: | Network science has been applied extensively to bibliometric tasks such as co-authorship prediction, author classification, author clustering, author ranking, and paper ranking. While the majority of past studies exploit homogeneous bibliographic networks (consisting of a single type of node and edge), recent years have seen a surge in modeling heterogeneous bibliographic entities and their inter-dependencies with heterogeneous information networks (HINs). Unlike a homogeneous bibliographic network, a bibliographic HIN consists of multi-typed nodes such as Author, Paper, and Venue, together with the corresponding relations. A bibliographic HIN is therefore more complex: it captures the rich semantics of the underlying bibliographic data but also poses more challenges. Since a real-world HIN may have a different number of instances for each node type, class imbalance is ubiquitous. Recent studies discuss class imbalance only briefly and exploit meta-path-based strategies to address it. However, no existing work quantitatively studies the effect of class imbalance on solving real-world bibliometric tasks. This paper therefore first proposes a metric to estimate class imbalance in a HIN and then studies the effects of class imbalance on two bibliometric tasks, namely (i) co-authorship prediction and (ii) classification of authors' research areas, using node features generated by network embedding-based frameworks on the DBLP dataset. The experimental analyses indicate that class imbalance is an inherent characteristic of bibliographic HINs and that, for better performance on the above-mentioned tasks, bibliographic HINs must include Author, Paper, and Venue as node types. |
| |
Keywords: | Heterogeneous information network; Network embedding; Meta-path; Class imbalance |
This article is indexed in ScienceDirect and other databases. |
|