Similar Literature
 Found 20 similar documents (search time: 15 ms)
1.
Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only a few citations) may prevent such techniques from reaching their full potential. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) with specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), which extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), which extends LAND with self-training capabilities, thus drastically reducing the number of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.

2.
Author disambiguation resolves same-name author occurrences in bibliographic data into namesakes. This enables author-centered searches and high-quality social network analysis. In an attempt to promote research in author disambiguation, KISTI has constructed a new large-scale test set for this field. This article describes its semi-manual creation procedure and its characteristics, especially in terms of author ambiguity and name diversity. In addition, the baseline performance of author clustering against the test set is provided.

3.
Hiring appropriate editors, chairs and committee members for academic journals and conferences is challenging. It requires a targeted search for high profile scholars who are active in the field as well as in the publication venue. Many author-level metrics have been employed for this task, such as the h-index, PageRank and their variants. However, these metrics are global measures which evaluate authors’ productivity and impact without differentiating the publication venues. From the perspective of a venue, it is also important to have a localised metric which can specifically indicate the significance of academic authors for the particular venue. In this paper, we propose a relevance-based author ranking algorithm to measure the significance of authors to individual venues. Specifically, we develop a co-authorship network considering the author-venue relationship which integrates the statistical relevance of authors to individual venues. The RelRank, an improved PageRank algorithm embedding author relevance, is then proposed to rank authors for each venue. Extensive experiments are carried out to analyse the proposed RelRank in comparison with classic author-level metrics on three datasets of different research domains. We also evaluate the effectiveness of the RelRank and comparison metrics in recommending editorial boards of three venues using test data. Results demonstrate that the RelRank is able to identify not only the high profile scholars but also those who are particularly significant for individual venues.
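
As background for the abstract above, the following is a minimal sketch of plain PageRank computed by power iteration. It is not RelRank: the abstract does not say how author relevance is embedded, so no venue weighting is attempted here. The function name `pagerank`, the toy graph, and the damping value of 0.85 are illustrative assumptions.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Plain PageRank by power iteration (not the RelRank variant).

    `graph` maps each node to a list of the nodes it links to.
    Assumption: every link target also appears as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Every other node splits its rank equally among its targets.
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy network: "a" and "b" both point to "c"; "c" points to "a".
toy_graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_graph)
```

In this toy graph "c" receives links from two nodes and so ends up with the highest score, while "b", which nobody links to, ends up with the lowest.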

4.
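In digital library environments, author name ambiguity reduces the accuracy of retrieval from literature databases and degrades the quality of bibliographic datasets. Compared with traditional methods, automated disambiguation more effectively resolves the tension between massive data growth and the low efficiency of manual identification. After briefly reviewing representative existing methods for automatic author name disambiguation, this paper builds a fairly complete taxonomy of them according to differences in clustering approach and feature selection, and compares them. Then, addressing the ambiguity of both domestic and foreign author names in literature databases, it proposes a general name disambiguation framework that is not restricted to any particular database or language, thereby providing technical support for guiding literature database systems in applying suitable disambiguation methods.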

5.
In this study, we propose and validate a social-network-based theoretical model for exploring scholars’ collaboration (co-authorship) network properties associated with their citation-based research performance (i.e., g-index). Using structural holes theory, we focus on how a scholar’s egocentric network properties of density, efficiency and constraint within the network associate with their scholarly performance. For our analysis, we use publication data of high impact factor journals in the field of “Information Science & Library Science” between 2000 and 2009, extracted from Scopus. The resulting database contained 4837 publications reflecting the contributions of 8069 authors. Results from our data analysis suggest that the research performance of scholars is significantly correlated with scholars’ ego-network measures. In particular, scholars with more co-authors and those who exhibit higher levels of betweenness centrality (i.e., the extent to which a co-author is between another pair of co-authors) perform better in terms of research (i.e., higher g-index). Furthermore, scholars with efficient collaboration networks who maintain a strong co-authorship relationship with one primary co-author within a group of linked co-authors (i.e., co-authors that have joint publications) perform better than those researchers with many relationships to the same group of linked co-authors.
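
The abstract above uses the g-index as its performance measure without defining it. As a minimal sketch, assuming the standard definition (the largest g such that the g most-cited papers together have at least g² citations, with g capped at the number of papers), it can be computed as follows. The function name `g_index` and the sample citation counts are illustrative and do not come from the study.

```python
def g_index(citations):
    """Largest g such that the g most-cited papers have >= g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one scholar's five papers.
example_g = g_index([10, 5, 3, 1, 0])
```

For these sample counts the cumulative totals are 10, 15, 18, 19 and 19, against thresholds of 1, 4, 9, 16 and 25, so the g-index is 4.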

6.
7.
In this paper, we investigate the impact of emotions on author profiling, specifically on identifying age and gender. First, we propose the EmoGraph method for modelling the way people use language to express themselves, on the basis of an emotion-labelled graph. We apply this representation model to identify gender and age in the Spanish partition of the PAN-AP-13 corpus, obtaining results comparable to the best-performing systems of the PAN Lab of CLEF.

8.
Users of search engines on the World Wide Web frequently search for information about people using personal names. Current search engines only return sets of documents containing the name queried, but, as several people usually share a personal name, the resulting sets often contain documents relevant to several people. It is necessary to disambiguate people in these result sets in order to help users find the person of interest more readily. In the task of name disambiguation, effective measurement of similarities in the documents is a crucial step towards the final disambiguation. We propose a new method that uses web directories as a knowledge base to find common contexts in documents and uses the common contexts measure to determine document similarities. Experiments, conducted on documents mentioning real people on the web, together with several famous web directory structures, suggest that there are significant advantages in using web directories to disambiguate people compared with other conventional methods.

9.
Word sense disambiguation is important in various aspects of natural language processing, including Internet search engines, machine translation, text mining, etc. However, the traditional methods using case frames are not effective for solving context ambiguities that require information beyond sentences. This paper presents a new scheme for solving context ambiguities using a field association scheme. Generally, the scope of case frames is restricted to one sentence; however, the scope of the field association scheme can be applied to a set of sentences. In this paper, a formal disambiguation algorithm is proposed to control the scope for a set of a variable number of sentences with ambiguities, as well as to solve ambiguities by calculating the weight of fields. In the experiments, 52 English and 20 Chinese words are disambiguated by using 104,532 Chinese and 38,372 English field association terms. The accuracy of the proposed field association scheme for context ambiguities is 65% higher than the case frame method. The proposed scheme shows better results than three other known methods, namely UNED-LS-U, IIT-2, and Relative-based, on the SENSEVAL-2 corpus.

10.
In this paper, we introduce a novel knowledge-based word-sense disambiguation (WSD) system. In particular, the main goal of our research is to find an effective way to filter out unnecessary information by using word similarity. For this, we adopt two methods in our WSD system. First, we propose a novel encoding method for word vector representation by considering the graphical semantic relationships from the lexical knowledge bases, and the word vector representation is utilized to determine the word similarity in our WSD system. Second, we present an effective method for extracting the contextual words from a text for analyzing an ambiguous word based on word similarity. The results demonstrate that the suggested methods significantly enhance the baseline WSD performance in all corpora. In particular, the performance on nouns is similar to that of the state-of-the-art knowledge-based WSD models, and the performance on verbs surpasses that of the existing knowledge-based WSD models.

11.
Word sense disambiguation (WSD) is meant to assign the most appropriate sense to a polysemous word according to its context. We present a method for automatic WSD using only two resources: a raw text corpus and a machine-readable dictionary (MRD). The system learns the similarity matrix between word pairs from the unlabeled corpus, and it uses the vector representations of sense definitions from the MRD, which are derived based on the similarity matrix. In order to disambiguate all occurrences of polysemous words in a sentence, the system separately constructs an acyclic weighted digraph (AWD) for every occurrence of a polysemous word in the sentence. The AWD is structured based on consideration of the senses of context words which occur with a target word in a sentence. After building the AWD for each polysemous word, we can search for the optimal path of the AWD using the Viterbi algorithm. We assign to the target word the sense that lies on the optimal path in the AWD. In experiments, our system achieves 76.4% accuracy for semantically ambiguous Korean words.

12.
Word sense disambiguation (WSD) is meant to assign the most appropriate sense to a polysemous word according to its context. We present a method for automatic WSD using only two resources: a raw text corpus and a machine-readable dictionary (MRD). The system learns the similarity matrix between word pairs from the unlabeled corpus, and it uses the vector representations of sense definitions from the MRD, which are derived based on the similarity matrix. In order to disambiguate all occurrences of polysemous words in a sentence, the system separately constructs an acyclic weighted digraph (AWD) for every occurrence of a polysemous word in the sentence. The AWD is structured based on consideration of the senses of context words which occur with a target word in a sentence. After building the AWD for each polysemous word, we can search for the optimal path of the AWD using the Viterbi algorithm. We assign to the target word the sense that lies on the optimal path in the AWD. In experiments, our system achieves 76.4% accuracy for semantically ambiguous Korean words.

13.
Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback (PRF). Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed, except that the Dice coefficient slightly outperforms the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
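
The abstract above compares the Cosine and Dice coefficients as term co-occurrence measures but does not give their formulas. A minimal sketch of the common document-frequency forms follows, assuming each term is represented by the set of documents that contain it; the paper's exact formulation may differ. The function names and sample document IDs are hypothetical.

```python
import math


def cosine_coefficient(docs_x, docs_y):
    """Co-occurrence count divided by the geometric mean of the two document frequencies."""
    if not docs_x or not docs_y:
        return 0.0
    return len(docs_x & docs_y) / math.sqrt(len(docs_x) * len(docs_y))


def dice_coefficient(docs_x, docs_y):
    """Twice the co-occurrence count divided by the sum of the two document frequencies."""
    if not docs_x and not docs_y:
        return 0.0
    return 2 * len(docs_x & docs_y) / (len(docs_x) + len(docs_y))


# Hypothetical sets of IDs of the documents containing each candidate translation.
docs_with_term_x = {1, 2, 3, 4}
docs_with_term_y = {4}
cosine_score = cosine_coefficient(docs_with_term_x, docs_with_term_y)
dice_score = dice_coefficient(docs_with_term_x, docs_with_term_y)
```

With a frequent term and a rare term that share one document, the Cosine coefficient (0.5) is higher than the Dice coefficient (0.4); the two measures agree when both terms occur in the same number of documents.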

14.
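[Purpose] To analyze the mechanism by which communication behavior affects author loyalty through author satisfaction, and to offer recommendations for improving author satisfaction and loyalty for scientific and technical journals. [Methods] Based on communication behavior theory, we built a structural equation model of how communication behavior between journal editors and authors affects author loyalty. We surveyed the authors of 《中国细胞生物学学报》 with an online questionnaire and tested the model using AMOS 17.0 and SPSS 16.0. [Results] Communication channels and communication climate do not raise author loyalty directly, but raise it indirectly by improving author satisfaction; communication efficiency raises author loyalty both directly and indirectly through author satisfaction. The three dimensions of communication behavior are logically connected: communication channels help improve the communication climate and raise communication efficiency. [Conclusions] Scientific and technical journals should strive to establish diverse communication channels with their authors and continually improve the communication climate between editors and authors, raising author loyalty by improving author satisfaction.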

15.
The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.

16.
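On the Intellectual Property Protection of Domain Names   Cited 2 times in total (self-citations: 0; citations by others: 2)
Chen Jingquan (陈敬全), 《情报科学》, 2000, 18(12): 1110-1112
Based on a detailed analysis of the characteristics of domain names and their attributes as intellectual property, this paper examines in depth how to strengthen the intellectual property management of domain names. Drawing on an analysis of typical foreign court decisions in domain name disputes, it also proposes legal avenues for resolving such disputes.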

17.
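Exploring Methods for Improving the Core Competitiveness of Scientific and Technical Journals Based on Author Group Analysis   Cited 2 times in total (self-citations: 1; citations by others: 1)
[Purpose] Given the fierce competition in the journal market, to explore ways of improving the core competitiveness of scientific and technical journals. [Methods] From the perspective of the author group, we discuss methods for analyzing author groups and, using the actual situation of 《分析测试学报》, describe how journal editors can build and manage an author group and explore concrete measures for improving core competitiveness. [Results] Taking the journal's actual situation into account and making use of the author group, practical measures can effectively improve a journal's core competitiveness. [Conclusions] Expanding the author group, attracting high-quality manuscripts, and maintaining effective communication between editors and authors are effective ways to improve the core competitiveness of scientific and technical journals.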

18.
Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving the level of precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively). We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poorly performing queries.
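
The abstract above reports results as P@5, P@10 and P@30. A minimal sketch of precision at k follows, assuming the usual convention that the number of relevant documents among the top k is divided by k, even when fewer than k documents were retrieved. The function name and the sample document IDs are hypothetical.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranked result list and relevance judgments for one query.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
p_at_2 = precision_at_k(ranking, relevant, 2)
p_at_5 = precision_at_k(ranking, relevant, 5)
```

Here one of the top two documents and two of the top five are relevant, giving P@2 = 0.5 and P@5 = 0.4.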

19.
In this paper, we address the so-called “ranked list problem” of Web searches, which occurs when users submit short requests to search engines. Generally, as a consequence of the ambiguity and polysemy of terms, users engage in long cycles of query reformulation in an attempt to capture relevant information in the top-ranked results.

20.
“Scientific and technical human capital” (S&T human capital) has been defined as the sum of researchers’ professional network ties and their technical skills and resources [Int. J. Technol. Manage. 22 (7-8) (2001) 636]. Our study focuses on one particular means by which scientists acquire and deploy S&T human capital: research collaboration. We examine data from 451 scientists and engineers at academic research centers in the United States. The chief focus is on scientists’ collaboration choices and strategies. Since we are particularly interested in S&T human capital, we pay special attention to strategies that involve mentoring graduate students and junior faculty and to collaborating with women. We also examine collaboration “cosmopolitanism,” the extent to which scientists collaborate with those around them (one’s research group, one’s university) as opposed to those more distant in geography or institutional setting (other universities, researchers in industry, researchers in other nations). Our findings indicate that those who pursue a “mentor” collaboration strategy are likely to be tenured; to collaborate with women; and to have a favorable view about industry and research on industrial applications. Regarding the number of reported collaborators, those who have larger grants have more collaborators. With respect to the percentage of female collaborators, we found, not surprisingly, that female scientists have a somewhat higher percentage (36%) of female collaborators than males have (24%). There are great differences, however, according to rank, with non-tenure track females having 84% of their collaborations with females. Regarding collaboration cosmopolitanism, we find that most researchers are not particularly cosmopolitan in their selection of collaborators; they tend to work with the people in their own work group. More cosmopolitan collaborators tend to have large grants. A major policy implication is that there is great variance in the extent to which collaborations seem to enhance or generate S&T human capital. Not all collaborations are equal with respect to their “public goods” implications.
