Author name disambiguation deals with clustering same-name authors into distinct individuals. To address the problem, many studies have employed a variety of disambiguation features such as coauthors, titles of papers/publications, topics of articles, and emails/affiliations. Among these, co-authorship is the most easily accessible and influential, since the inter-person acquaintances represented by co-authorship can discriminate author identities more clearly than other features. This study explores the net effects of co-authorship on author clustering in bibliographic data. First, to address the shortage of explicit coauthors listed in known citations, a web-assisted technique is proposed for acquiring implicit coauthors of the target author to be disambiguated. Then, the coauthor disambiguation hypothesis, that the identity of an author can be determined by his/her coauthors, is examined and confirmed through a variety of author disambiguation experiments.
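The coauthor hypothesis can be illustrated with a minimal sketch. It assumes that each citation record for the ambiguous name has been reduced to a set of coauthor names (explicit coauthors plus any implicit ones found on the web). Records that share at least one coauthor are merged transitively with a union-find structure. The function name `cluster_by_coauthors` and the one-shared-coauthor merge rule are illustrative assumptions, not the study's actual procedure.

```python
from collections import defaultdict


def cluster_by_coauthors(records: list[set[str]]) -> list[list[int]]:
    """Group citation records (by index) whose coauthor sets overlap, transitively.

    Assumption: sharing a single coauthor is enough to merge two records.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Remember the first record listing each coauthor; later records that
    # list the same coauthor are merged into that record's cluster.
    first_seen: dict[str, int] = {}
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            if name in first_seen:
                union(first_seen[name], idx)
            else:
                first_seen[name] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

For example, records `[{"A. Smith", "B. Lee"}, {"B. Lee", "C. Wu"}, {"D. Park"}, {"C. Wu"}]` yield `[[0, 1, 3], [2]]`. Record 3 joins the first cluster through the chain C. Wu, B. Lee.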

Authorship disambiguation is an urgent issue that affects the quality of digital library services, and supervised solutions have been proposed for it, delivering state-of-the-art effectiveness. However, particular challenges may prevent such techniques from reaching their full potential: the prohibitive cost of labeling vast numbers of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (a few authors are very prolific, while most appear in only a few citations). In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) with specific authors. As our main contribution, we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method, which explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), which extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on the examples most suitable for the task; and (3) SLAND (Self-Training LAND), which extends LAND with self-training capabilities, drastically reducing the number of examples required to build effective disambiguation functions while also being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND outperforms state-of-the-art supervised disambiguators, with gains ranging from 12% to more than 400%, while remaining highly practical.
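A much-simplified sketch of the lazy (LAND-style) idea follows. It assumes each citation is a set of feature strings such as `"coauthor:B. Lee"` or `"venue:SIGIR"`. At disambiguation time, only training examples that share at least one feature with the test citation are kept (the demand-driven projection). Single-feature rules `feature -> author` are scored by confidence, and authors are ranked by the summed confidence of their rules. The limit to one-feature rules, the absence of support thresholds and self-training, and the name `lazy_associative_predict` are all assumptions. This is not the article's algorithm.

```python
from collections import Counter, defaultdict


def lazy_associative_predict(train, test_features, min_conf=0.0):
    """train: list of (feature_set, author); test_features: set of features.

    Assumption: only single-feature rules are mined. Returns the predicted
    author, or None when no rule applies (a crude stand-in for flagging an
    unseen author).
    """
    # Demand-driven projection: keep only examples relevant to this citation,
    # restricted to the features they share with it.
    projected = [(f & test_features, a) for f, a in train if f & test_features]
    if not projected:
        return None

    feature_count = Counter()
    pair_count = Counter()
    for feats, author in projected:
        for feat in feats:
            feature_count[feat] += 1
            pair_count[(feat, author)] += 1

    # Score each author by summing the confidence of rules feat -> author.
    scores = defaultdict(float)
    for (feat, author), n in pair_count.items():
        conf = n / feature_count[feat]
        if conf >= min_conf:
            scores[author] += conf
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Because rules are mined only from the projected examples, the hypothesis space shrinks to what is relevant for the citation at hand. This is the intuition the abstract gives for LAND.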

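In the digital library environment, author name ambiguity reduces the accuracy of bibliographic database retrieval and degrades the quality of bibliographic datasets. Compared with traditional methods, automatic disambiguation methods can more effectively resolve the conflict between the massive growth of data and the low efficiency of manual identification. After a brief review of representative existing methods for automatic author name disambiguation, a fairly complete classification scheme is established for them according to their clustering approaches and feature selection approaches, and the methods are compared and analyzed. Then, addressing the author name ambiguity that affects both Chinese and foreign authors in bibliographic databases, a general name disambiguation framework that is not restricted to any particular database or language is proposed, providing technical support to guide bibliographic database systems in applying suitable disambiguation methods.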

Users frequently ask Web search engines for information about people by querying their personal names. Current search engines only return sets of documents containing the queried name, but since several people usually share a personal name, the resulting sets often contain documents relevant to several different people. The people in these result sets must therefore be disambiguated to help users find the person of interest more readily. In name disambiguation, effective measurement of the similarity between documents is a crucial step towards the final disambiguation. We propose a new method that uses web directories as a knowledge base to find common contexts in documents, and uses a common-contexts measure to determine document similarity. Experiments on Web documents mentioning real people, using several well-known web directory structures, suggest that using web directories to disambiguate people offers significant advantages over other conventional methods.
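One way to picture a common-context similarity is the sketch below. It assumes a hypothetical mapping `term_to_categories` from terms to web-directory categories, with category paths such as `Arts/Music`. Each document is mapped to the counts of directory categories its terms fall into, and two documents are compared by cosine similarity over those counts. The mapping, the cosine choice and the function names are illustrative assumptions; the abstract does not give the paper's exact measure.

```python
import math
from collections import Counter


def category_profile(doc_terms, term_to_categories):
    """Count the web-directory categories that a document's terms map to.

    Assumption: term_to_categories is a hypothetical term -> [category] lookup.
    """
    profile = Counter()
    for term in doc_terms:
        for cat in term_to_categories.get(term, ()):
            profile[cat] += 1
    return profile


def common_context_similarity(doc_a, doc_b, term_to_categories):
    """Cosine similarity of two documents' directory-category profiles."""
    pa = category_profile(doc_a, term_to_categories)
    pb = category_profile(doc_b, term_to_categories)
    dot = sum(pa[c] * pb[c] for c in pa.keys() & pb.keys())
    norm = math.sqrt(sum(v * v for v in pa.values())) * math.sqrt(
        sum(v * v for v in pb.values())
    )
    return dot / norm if norm else 0.0
```

Two pages about the same musician can share no words yet still score highly if their terms fall under the same directory branch. That is the benefit of a shared-context measure over plain word overlap.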

Word sense disambiguation is important in various areas of natural language processing, including Internet search engines, machine translation and text mining. However, traditional methods based on case frames are not effective at resolving context ambiguities that require information beyond a single sentence. This paper presents a new scheme for resolving context ambiguities using a field association scheme. The scope of case frames is generally restricted to one sentence, whereas the field association scheme can be applied to a set of sentences. A formal disambiguation algorithm is proposed that controls the scope over a variable number of sentences containing ambiguities and resolves the ambiguities by calculating field weights. In the experiments, 52 English and 20 Chinese words are disambiguated using 104,532 Chinese and 38,372 English field association terms. The accuracy of the proposed field association scheme on context ambiguities is 65% higher than that of the case frame method. The proposed scheme also outperforms three other known methods, namely UNED-LS-U, IIT-2 and Relative-based, on the SENSEVAL-2 corpus.
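The field-weighting step can be sketched as follows. The sketch assumes a hypothetical dictionary `fa_terms` that maps field association terms to `(field, weight)` pairs, and a hypothetical `sense_fields` table linking each sense of the ambiguous word to one field. Weights are summed per field over a window of sentences around the ambiguous word, and the sense whose field scores highest is chosen. The fixed `window` parameter stands in for the paper's variable-scope control, which the abstract does not detail.

```python
from collections import defaultdict


def disambiguate_by_field(sentences, target_idx, sense_fields, fa_terms, window=1):
    """Pick a sense for the ambiguous word in sentences[target_idx].

    sentences: list of token lists; sense_fields: {sense: field} (hypothetical);
    fa_terms: {term: (field, weight)} (hypothetical);
    window: number of sentences scanned on each side (assumed fixed here).
    """
    lo = max(0, target_idx - window)
    hi = min(len(sentences), target_idx + window + 1)
    field_weight = defaultdict(float)
    for sentence in sentences[lo:hi]:
        for token in sentence:
            if token in fa_terms:
                field, weight = fa_terms[token]
                field_weight[field] += weight
    # Choose the sense whose associated field accumulated the most weight.
    return max(sense_fields, key=lambda s: field_weight.get(sense_fields[s], 0.0))
```

In a case like "We sat by the bank.", the sentence alone carries no field evidence. Widening the window to the neighbouring sentences ("The river flooded.") is what lets the geographic sense win. This is the cross-sentence benefit the abstract attributes to the scheme.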

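[Objective] To analyze the mechanism by which communication behavior affects author loyalty through author satisfaction, and to offer recommendations for improving the satisfaction and loyalty of authors of sci-tech journals. [Methods] Based on communication behavior theory, a structural equation model was built of how communication behavior between sci-tech journal editors and authors affects author loyalty. Authors of the Chinese Journal of Cell Biology (中国细胞生物学学报) were surveyed through an online questionnaire, and the model was tested with AMOS 17.0 and SPSS 16.0. [Results] Communication channels and communication climate do not directly raise author loyalty, but they raise it indirectly by improving author satisfaction. Communication efficiency raises author loyalty both directly and indirectly through author satisfaction. The three dimensions of communication behavior are logically interrelated: communication channels help improve the communication climate and raise communication efficiency. [Conclusions] Sci-tech journals should strive to build diverse communication channels with their authors, continually improve the communication climate between editors and authors, and raise author loyalty by improving author satisfaction.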

Dictionary-based query translation for cross-language information retrieval often yields several translation candidates with different meanings for a source term in the query. This paper examines methods for resolving translation ambiguity using only the target document collection. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback (PRF). Next, these techniques are compared empirically on the CLEF 2003 test collection for German-to-Italian bilingual searches, which are executed using English as a pivot language. The experiments showed that a variant of the term co-occurrence techniques, in which the best-sequence algorithm for selecting translations is used with the cosine coefficient, performs best, and that the PRF method achieves comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, to verify our findings, we repeat the same experiments for French-to-Italian (pivot) and English-to-Italian (non-pivot) searches on the same CLEF 2003 test collection. Again, similar results were observed, except that the Dice coefficient slightly outperforms the cosine coefficient for co-occurrence-based disambiguation in English-to-Italian searches.
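The co-occurrence technique can be sketched under simple assumptions. Each source term has a list of candidate translations. `df` holds document frequencies and `co_df` holds joint document frequencies in the target collection. The sketch scores every combination of candidates by summed pairwise association and keeps the best. This exhaustive search only suits short queries and is not the paper's best-sequence algorithm, which the abstract does not describe. Both coefficients use their standard set-based forms: Dice = 2·n_xy / (n_x + n_y) and cosine = n_xy / sqrt(n_x · n_y).

```python
import math
from itertools import combinations, product


def dice(n_xy, n_x, n_y):
    """Dice coefficient from joint and marginal document frequencies."""
    return 2 * n_xy / (n_x + n_y) if n_x + n_y else 0.0


def cosine(n_xy, n_x, n_y):
    """Cosine coefficient from joint and marginal document frequencies."""
    return n_xy / math.sqrt(n_x * n_y) if n_x and n_y else 0.0


def select_translations(candidates, df, co_df, coef=cosine):
    """candidates: list of candidate-translation lists, one per source term.

    df: {term: document frequency}; co_df: {frozenset({t1, t2}): joint df}.
    Returns the combination with the highest summed pairwise association.
    Assumption: exhaustive search, which is exponential in query length.
    """
    def score(combo):
        total = 0.0
        for a, b in combinations(combo, 2):
            n_xy = co_df.get(frozenset((a, b)), 0)
            total += coef(n_xy, df.get(a, 0), df.get(b, 0))
        return total

    return max(product(*candidates), key=score)
```

For instance, with Italian candidates `["banca", "riva"]` and `["tasso", "fiume"]`, the pair that co-occurs most strongly in the collection ("banca", "tasso") is chosen under either coefficient. The two coefficients can still rank other combinations differently, which is the kind of difference the experiments measure.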

Word sense disambiguation (WSD) aims to assign the most appropriate sense to a polysemous word according to its context. We present a method for automatic WSD that uses only two resources: a raw text corpus and a machine-readable dictionary (MRD). The system learns a similarity matrix between word pairs from the unlabeled corpus, and uses vector representations of the MRD's sense definitions, which are derived from that similarity matrix. To disambiguate all occurrences of polysemous words in a sentence, the system constructs a separate acyclic weighted digraph (AWD) for each occurrence of a polysemous word. Each AWD is structured according to the senses of the context words that occur with the target word in the sentence. After building the AWD for each polysemous word, the system searches for the optimal path through the AWD using the Viterbi algorithm. It then assigns to the target word the sense that lies on that optimal path. In experiments, the system achieves 76.4% accuracy on semantically ambiguous Korean words.
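The optimal-path step can be sketched as a Viterbi search over a lattice with one layer per word and one node per candidate sense. The sketch assumes a hypothetical function `sim(s1, s2)` that returns the similarity between two sense vectors. In the paper, those vectors come from MRD definitions and the corpus-learned similarity matrix. Path score is the sum of similarities between the senses of adjacent words. This is a simplified stand-in for the AWD's edge weights, which the abstract does not specify.

```python
def viterbi_senses(layers, sim):
    """layers: list of candidate-sense lists, one per word in sentence order.

    sim(a, b): similarity between two senses, used as the edge weight
    (hypothetical; any callable returning a float works).
    Returns the sense sequence on the maximum-weight path.
    """
    # best[s] = (score of the best path ending at sense s, that path)
    best = {s: (0.0, [s]) for s in layers[0]}
    for layer in layers[1:]:
        new_best = {}
        for s in layer:
            # Best predecessor for s: maximizes accumulated score + edge weight.
            p = max(best, key=lambda q: best[q][0] + sim(q, s))
            score, path = best[p]
            new_best[s] = (score + sim(p, s), path + [s])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]
```

The target word receives whichever of its senses appears on the returned path. Dynamic programming keeps the search linear in sentence length instead of enumerating every sense combination.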


Hiring appropriate editors, chairs and committee members for academic journals and conferences is challenging. It requires a targeted search for high-profile scholars who are active both in the field and in the publication venue. Many author-level metrics have been employed for this task, such as the h-index, PageRank and their variants. However, these metrics are global measures that evaluate authors' productivity and impact without differentiating between publication venues. From a venue's perspective, it is also important to have a localised metric that specifically indicates the significance of academic authors to that particular venue. In this paper, we propose a relevance-based author ranking algorithm to measure the significance of authors to individual venues. Specifically, we build a co-authorship network that models the author-venue relationship and integrates the statistical relevance of authors to individual venues. RelRank, an improved PageRank algorithm embedding author relevance, is then proposed to rank authors for each venue. Extensive experiments compare RelRank with classic author-level metrics on three datasets from different research domains. We also evaluate the effectiveness of RelRank and the comparison metrics in recommending the editorial boards of three venues using test data. Results demonstrate that RelRank identifies not only high-profile scholars but also those who are particularly significant to individual venues.
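The abstract does not give RelRank's formula. The sketch below shows one plausible way, purely an assumption, to embed per-venue author relevance into PageRank. The relevance scores serve as the teleportation (personalization) vector on a weighted, undirected co-authorship graph, so random jumps favour authors relevant to the venue. Edge weights are coauthored-paper counts. `relevance` is a hypothetical input, for example the share of an author's papers published in the venue.

```python
def relevance_pagerank(edges, relevance, damping=0.85, iters=100, tol=1e-10):
    """edges: {(u, v): weight} for an undirected co-authorship graph.

    relevance: {author: non-negative relevance to the venue}, used as the
    teleport vector (assumption; not necessarily RelRank's formulation).
    Returns {author: rank}, with ranks summing to 1.
    """
    total_rel = sum(relevance.values())
    if total_rel <= 0:
        raise ValueError("relevance scores must have a positive total")
    nodes = set(relevance)
    for u, v in edges:
        nodes.update((u, v))
    neighbours = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        neighbours[u][v] = neighbours[u].get(v, 0.0) + w
        neighbours[v][u] = neighbours[v].get(u, 0.0) + w

    teleport = {n: relevance.get(n, 0.0) / total_rel for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            out = sum(neighbours[u].values())
            if out == 0:
                # Dangling author: redistribute their rank via the teleport vector.
                for n in nodes:
                    new[n] += damping * rank[u] * teleport[n]
                continue
            for v, w in neighbours[u].items():
                new[v] += damping * rank[u] * w / out
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

With uniform relevance this reduces to ordinary weighted PageRank. Skewing the teleport vector towards authors who publish in the venue produces a venue-local ranking while still rewarding well-connected collaborators.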

Classical test theory offers theoretically derived reliability measures, such as Cronbach's alpha, that can be applied to measure the reliability of a set of information retrieval test results. The theory also supports item analysis, which identifies queries that hamper the test's reliability and may be candidates for refinement or removal. A generalization of classical test theory, called generalizability theory, provides an even richer set of tools. It allows us to estimate the reliability of a test as a function of the number of queries, assessors (relevance judges), and other aspects of the test's design. One novel aspect of generalizability theory is that it allows reliability to be estimated even before the test collection exists, based purely on the numbers of queries and assessors it will contain. These calculations can help test designers in advance by letting them compare the reliability of test designs with various numbers of queries and relevance assessors, and spend their limited budgets on the design that maximizes reliability. Empirical analysis shows that, in cases for which our data are representative, having more queries is more helpful for reliability than having more assessors. It also suggests that, where appropriate, reliability may be improved by using a per-document performance measure rather than a document-set-based performance measure. The theory also clarifies the implicit debate in the IR literature regarding the nature of error in relevance judgments.
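Cronbach's alpha can be computed from a score matrix, which gives the classical-test-theory part a concrete anchor. The sketch assumes a common mapping when test theory is applied to IR evaluation: systems play the role of test-takers and queries play the role of test items. So `scores[s][q]` is system `s`'s effectiveness (e.g. average precision) on query `q`. It uses the standard formula alpha = k/(k-1) · (1 - Σ item variances / variance of total scores), with sample variances.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: list of rows, one per system; each row has one score per query.

    Assumption: systems are the 'test-takers' and queries are the 'items'.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two systems and two queries")
    k = len(scores[0])  # number of items (queries)
    item_vars = [variance([row[q] for row in scores]) for q in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Queries that order systems consistently push alpha towards 1. Queries that disagree with the rest lower it, and alpha can even become negative. Item analysis looks for the queries responsible for this.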

In this paper, we investigate the impact of emotions on author profiling, specifically on identifying age and gender. First, we propose the EmoGraph method, which models the way people use language to express themselves on the basis of an emotion-labelled graph. We apply this representation model to identify gender and age in the Spanish partition of the PAN-AP-13 corpus, obtaining results comparable to those of the best-performing systems in the PAN Lab at CLEF.
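The abstract does not describe how EmoGraph labels or weights its graph. The sketch below only illustrates the general idea of a label-transition graph. It assumes tokens have already been tagged with hypothetical labels, such as a part-of-speech tag optionally suffixed with an emotion (`"VERB:joy"`). Consecutive labels form directed edges, and normalized edge frequencies serve as a feature vector for an age or gender classifier. The labelling scheme and the function name are assumptions.

```python
from collections import Counter


def emotion_graph_features(tagged_sentences):
    """Build a directed graph of consecutive labels; return normalized edge weights.

    tagged_sentences: list of label lists, e.g. [["PRON", "VERB:joy", "NOUN"], ...]
    (hypothetical labels). Returns {(from_label, to_label): relative frequency}.
    """
    edges = Counter()
    for labels in tagged_sentences:
        for a, b in zip(labels, labels[1:]):
            edges[(a, b)] += 1
    total = sum(edges.values())
    return {e: n / total for e, n in edges.items()} if total else {}
```

One such feature dictionary per author can be vectorized over a shared edge vocabulary and passed to any standard classifier.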

Engineering a multi-purpose test collection for Web retrieval experiments   Total citations: 1, self-citations: 0, citations by others: 1
Past research into text retrieval methods for the Web has been restricted by the lack of a test collection capable of supporting experiments which are both realistic and reproducible. The 1.69-million-document WT10g collection is proposed as a multi-purpose testbed for experiments with these attributes, in distributed IR, hyperlink algorithms and conventional ad hoc retrieval. WT10g was constructed by selecting from a superset of documents in such a way that desirable corpus properties were preserved or optimised. These properties include: a high degree of inter-server connectivity, integrity of server holdings, inclusion of documents related to a very wide spread of likely queries, and a realistic distribution of server holding sizes. We confirm that WT10g contains exploitable link information using a site (homepage) finding experiment. Our results show that, on this task, Okapi BM25 works better on propagated link anchor text than on full text. WT10g was used in TREC-9 and TREC-2000, and both topic relevance and homepage finding queries and judgments are available.
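For reference, Okapi BM25, which the experiment applies to both full text and propagated anchor text, can be sketched as below. The defaults k1 = 1.2 and b = 0.75 are common choices and an assumption here. So is the IDF variant log((N - n + 0.5)/(n + 0.5) + 1), which keeps weights non-negative. The paper's exact settings are not given. Scoring anchor text instead of full text only changes how each document's token list is built: it comes from the anchor text of links pointing to the document rather than from its body.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """query: list of terms; docs: list of token lists. Returns one score per doc.

    Assumption: k1/b defaults and the '+1' IDF variant are common choices,
    not necessarily those used in the WT10g experiments.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Because `tokens` for a page can be swapped for its aggregated incoming anchor text without touching the scoring function, the full-text versus anchor-text comparison reduces to running the same function over two document representations.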

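Quality grade evaluation of construction projects based on the cloud model   Total citations: 1, self-citations: 0, citations by others: 1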
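To better handle the fuzziness inherent in grading the quality of construction projects, this paper exploits the cloud model's advantages in uncertainty conversion to explore a new method for construction project quality grade evaluation, and validates the method empirically with an engineering case study.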
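The uncertainty conversion that the cloud model provides can be illustrated with the X-condition generator of a normal cloud. The sketch makes the following assumptions:

- Each quality grade is a normal cloud with expectation Ex, entropy En and hyper-entropy He.
- The certainty degree of an evaluation score x, for one cloud drop, is μ = exp(-(x - Ex)² / (2·En'²)), with En' drawn from N(En, He²).
- Averaging μ over many drops and picking the grade with the highest mean certainty is one common way to assign a grade.

The grade parameters in the usage note are hypothetical and not taken from the paper.

```python
import math
import random


def certainty(x, ex, en, he, drops=1000, rng=None):
    """Mean certainty degree of score x for a normal cloud (Ex, En, He).

    Assumption: the mean over many drops is used as the grade membership.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducible results
    total = 0.0
    for _ in range(drops):
        en_prime = rng.gauss(en, he)  # randomised entropy for this drop
        if en_prime == 0:
            continue  # degenerate draw; contributes nothing
        total += math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
    return total / drops


def assign_grade(x, grades, **kwargs):
    """grades: {name: (Ex, En, He)}; returns the grade with highest mean certainty."""
    return max(grades, key=lambda g: certainty(x, *grades[g], **kwargs))
```

With hypothetical grades `{"unqualified": (50, 5, 0.5), "qualified": (70, 5, 0.5), "good": (85, 4, 0.4), "excellent": (95, 2, 0.2)}`, a score of 88 is assigned to "good". The hyper-entropy He controls how fuzzy each grade boundary is.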

Traditional Cranfield test collections represent an abstraction of a retrieval task that Sparck Jones calls the “core competency” of retrieval: a task that is necessary, but not sufficient, for user retrieval tasks. The abstraction facilitates research by controlling for (some) sources of variability, thus increasing the power of experiments that compare system effectiveness while reducing their cost. However, even within the highly abstracted Cranfield paradigm, meta-analysis demonstrates that the user/topic effect is greater than the system effect, so experiments must include a relatively large number of topics to distinguish systems' effectiveness. The evidence further suggests that changing the abstraction to include even slightly more characterization of the user will result in a dramatic loss of power or increase in the cost of retrieval experiments. Defining a new, feasible abstraction to support adaptive IR research will require winnowing the list of all possible factors that can affect retrieval behavior down to a minimal set of essential factors.
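Comparing topic and system effects rests on decomposing score variance. A minimal sketch assumes a complete systems × topics matrix of effectiveness scores (e.g. average precision) and a two-way ANOVA without replication. It computes the sums of squares attributable to systems, to topics and to the residual, so the dominant factor can be read off directly. This is a textbook decomposition, not the specific meta-analysis the abstract refers to.

```python
def two_way_sums_of_squares(scores):
    """scores[s][t]: score of system s on topic t (complete matrix assumed).

    Returns (SS_system, SS_topic, SS_residual) for a two-way ANOVA
    without replication.
    """
    n_sys, n_top = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_sys * n_top)
    sys_means = [sum(row) / n_top for row in scores]
    top_means = [sum(row[t] for row in scores) / n_sys for t in range(n_top)]
    ss_sys = n_top * sum((m - grand) ** 2 for m in sys_means)
    ss_top = n_sys * sum((m - grand) ** 2 for m in top_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    return ss_sys, ss_top, ss_total - ss_sys - ss_top
```

With the toy matrix `[[0.2, 0.8], [0.3, 0.9]]`, topics contribute 0.36 of the squared deviation and systems only 0.01. When topics dominate like this, many topics are needed before a system difference stands out from topic noise.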

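As the construction industry places ever higher demands on green, environmentally friendly practice and on informatization, traditional construction project evaluation can no longer meet the needs of the industry's sustainable development. Using grounded theory, green construction and informatization evaluation are added to traditional evaluation to build a green evaluation index system for construction projects with 8 dimensions. Indicator weights are allocated through a questionnaire survey and an improved group G1 weighting method. Finally, set pair analysis is used to calculate the grade connection degree of the indicators, a green evaluation model for construction projects is established, and the model's effectiveness is verified with an engineering case study.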
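The weighting step builds on the G1 (order-relation analysis) method. The sketch covers only the basic single-expert G1 calculation:

1. Indicators are ranked from most to least important.
2. An expert gives ratios r_k = w_{k-1}/w_k for adjacent indicators (k = 2..n).
3. The weights follow from w_n = (1 + Σ_{k=2}^{n} Π_{i=k}^{n} r_i)^{-1} and w_{k-1} = r_k · w_k.

The paper's improved group variant, which combines several experts' judgements, is not reproduced here.

```python
def g1_weights(ratios):
    """Basic single-expert G1 weights (the paper's group variant is not modelled).

    ratios[j] = w_j / w_{j+1} for indicators already ranked by importance, so
    with n indicators len(ratios) == n - 1 (these are r_2..r_n in G1 notation).
    Returns weights in rank order, summing to 1.
    """
    n = len(ratios) + 1
    # total = 1 + sum over k of the suffix products prod_{i=k}^{n} r_i.
    total = 1.0
    prod = 1.0
    for r in reversed(ratios):
        prod *= r
        total += prod
    weights = [0.0] * n
    weights[-1] = 1.0 / total
    # Back-substitute w_{k-1} = r_k * w_k from the least important upwards.
    for j in range(n - 2, -1, -1):
        weights[j] = ratios[j] * weights[j + 1]
    return weights
```

For three indicators with ratios 1.2 and 1.4, this gives weights of roughly 0.412, 0.343 and 0.245. Unlike AHP, G1 needs only n - 1 judgements and no consistency check.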

Cai Tianyou. Kejiao Wenhui (科教文汇), 2013, (19): 18-19
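As self-financing public institutions, television media bear the multiple tasks of publicity and opinion guidance as well as their own development. In the new competitive environment, they should establish a scientific organizational structure and working methods for building spiritual civilization, adopt a people-oriented approach, implement a "grand political work" (大政工) mode of operation, and improve the ideological and political quality of their reporting and editing staff.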

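Using the theory of families of potential wells, this paper studies the invariance of global solutions and, further, the vacuum isolating phenomenon of solutions. The conditions for vacuum isolating are generalized and improved, yielding new results. These results are very helpful for studying the distribution of solutions in Sobolev spaces.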

This article proposes an approach to constructing a Lyapunov function for a linear large-scale periodic system. Unlike the various small-gain stability conditions for large-scale systems, the approach does not assume that the independent subsystems are asymptotically stable. To analyze the asymptotic stability of the large-scale system, the direct Lyapunov method is combined with the discretization method and identities of the commutator calculus. The main results are illustrated by examples.
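The article's Lyapunov construction is not spelled out in the abstract. As a swapped-in, independent numerical cross-check (not the article's method), Floquet theory gives a direct test for a linear T-periodic system x' = A(t)x: the system is asymptotically stable exactly when every eigenvalue of the monodromy matrix Φ(T) lies strictly inside the unit circle. The sketch integrates Φ with classical RK4. To stay dependency-free, it handles only the 2×2 case when computing eigenvalues, which is a simplification of the large-scale setting.

```python
import cmath


def monodromy(A, T, steps=1000):
    """Integrate Phi' = A(t) Phi, Phi(0) = I, over one period with RK4 (2x2 only)."""
    def mat_mul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

    def add(X, Y, c):
        # Returns X + c * Y elementwise.
        return [[X[i][j] + c * Y[i][j] for j in range(2)] for i in range(2)]

    h = T / steps
    phi = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(steps):
        t = n * h
        k1 = mat_mul(A(t), phi)
        k2 = mat_mul(A(t + h / 2), add(phi, k1, h / 2))
        k3 = mat_mul(A(t + h / 2), add(phi, k2, h / 2))
        k4 = mat_mul(A(t + h), add(phi, k3, h))
        phi = [[phi[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
                for j in range(2)] for i in range(2)]
    return phi


def spectral_radius_2x2(M):
    """Largest eigenvalue modulus of a 2x2 matrix, via its characteristic polynomial."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def is_asymptotically_stable(A, T, steps=1000):
    """Floquet test: all monodromy eigenvalues strictly inside the unit circle."""
    return spectral_radius_2x2(monodromy(A, T, steps)) < 1.0
```

`A` is any callable returning a 2×2 list of lists for time `t`, assumed periodic with period `T`. A numerical check like this cannot replace a Lyapunov-based proof, but it can be used to sanity-check examples.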

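This study examines construction enterprise employees' willingness to adopt lean construction technology. Data on the factors influencing adoption willingness were collected through a questionnaire survey, and rough sets were used to screen out the important factors. Then, based on the technology acceptance model, a simulation model linking the influencing factors to adoption willingness was built with system dynamics. A BP neural network combined with the MIV algorithm was used to determine the model parameters. Computer simulation was used to compare how strongly these factors affect adoption willingness at different levels of technology maturity, and to explore how the factors and adoption willingness interact. The results show that these factors clearly affect employees' adoption willingness. The perceived usefulness of lean construction technology has the most significant effect, followed by its perceived ease of use, while its perceived enjoyment has a relatively weak effect. Finally, suggestions are offered for increasing employees' willingness to adopt lean technology.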
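The MIV (mean impact value) step ranks input factors by how much perturbing each one moves a trained network's output. A minimal sketch assumes a hypothetical trained model `predict(x) -> float` (in the study, a BP neural network) and the commonly used ±10% perturbation. For each factor, every sample is copied twice, once with that factor scaled by 1.1 and once by 0.9. The MIV is the mean difference between the two predictions. A larger absolute MIV indicates a more influential factor.

```python
def mean_impact_values(predict, samples, delta=0.1):
    """predict: callable mapping a feature list to a float (hypothetical model).

    samples: list of feature lists. Assumption: +/-delta relative perturbation.
    Returns one MIV per feature: the mean over samples of
    predict(x with feature * (1 + delta)) - predict(x with feature * (1 - delta)).
    """
    n_features = len(samples[0])
    mivs = []
    for j in range(n_features):
        impacts = []
        for x in samples:
            up, down = list(x), list(x)
            up[j] *= 1 + delta
            down[j] *= 1 - delta
            impacts.append(predict(up) - predict(down))
        mivs.append(sum(impacts) / len(impacts))
    return mivs
```

Sorting factors such as perceived usefulness, ease of use and enjoyment by the absolute value of their MIV gives the kind of importance ordering the study reports.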


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP registration No. 09084417 (京ICP备09084417号)