Similar literature
 19 similar documents found (search time: 62 ms)
1.
昌宁  窦永香  徐薇 《情报科学》2021,39(6):108-116
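[Purpose/Significance] This paper uses multi-source data to disambiguate the names of authors of scientific and technical literature, so that authors and documents correspond one to one. [Method/Process] The collected multi-source data are first preprocessed to form a dataset of same-name records awaiting resolution, made up of documents whose authors share a name. Academic circles are then built from co-authorship relations to detect ambiguity, and finally institution and research field are used to disambiguate. [Results/Conclusions] The experiments collected bibliographic records from six disciplines: education at all levels, automation and computer technology, information and knowledge dissemination, mathematical sciences and chemistry, radio electronics, and Chinese medicine. The proposed rule-based method disambiguates well; fusing multi-source data and using institution and field as multiple indicators yields strong disambiguation performance. [Innovation/Limitations] The work addresses the difficult case of same-name authors in the same institution and field, takes incremental updates into account, and builds a complete disambiguation model.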

2.
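Author disambiguation in bibliographic records is a form of personal name disambiguation and has attracted wide attention from the research community in recent years. Document clustering is an important approach to the task, but its experimental results are often poor. This paper therefore improves the K-means text clustering method and applies the improved method to author disambiguation. Experimental results show that the improved K-means algorithm effectively raises the performance of author disambiguation.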

3.
孙笑明  李瑶  王成军  刘斌  赵升 《情报科学》2019,37(4):116-121
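[Purpose/Significance] To achieve high-quality data cleaning and make patent big data more usable, inventor name disambiguation has become a key problem in urgent need of a solution. [Method/Process] This paper proposes an inventor name disambiguation algorithm based on the idea of expert deliberation. A comprehensive similarity threshold first divides the inventor name ambiguities that arise during disambiguation into certain and uncertain ambiguities. Certain ambiguities are corrected directly, while expert deliberation is introduced so that collective intelligence converts uncertain ambiguities into certain ones for disambiguation. [Results/Conclusions] An analysis using patent data from the domestic pharmaceutical industry shows that, compared with earlier purely machine-based disambiguation algorithms, the proposed algorithm improves significantly on both accuracy and disambiguation time.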
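
The threshold step in the abstract above can be illustrated with a minimal sketch. The abstract does not give the thresholds or the similarity measure, so the function name `split_by_similarity`, the two cut-offs `low` and `high`, and the input format are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import List, Tuple


def split_by_similarity(
    pairs: List[Tuple[str, float]],
    low: float = 0.3,   # assumed cut-off: at or below this, treat as different people
    high: float = 0.8,  # assumed cut-off: at or above this, treat as the same person
) -> Tuple[List[str], List[str], List[str]]:
    """Split name pairs into certain-same, certain-different, and uncertain.

    Each input item is (pair_id, similarity), where similarity is a
    precomputed comprehensive similarity score in [0, 1].
    """
    certain_same: List[str] = []
    certain_diff: List[str] = []
    uncertain: List[str] = []
    for pair_id, similarity in pairs:
        if similarity >= high:
            certain_same.append(pair_id)
        elif similarity <= low:
            certain_diff.append(pair_id)
        else:
            # Pairs in the middle band would be passed on for expert review.
            uncertain.append(pair_id)
    return certain_same, certain_diff, uncertain
```

In this sketch, only the pairs returned in the third list would need human judgment; the other two lists can be resolved automatically.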

4.
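To achieve high-quality data cleaning and improve the accuracy of networks built from patent data, inventor name disambiguation has become a key problem that many researchers in China and abroad take seriously. Taking the particular characteristics of Chinese names into account, this paper selects 400 name pairs collected from patent data by stratified sampling, applies a semi-supervised learning algorithm with feature vectors (such as classification-code similarity) as the information source, and builds a classification model based on the C4.5 decision tree algorithm to identify name ambiguity. The accuracy and reliability of the classification model are also evaluated. A study using patent data from the domestic telecommunications industry shows that cleaning with this classification model effectively improves the efficiency and precision of data cleaning.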

5.
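Word sense disambiguation is a core problem in natural language processing. A disambiguation method based on the naive Bayes probabilistic model was tried and gave good results. Because that method does not select context features carefully when extracting them, useless information gets mixed in and lowers the classification accuracy of the Bayes classifier. This work uses word stems and parts of speech to make context feature extraction more effective, and also tries using indicator words found in the context as features for disambiguation.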

6.
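Word sense disambiguation is a core problem in natural language processing. A disambiguation method based on the naive Bayes probabilistic model was tried and gave good results. Because that method does not select context features carefully when extracting them, useless information gets mixed in and lowers the classification accuracy of the Bayes classifier. This work uses word stems and parts of speech to make context feature extraction more effective, and also tries using indicator words found in the context as features for disambiguation.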

7.
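Polysemy, where one word carries several meanings, is common in text. This paper proposes a word sense disambiguation algorithm based on a semantic relation graph. The algorithm first builds the graph from the semantic relations in WordNet, and then selects the best sense for a polysemous word according to its context in the graph. The full-text content of Senseval-3 is used as the test set, and the results show that the algorithm performs very well.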

8.
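This paper presents a new automatic summarization method based on Chinese discourse structure. Starting from the physical structure of the text, it analyzes the content in depth by combining the theory of Chinese complex sentences, RST, and a variety of Chinese linguistic features, determines the logical relations between language units at each level of the text, and obtains the logical structure of the text. The summary is extracted with weighting rules and made coherent and fluent with disambiguation rules. A system evaluation is given at the end.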

9.
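This paper presents a new automatic summarization method based on Chinese discourse structure. Starting from the physical structure of the text, it analyzes the content in depth by combining the theory of Chinese complex sentences, RST, and a variety of Chinese linguistic features, determines the logical relations between language units at each level of the text, and obtains the logical structure of the text. The summary is extracted with weighting rules and made coherent and fluent with disambiguation rules. A system evaluation is given at the end.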

10.
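This paper presents a machine translation framework that automatically translates Chinese text into English. Each input sentence goes through word segmentation, part-of-speech tagging, and syntactic parsing. Before translation transfer, the results of word sense disambiguation are integrated to improve the quality of the automatic translation output.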

11.
Author name disambiguation deals with clustering the same-name authors into different individuals. To attack the problem, many studies have employed a variety of disambiguation features such as coauthors, titles of papers/publications, topics of articles, emails/affiliations, etc. Among these, co-authorship is the most easily accessible and influential, since inter-person acquaintances represented by co-authorship could discriminate the identities of authors more clearly than other features. This study attempts to explore the net effects of co-authorship on author clustering in bibliographic data. First, to handle the shortage of explicit coauthors listed in known citations, a web-assisted technique of acquiring implicit coauthors of the target author to be disambiguated is proposed. Then, a coauthor disambiguation hypothesis that the identity of an author can be determined by his/her coauthors is examined and confirmed through a variety of author disambiguation experiments.
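
The coauthor hypothesis in the abstract above can be illustrated with a minimal sketch that groups same-name records whenever they share a coauthor, directly or through a chain of other records. This is a plain union-find clustering written for illustration, not the study's method; the function name `cluster_by_coauthors` and the input format (one set of coauthor names per record) are assumptions.

```python
from typing import Dict, List, Set


def cluster_by_coauthors(records: List[Set[str]]) -> List[List[int]]:
    """Cluster records bearing the same author name by shared coauthors.

    records[i] is the set of coauthor names on record i (hypothetical input
    format). Records linked by at least one common coauthor, directly or
    transitively, end up in the same cluster of record indices.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Find the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}  # coauthor name -> first record index
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            if name in first_seen:
                # Same coauthor on an earlier record: merge the two clusters.
                parent[find(idx)] = find(first_seen[name])
            else:
                first_seen[name] = idx

    clusters: Dict[int, List[int]] = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

A record with no coauthors always forms its own cluster here, which is one reason the abstract's idea of collecting implicit coauthors from the web matters for sparse records.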

12.
Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only a few citations) may prevent such techniques from reaching their full potential. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), which extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), which extends LAND with self-training capabilities, thus drastically reducing the amount of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.

13.
叶恒一 《情报科学》2002,20(1):78-80
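This paper gives a preliminary analysis of domain names and their current status. On that basis it further analyzes the domain name architecture of digital libraries, argues for the importance of standardizing that architecture, and discusses the legal problems concerning domain names that digital libraries may face.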

14.
杨超 《科教文汇》2014,(29):214-216
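When an enterprise name overlaps with the name of a natural person, a conflict between the right to an enterprise name and the right to a personal name inevitably follows. Avoiding such conflicts requires clarifying the legal nature of the two rights, defining the boundary of each and how each is exercised, and then improving legislation in a targeted way so that conflicts between the rights are actively prevented and handled.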

15.
蔡卫平 《情报科学》2003,21(3):291-292,312
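From a global perspective, this paper gives a fairly systematic account of how intellectual property disputes arising from the registration and use of domain names have been resolved during the development of the Internet domain name system in recent years. It focuses on WIPO's second global consultation on domain name issues and on several current hot topics in the field, for the reference of interested readers.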

16.
王永囡 《科教文汇》2013,(16):114-115
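This paper takes the names of 150 real estate developments in four districts of Hefei as its object of study. Based on the ways the names are composed and on the semantic and pragmatic features of their generic and specific parts, it examines and analyzes, from a sociolinguistic perspective, the language techniques used in naming real estate developments.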

17.
Author disambiguation resolves same-name author occurrences in bibliographic data into namesakes. This enables author-centered searches and high-quality social network analysis. To promote further research in author disambiguation, KISTI has constructed a new large-scale test set for this field. This article describes its semi-manual creation procedures and its characteristics, especially in terms of author ambiguities and name diversities. In addition, the baseline performance of author clustering against the test set is provided.

18.
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.

19.
On the Intellectual Property Protection of Domain Names   Total citations: 2 (self-citations: 0, citations by others: 2)
陈敬全 《情报科学》2000,18(12):1110-1112
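Based on a detailed analysis of the characteristics of domain names and their nature as intellectual property, this paper discusses in depth how to strengthen the intellectual property management of domain names. Drawing on an analysis of typical foreign court decisions in domain name disputes, it also proposes legal routes for resolving such disputes.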
