Similar Articles
1.
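Examining how the fine-grained knowledge entities embedded in domain-specific texts are used is important for evaluating and selecting those entities. Fine-grained knowledge entities in academic texts usually have multiple types and multiple kinds of associations. Mining their homogeneous and heterogeneous associations therefore gives a deeper picture of how knowledge entities are actually used in a given field. Most existing studies address the extraction and evaluation of individual knowledge entities in academic texts and pay little attention to the relationships between entities. To some extent, this limits the capacity for knowledge discovery based on entity extraction. Taking natural language processing (NLP) as an example field, this article mines association data among fine-grained knowledge entities in the full text of academic papers and uses visualization to reveal the information these data contain. The raw corpus consists of the Chinese papers published in the China National Conference on Computational Linguistics (CCL) from 2009 to 2018. The knowledge entities used in the papers are manually annotated and, reflecting the characteristics of NLP, divided into four types: metric entities, tool entities, resource entities, and method entities. The Apriori association rule mining algorithm is then combined with complex network analysis software to build a knowledge entity association network. This network reveals the knowledge entities commonly used in the field and the correlations in their use.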
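The abstract does not say exactly how Apriori is applied. The sketch below assumes that each paper is one transaction whose items are the knowledge entities annotated in it, and mines frequent entity sets plus the rules between them. The entity names are hypothetical stand-ins for the four types (metric, tool, resource, method). The support and confidence thresholds are illustrative and are not values from the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    # Level-1 candidates: every distinct entity
    current = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while current:
        # Support = fraction of papers (transactions) containing the candidate
        level = {}
        for cand in current:
            support = sum(1 for t in sets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori property)
        k += 1
        current = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """List (antecedent, consequent, support, confidence) for rules X -> Y."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent
                confidence = support / frequent[ante]
                if confidence >= min_confidence:
                    rules.append((ante, itemset - ante, support, confidence))
    return rules


# Hypothetical papers, each reduced to the set of knowledge entities it uses
papers = [
    {"CRF", "F1", "jieba"},
    {"CRF", "F1"},
    {"LSTM", "F1", "jieba"},
    {"CRF", "F1", "People's Daily corpus"},
]
frequent = apriori(papers, min_support=0.5)
for ante, cons, sup, conf in association_rules(frequent, min_confidence=0.7):
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f}")
```

The rules are one way to obtain weighted edges (entity to entity) for the association network. Turning them into a graph and visualizing it would be a separate step.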

2.
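Because important keywords give a strong representation of a text, keyword extraction and filtering play an important role in information retrieval, information extraction, and knowledge mining. After surveying current keyword extraction methods, the authors propose a technique for extracting and filtering important keywords from medical texts. It combines existing medical thesauri and tools with the BM25F weighted term-frequency formula. The method addresses two key problems: identifying and extracting keywords, and measuring keyword importance in order to filter them. Using a collection of osteoarthritis literature from 2001 to 2007 as the data source, the authors apply the method in practice and verify its effectiveness. The result is a workable approach to extracting important keywords for knowledge mining.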
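The abstract names the BM25F weighted term-frequency formula but does not give its parameters. The sketch below uses a common BM25F form:

- Each field's term count is length-normalized and weighted.
- The weighted counts are summed into one pseudo-frequency.
- That pseudo-frequency is saturated with `k1` and multiplied by an IDF.

The field names (`title`, `abstract`), the weights, the `b` values, and the non-negative IDF variant are all assumptions for illustration.

```python
import math


def bm25f_tf(term, doc, avg_len, weights, b):
    """Field-weighted, length-normalized term frequency (BM25F pseudo-frequency).

    doc maps field name -> list of tokens; avg_len, weights and b map field name -> float.
    """
    total = 0.0
    for field, tokens in doc.items():
        tf = tokens.count(term)
        if tf == 0:
            continue
        # Fields longer than average are penalized in proportion to b[field]
        norm = 1.0 + b[field] * (len(tokens) / avg_len[field] - 1.0)
        total += weights[field] * tf / norm
    return total


def bm25f_score(term, doc, avg_len, weights, b, n_docs, doc_freq, k1=1.2):
    """Saturate the pseudo-frequency with k1 and multiply by a non-negative IDF."""
    tf = bm25f_tf(term, doc, avg_len, weights, b)
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    return idf * tf / (k1 + tf)


# Hypothetical two-field record from an osteoarthritis collection
doc = {
    "title": ["osteoarthritis", "knee"],
    "abstract": ["osteoarthritis", "pain", "osteoarthritis", "therapy"],
}
avg_len = {"title": 2.0, "abstract": 4.0}
weights = {"title": 3.0, "abstract": 1.0}  # a title hit counts three times as much
b = {"title": 0.75, "abstract": 0.75}

for term in ["osteoarthritis", "knee", "pain"]:
    print(term, round(bm25f_score(term, doc, avg_len, weights, b, n_docs=100, doc_freq=10), 3))
```

Summing weighted field counts *before* saturation is what distinguishes BM25F from adding up per-field BM25 scores. A term repeated across fields still saturates once.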

3.
A machine learning approach to sentiment analysis in multilingual Web texts   Total citations: 1, self-citations: 0, citations by others: 1
Sentiment analysis, also called opinion mining, is a form of information extraction from text of growing research and commercial interest. In this paper we present our machine learning experiments on sentiment analysis in blog, review, and forum texts found on the World Wide Web and written in English, Dutch, and French. We train on a set of example sentences or statements that are manually annotated as positive, negative, or neutral with regard to a certain entity. We are interested in the feelings that people express about certain consumer products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems: the noisy character of the input texts, the attribution of the sentiment to a particular entity, and the small size of the training set. We succeed in identifying positive, negative, and neutral feelings toward the entity under consideration with ca. 83% accuracy for English texts, using unigram features augmented with linguistic features. The accuracy for the Dutch and French texts is ca. 70% and 68%, respectively. This is due to the larger variety of linguistic expressions in those texts, which more often diverge from standard language and thus demand more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques in reducing the number of examples that must be manually annotated.

4.
Journal of Library Administration, 2013, 53(3-4): 351-378
Summary

The objective of this article is to examine some of the valuable philosophical resources available on the Internet, especially for librarians who need to determine what resources to provide to faculty, students, and staff. Some main uses of the Internet for philosophical research are: accessing texts, using search strategies to examine the texts, reading and writing electronic journal articles, accessing information from encyclopedias and dictionaries, browsing through paths of interlinked Web sites, searching the Internet for sources, and participating in online discussion forums. This article examines the Internet as a source of primary and secondary texts, journal literature, research databases, reference works, specialized limited-area search engines, organizational information, and discussion lists. The article ends with a philosophical reflection on the transformative effect of information technology on the growth of knowledge.

5.
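This article describes an application of topic maps in a specific industry. Building on the first part of the series, "An Overview of Topic Maps and a Study of Their Applications", it presents the construction of a topic-map-based knowledge system for financial training institutions. It covers:

- a requirements analysis for training institutions in the financial industry;
- the architecture design of the topic-map-based knowledge system;
- the design and construction of the system's core component, a knowledge classification (which the article calls a knowledge map).

It ends with conclusions and an outlook.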

6.
Main path analysis (MPA) is the most widely accepted approach to tracing knowledge transfer in a research field. In this study, we extracted multiple longest paths from the citation network of a multidisciplinary academic field and integrated topic modeling into the extracted paths. When analyzing the documents represented by the extracted paths, we consider three main aspects of trajectory analysis: emergence, authority, and topic dynamics. For path extraction, we adopt a longest-path algorithm that consists of three steps: 1) topological sort, 2) edge relaxation, and 3) multiple path extraction. To integrate topics into the multiple paths, we employ latent Dirichlet allocation (LDA). We use the topic-document matrix that LDA derives to assign a topic to each article in the citation network: each article is labeled with the topic that has the highest probability for that article. We conduct a series of experiments on a healthcare informatics dataset provided by PubMed.
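The three path-extraction steps can be sketched on a toy citation DAG, under these assumptions:

- Edges point from the cited paper to the citing paper.
- Every edge has length 1. Main path studies often weight edges, for example with search path counts; that is omitted here.
- Step 3 is read as returning every path that ties for the maximum length.

The `label_topics` helper shows the argmax rule over an LDA document-topic matrix given as plain lists. Fitting LDA itself is out of scope. All names here are hypothetical.

```python
from collections import defaultdict, deque


def longest_paths(nodes, edges):
    """Return every maximum-length path (counted in edges) in a directed acyclic graph."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Step 1: topological sort (Kahn's algorithm)
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("citation graph contains a cycle")

    # Step 2: edge relaxation in topological order, keeping all tied predecessors
    dist = {v: 0 for v in nodes}
    preds = {v: [] for v in nodes}
    for u in order:
        for v in succ[u]:
            if dist[u] + 1 > dist[v]:
                dist[v], preds[v] = dist[u] + 1, [u]
            elif dist[u] + 1 == dist[v]:
                preds[v].append(u)

    # Step 3: walk back from every node at maximal distance to enumerate tied paths
    best = max(dist.values())
    paths = []

    def walk(v, suffix):
        if not preds[v]:
            paths.append([v] + suffix)
        for p in preds[v]:
            walk(p, [v] + suffix)

    for v in nodes:
        if dist[v] == best:
            walk(v, [])
    return paths


def label_topics(doc_topic):
    """Label each document with the index of its highest-probability LDA topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


# Hypothetical citation DAG: an edge (x, y) means paper y cites paper x
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("F", "E")]
print(longest_paths(nodes, edges))
print(label_topics([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]))
```

Relaxing edges in topological order is what makes the longest path computable in linear time on a DAG. The same problem is NP-hard on general graphs, which is why the cycle check matters.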

7.
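This study addresses two problems. First, few methods exist for extracting non-hierarchical relations from Chinese text. Second, association-rule filtering methods overlook domain term relations that are concentrated in only part of a text collection. Based on a statistical analysis of Chinese texts, the study attempts to define a set of rules for extracting non-hierarchical relations in Chinese. It also proposes an improved association rule that adds an average-value variable. Experiments show two results. The custom grammar-rule-based method can effectively extract subjects, predicates, and objects from Chinese text, and thereby extract non-hierarchical relations. The improved association rule method can extract non-hierarchical relations between domain terms that are concentrated in part of a text collection.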

8.
This article presents the Semantic Web, a new line of investigation that integrates studies on artificial intelligence and Internet technologies. It reviews the current state of research and considers three areas: problems in establishing knowledge spaces on the Internet; means and methods for extracting knowledge from natural-language texts; and the use of knowledge spaces in creating applied intelligent systems that operate on the Internet.

9.
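Information Extraction Technology and Its Prospective Applications in Digital Libraries   Total citations: 18, self-citations: 1, citations by others: 18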
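The goal of information extraction is to automatically extract predefined information (knowledge) from text. It offers a way to pull user-relevant information out of a vast accumulation of information. This article analyzes the main concepts of information extraction, the principal research activities, the types of information extraction, and the general architecture of information extraction systems. It argues that, in building digital libraries, information extraction technology can play an important role in several areas: automatic indexing of digital content, metadata acquisition, data mining, intelligence research and analysis, construction of large knowledge bases and numerical databases, and reference services.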

10.
Knowledge Acquisition, 1991, 3(3): 317-337
In this paper, we describe a procedure that integrates several techniques for recognizing causal relationships in expository text. Applying these techniques yields a knowledge representation consisting of classifications of the causal relationships contained in a text. The procedure is very robust: if any one of the techniques for recognizing a causal relationship fails, an alternate methodology can be used to continue the causal analysis. The procedure is embodied in a program called the causal analyser. We have applied the causal analyser to several texts to produce a representation of the causal relationships in these texts. The causal analyser will be part of a knowledge acquisition system called TAKT (tool for the acquisition of knowledge from text), which is currently being developed.

11.
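Automatic Extraction of Skill Information from Online Recruitment Texts   Total citations: 1, self-citations: 1, citations by others: 0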
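[Purpose/Significance] Manual extraction of skill information from online recruitment texts cannot keep up with large-scale data analysis. This paper therefore proposes a method for automatically extracting skill information from large volumes of online recruitment texts. [Method/Process] Based on the characteristics of online recruitment texts, candidate skills are selected using dependency parsing. A domain relevance indicator is then proposed to score the candidates and is incorporated into traditional term extraction methods. Together these form an automatic skill-information extraction method for online recruitment texts. [Result/Conclusion] Experiments show that the proposed method extracts skill information from online recruitment texts automatically, quickly, and accurately.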

12.
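[Purpose/Significance] Based on online recruitment texts and discipline data, this paper proposes an "industry-position-knowledge-discipline" framework for analyzing talent demand and supply. It applies the framework to the field of artificial intelligence (AI); the framework can also inform talent supply-and-demand analysis in other fields. [Method/Process] Job postings related to AI were collected from recruitment websites. The entity extraction performance of CRF, BiLSTM-CRF, BERT-BiLSTM-CRF, and BERT models on the recruitment texts was compared, and social network analysis was used to link the results with discipline data. [Result/Conclusion] BERT-BiLSTM-CRF achieved the best entity extraction performance. Three relation networks were constructed ("industry-position", "position-knowledge", and "knowledge-discipline"), identifying the industries, positions, knowledge, and disciplines most closely connected with the AI field. The framework can thoroughly mine the current state of talent demand and map that demand fairly precisely onto the disciplines that train such talent. This has practical significance for national development strategy and for universities drawing up talent-training plans.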

13.
Research Progress on Knowledge Graphs   Total citations: 1, self-citations: 0, citations by others: 1
Qi Guilin, Gao Huan, Wu Tianxing. 情报工程 (Technology Intelligence Engineering), 2017, 3(1): 4-25
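With the arrival of the big data era, knowledge engineering has attracted wide attention, and extracting useful knowledge from massive data is the key to big data analysis. Knowledge graph technology provides a means of extracting structured knowledge from massive amounts of text and images, and therefore has broad application prospects. This paper first briefly reviews the history of knowledge graphs and discusses the significance of knowledge graph research. It then introduces the key technologies for knowledge graph construction, including entity and relation recognition, knowledge fusion, entity linking, and knowledge reasoning. Next, it introduces existing open knowledge graph datasets. Finally, it presents application cases of knowledge graphs in intelligence analysis.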

14.
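A Review of Knowledge Discovery Research in China's Library and Information Science Field during the 11th Five-Year Plan Period   Total citations: 1, self-citations: 0, citations by others: 1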
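Drawing on the large body of recent papers on knowledge discovery, this study covers:

- conceptual distinctions and relationships;
- the methodological system of knowledge discovery;
- text mining and text trend mining;
- knowledge discovery from non-related literature;
- extensions of data mining research.

It summarizes the knowledge discovery research of China's library and information science field during the 11th Five-Year Plan period. It focuses on progress in content analysis, association theory, domain-driven approaches, visualization, and text mining models for knowledge discovery. Finally, it analyzes future research hotspots and directions in the field.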

15.
This article demonstrates the applicability of classification theory to various textual-analytic approaches such as grounded theory, content analysis, discourse analysis, and conversation analysis/membership categorization analysis. This applicability is based on three factors: extant and elicited texts can be broken down into categories that are essentially classification systems created and defined by the researcher; extant texts are themselves explicit or implicit classification systems; and classificatory frameworks can be applied to extant and elicited texts “in order to clarify their contribution to processes of meaning-making” (Fairclough, N. (2003). Analysing discourse: Textual analysis for social research. London: Routledge, p. 11). The recommendation is made that classification theory should be incorporated in the teaching of textual-analytic approaches in university-level research-methods courses, especially in the field of library and information science (LIS).

16.
Hong Kong has always been regarded as a critical region of Cultural China. Surprisingly, traditional Chinese medicine has not yet been accepted as legitimate in the city. This study uses acupuncture as a case to investigate how media texts work to organize a field of knowledge and practices about health in a post-colonial society, where contrasting perspectives and hybrid ideas rooted in the East and the West intermingle. Acupuncture is conceptualized as socially constructed health knowledge that has become increasingly legitimate in media discourse. Using a mixed-method approach that combines discourse and content analysis, we analyzed a total of 666 news articles related to acupuncture, published in two Hong Kong newspapers over a 10-year period. Three major forms of discursive construction of legitimation (authorization, rationalization, and moral evaluation) were identified and elaborated in association with the texts and the social contexts. This study reveals a complex process of generating legitimacy for health knowledge through news narratives.

17.
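Analysis of Regional Economic Relations Based on a Text Mining Mechanism   Total citations: 1, self-citations: 0, citations by others: 1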
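Most existing research on economic relations relies on empirical or purely econometric methods. This paper instead works with unstructured text, using information extraction and text mining to uncover the regional economic relations users are interested in, a research topic of considerable practical value. After discussing a text mining mechanism based on entity relations, the paper analyzes the regional economic relations among 31 provinces, municipalities, and autonomous regions. Text mining of economic relations takes two forms:

- Attribute-based mining: information extraction obtains the attributes of each entity, and clustering is used to analyze the relations between economic entities.
- Mutual-reference-based mining: a classification dictionary of economic entity relations is built and an entity relation annotation algorithm is proposed. Information extraction then captures the references between entities, and a directed relation graph is constructed from which relations between regional economies are mined.

The study shows that text mining can both analyze and evaluate the economic development of each region and reveal the intrinsic relations between particular regional economies.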

18.
The authors discuss the problem of distributed knowledge acquisition for the construction of complete and consistent databases in integrated expert systems via the sharing of knowledge sources of different topologies (experts, problem-oriented texts, and electronic media in the form of databases). The emphasis is on the models, methods, and algorithms of distributed knowledge acquisition from databases as additional knowledge sources. The authors describe the architecture and basic facilities of distributed knowledge acquisition, which function as a part of the AT-TECHNOLOGY tool complex.

19.
Research Progress in Knowledge Extraction and Mining from the MEDLINE Database   Total citations: 28, self-citations: 4, citations by others: 24
Cui Lei, Zheng Huachuan. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2003, 22(4): 425-433
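This paper reviews recent domestic and international research that uses the medical literature retrieval system MEDLINE for knowledge extraction and text data mining. The work covered includes:

- the research of Swanson and colleagues on discovering hidden connections in the literature;
- the work of Cimino and colleagues on extracting rules from the literature;
- co-word analysis abroad;
- co-article analysis in China.

On this basis, the paper argues that the rapid development of information technology calls for vigorous research on knowledge extraction and text mining. Such research would be a theoretical exploration of how library and information services can shift from document management to information and knowledge management.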
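Swanson's hidden-connection work is usually described as the ABC model. It starts from a term A, finds intermediate terms B that co-occur with A, and then looks for target terms C that co-occur with some B but never with A. The sketch below implements open discovery over keyword sets under two assumptions: each document is reduced to a set of index terms (for MEDLINE these might be MeSH headings), and candidates are ranked by how many B-links support them. The toy documents are hypothetical and only echo Swanson's well-known Raynaud's disease and fish oil example.

```python
from collections import Counter


def open_discovery(docs, a_term, min_b_support=1):
    """Swanson-style ABC open discovery over documents given as sets of terms.

    Returns the B terms linked to a_term and a Counter of candidate C terms,
    scored by how many B-term co-occurrences support each C term.
    """
    a_docs = [d for d in docs if a_term in d]
    # B terms: co-occur with A in at least min_b_support documents
    b_counts = Counter(t for d in a_docs for t in d if t != a_term)
    b_terms = {t for t, n in b_counts.items() if n >= min_b_support}
    # Anything that already co-occurs with A is a known link, not a hidden one
    known = {t for d in a_docs for t in d}
    c_scores = Counter()
    for d in docs:
        if a_term in d:
            continue
        linked = b_terms & d
        if not linked:
            continue
        for t in d - known:
            c_scores[t] += len(linked)
    return b_terms, c_scores


# Hypothetical toy documents, each reduced to a set of index terms
docs = [
    {"Raynaud's disease", "blood viscosity"},
    {"Raynaud's disease", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
    {"aspirin", "headache"},
]
b_terms, c_scores = open_discovery(docs, "Raynaud's disease")
print(sorted(b_terms))
print(c_scores.most_common())
```

Real systems add term filtering (semantic types, stop lists) and statistical association measures in place of raw link counts. The ranking here is only the simplest possible choice.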

20.
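[Purpose/Significance] Taking an automobile forum as an example, this paper proposes a method for extracting topic knowledge elements from professional social media texts. [Method/Process] First, an LDA model extracts the topics of the forum texts, and duplicate topics are removed to form a topic list. Second, a sentiment analysis model suited to automobile forum texts is built on T-LSTM, a deep learning model that fuses topic features. Third, sentiment keywords and key sentences are extracted by computing each word's importance in the TextRank graph model and its Word2Vec similarity to the topics; these explain and supplement the text topics and sentiment orientations. Finally, these methods are integrated to output structured topic knowledge elements. [Result/Conclusion] In the experiments, 69.1% of the extracted topic knowledge elements were acceptable. This indicates that the proposed method can extract knowledge elements around knowledge topics fairly accurately and convert knowledge into structured form.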
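Only the TextRank step is sketched here, under these assumptions:

- Words are nodes, and an undirected edge joins two different words that appear within a sliding window.
- Scores are iterated with the usual damping factor 0.85 for a fixed number of rounds.

Tokenization, stop-word removal, and the Word2Vec topic-similarity term from the abstract are omitted. The token list is a hypothetical example.

```python
from collections import defaultdict


def textrank(tokens, window=2, d=0.85, iterations=50):
    """Rank words by TextRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        # Link w to every other word within the sliding window
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # PageRank-style update: each neighbor shares its score evenly across its edges
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, already tokenized forum post with stop words removed
tokens = ["engine", "noise", "engine", "fuel", "consumption", "engine", "gearbox"]
for word, score in textrank(tokens):
    print(word, round(score, 3))
```

A fixed iteration count keeps the sketch short. A convergence threshold on the change in scores is the more usual stopping rule.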

