Found 20 similar documents (search time: 15 ms)
1.
This paper presents a novel query expansion method, integrated into a graph-based algorithm for query-focused multi-document summarization, to address the limited information carried by the original query. Our approach uses both sentence-to-sentence and sentence-to-word relations to select query-biased informative words from the document set and uses them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on the data of the Document Understanding Conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method significantly improves system performance and makes our system comparable to state-of-the-art systems.
2.
《Information processing & management》2023,60(4):103382
Keyphrase prediction aims to generate phrases (keyphrases) that concisely summarize a given document. Recently, researchers have conducted in-depth studies on this task from various perspectives. In this paper, we comprehensively summarize representative studies from the perspectives of dominant models, datasets and evaluation metrics. Our work analyzes up to 167 previous works, achieving greater coverage of this task than previous surveys. In particular, we focus on deep learning-based keyphrase prediction, which has attracted increasing attention in recent years. Afterwards, we conduct several groups of experiments to carefully compare representative models. To the best of our knowledge, our work is the first attempt to compare these models using identical, commonly used datasets and evaluation metrics, facilitating in-depth analyses of their advantages and disadvantages. Finally, we discuss possible future research directions for this task.
3.
Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content. This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e., embeddings trained on the single document under consideration. We argue that such local representations of words and keyphrases are able to accurately capture their semantics in the context of the document they are part of, and can therefore help improve keyphrase extraction quality. Empirical results offer evidence that local representations indeed lead to better keyphrase extraction results compared both to embeddings trained on very large third-party corpora or on larger corpora consisting of several documents of the same scientific field, and to other state-of-the-art unsupervised keyphrase extraction methods.
4.
5.
Many existing systems for analyzing and summarizing customer reviews about products or services are based on a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large, cross-domain services such as Amazon.com, Taobao.com or Yelp.com, where there are a large number of product types and new products emerge almost every day. In this paper, we propose a novel method, empowered by knowledge sources such as Probase and WordNet, for extracting the most prominent aspects of a given product type from textual reviews. The proposed method, ExtRA (Extraction of Prominent Review Aspects), (i) extracts the aspect candidates from text reviews based on a data-driven approach, (ii) builds an aspect graph utilizing Probase to narrow the aspect space, (iii) separates the space into reasonable aspect clusters by employing a set of proposed algorithms and finally (iv) generates, automatically and without supervision, the K most prominent aspect terms or phrases from those aspect clusters such that they do not overlap semantically. ExtRA extracts high-quality prominent aspects as well as aspect clusters with little semantic overlap by exploring knowledge sources. ExtRA can extract not only words but also phrases as prominent aspects. Furthermore, it is general-purpose and can be applied to almost any type of product and service. Extensive experiments show that ExtRA is effective and achieves state-of-the-art performance on a dataset consisting of different product types.
6.
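In automatic text classification, there are currently two probability estimation methods: term-frequency statistics and document-frequency statistics. Whether the chosen estimation method is appropriate directly affects the quality of feature extraction and the accuracy of classification. This paper implements a Chinese text classifier with the K-nearest-neighbor algorithm and carries out training and classification experiments on both balanced and unbalanced Chinese training corpora. The experimental data show that using the term-frequency-based estimation method with an unbalanced corpus, and the document-frequency-based estimation method with a balanced corpus, effectively extracts high-quality text features and thereby improves classification accuracy.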
7.
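[Purpose/Significance] Building on automatic summarization techniques and taking the characteristics of patents into account, this paper proposes a method for automatically extracting the technology-effect features of patents. [Method/Process] The extraction targets comprise two parts: the core technical content and the descriptions of function and effect. The extraction scheme is designed according to the textual structure of patents. The extracted technical-content sentences are scored and evaluated for centrality, and sentiment analysis is applied to the extracted function-effect sentences; after condensing and filtering, the patent technology-effect features are obtained. [Results/Conclusions]...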
8.
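A mining algorithm for fall behavior in the elderly, based on a relaxed-mutation genetic algorithm, is proposed. Principal component analysis is used to reduce the dimensionality of the collected random images. The mechanism reflects the optimization performance for fall-behavior features by fixing the larger or smaller gene positions; each fixing of gene positions changes the search space of the genetic algorithm, and the new space contains fall-behavior feature information that could not be observed in the old space. Experimental results show that the improved algorithm effectively raises the accuracy of mining fall behavior in the elderly, so that human falls can be recognized effectively and system functionality is enriched.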
9.
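A Chinese keyword extraction algorithm based on the TFIDF method  Cited by: 3 (self-citations: 1, citations by others: 3)
Building on Hailiang (海量) intelligent word segmentation, this paper proposes a Chinese keyword extraction algorithm based on the vector space model and the TFIDF method. After automatically segmenting the text, the algorithm uses TFIDF to compute a weight for every word in the document space and then extracts the keywords of scientific literature according to the computed weights. Experimental tests with software written by the authors show that the algorithm is highly effective for automatic keyword extraction from Chinese scientific literature.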
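The TFIDF weighting step described in this abstract can be illustrated with a minimal sketch. It assumes the documents are already segmented into words (the segmentation system itself is not reproduced here), and it uses the common tf × log(N/df) form of the weight; the abstract does not state the exact formula, so that choice, the function name `tfidf_keywords`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of one pre-segmented document by TF-IDF weight.

    Assumption: weight = (count / document length) * log(N / df).
    This is one common TF-IDF variant, not necessarily the paper's.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    weights = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(weights, key=lambda w: (-weights[w], w))[:top_k]


# Hypothetical, already-segmented toy corpus.
docs = [
    ["keyword", "extraction", "tfidf", "keyword"],
    ["word", "segmentation", "algorithm"],
    ["tfidf", "weight", "algorithm"],
]
print(tfidf_keywords(docs, 0, top_k=2))
```

Words that are frequent in the target document but rare across the collection rank highest, which is the property the abstract relies on for keyword selection.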
10.
Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can effectively explore the huge implicit space of syntactic structured features embedded in a parse tree. Our study reveals that the syntactic structured features embedded in a parse tree are very effective in relation extraction and can be well captured by the convolution tree kernel. Evaluation on the ACE benchmark corpora shows that using only the convolution tree kernel achieves performance comparable to the previous best-reported feature-based methods. It also shows that our method significantly outperforms the two previous dependency tree kernels for relation extraction. Moreover, this paper proposes a composite kernel for relation extraction that combines the convolution tree kernel with a simple linear kernel. Our study reveals that the composite kernel can effectively capture both flat and structured features without extensive feature engineering, and can easily scale to include more features. Evaluation on the ACE benchmark corpora shows that the composite kernel outperforms previous best-reported methods in relation extraction.
11.
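Based on fuzzy theory, a fuzzy linguistic scale answered in percentages is proposed, and a fuzzy linguistic algorithm is used to build an evaluation model for research project approval. To build the model, the valid values whose fuzzy value exceeds the fuzzy median are first taken from the returned electronic scoring sheets, and the primary and secondary indicators for evaluating research projects are extracted to form a research project evaluation form for use by the chair and vice chairs of the review committee and the leader of the overall expert group. The review experts assign a weight to each criterion according to the evaluation form and score the research projects against these criteria. The scores are expressed as fuzzy sets, each abstract criterion is converted into a quantified triangular fuzzy number, and the fuzzy numbers are ranked for evaluation. Finally, a case evaluation is carried out to validate the model. The results show that the model performs well and offers a more feasible method for evaluating and selecting research projects scientifically, objectively and fairly.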
12.
An automatic word segmentation method based on word chains  Cited by: 3 (self-citations: 1, citations by others: 3)
An algorithm for automatic Chinese word segmentation, an improved version of the minimum matching algorithm, is put forward. The key idea of the algorithm is to optimize the word bank and the matching process to increase the speed and accuracy of word segmentation. By integrating a case bank for processing ambiguous word chains with the relevant segmentation rules, the correctness of word segmentation is improved, which partly makes up for deficiencies in natural language processing.
13.
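[Purpose/Significance] This paper aims to quantify the value of scientific literature and to improve the accuracy of the PageRank algorithm when it is applied to ranking scientific literature. [Method/Process] Building on WPageRank, an improved PageRank algorithm that incorporates a time factor, citation relevance is added as a further improvement. The intrinsic value of each document is computed and combined with its PageRank value in a weighted sum to obtain the document's final value. [Results/Conclusions] The proposed method allows newly published high-quality documents to achieve high rankings and makes high-quality documents within a field easier to retrieve, while preserving the timeliness and topical focus of retrieval.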
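The final combination step in this abstract (a weighted sum of a PageRank value and an intrinsic value) can be sketched as follows. This is only an illustration under stated assumptions: it uses plain PageRank by power iteration rather than WPageRank, it omits the time factor and citation relevance because the abstract gives no formulas for them, and the weight `alpha`, the intrinsic values, and the toy citation graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    `links` maps each paper to the list of papers it cites.
    Assumption: this is standard PageRank, not the WPageRank variant.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this paper's rank evenly among the papers it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its rank uniformly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


def final_value(rank, intrinsic, alpha=0.7):
    """Weighted sum of PageRank value and intrinsic value (alpha is hypothetical)."""
    return {
        paper: alpha * rank[paper] + (1.0 - alpha) * intrinsic.get(paper, 0.0)
        for paper in rank
    }


# Hypothetical citation graph: A and B both cite C.
links = {"A": ["C"], "B": ["C"], "C": []}
rank = pagerank(links)
# Hypothetical intrinsic values; B stands in for a new, high-quality paper.
intrinsic = {"A": 0.2, "B": 0.9, "C": 0.5}
print(final_value(rank, intrinsic))
```

In this toy setup A and B receive the same PageRank value, so the intrinsic term alone lifts B above A, which mirrors the abstract's point that a new but high-quality paper can still rank well.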
14.
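Research on online review mining based on improved feature extraction and clustering  Cited by: 1 (self-citations: 0, citations by others: 1)
[Purpose/Significance] This study addresses the low performance of feature extraction from Chinese online product reviews under information overload, and the problem of choosing initial centers in feature clustering. [Method/Process] A weight-based improved Apriori algorithm is used to generate a candidate set of product features, which is then filtered using independent support, a rule that treats frequent-item nouns as non-features, and a PMI algorithm based on web search engines. Taking HowNet-based semantic similarity and feature-opinion co-occurrence as the features that measure the degree of association between product features, an improved K-means clustering algorithm is proposed to cluster the product features. [Results/Conclusions] Experimental results show that, in the feature extraction stage, precision is 69%, recall is 92.64%, and the combined measure reaches 79.07%. In the feature clustering stage, the proposed improved K-means algorithm achieves better mining performance than the traditional algorithm.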
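As a quick check of the figures reported in this abstract, the combined measure can be recomputed under the assumption that it is the F1 score, i.e. the harmonic mean of precision and recall. The abstract does not name the measure, so this is an assumption, and `f1_score` is a name introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed meaning of the combined measure)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
combined = f1_score(0.69, 0.9264)
print(f"{100 * combined:.2f}%")
```

With the rounded precision of 69% this gives roughly 79.1%, close to the reported 79.07%; the small gap is consistent with the precision having been rounded before it was reported.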
15.
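Feature extraction of personalized information based on a genetic algorithm  Cited by: 2 (self-citations: 0, citations by others: 2)
As the amount of information on the World Wide Web (WWW) grows exponentially, people can use many information-gathering tools to obtain information from the web. However, it is very difficult to make the retrieved information meet users' personalized needs with both high precision and high recall. To address this problem, this paper first introduces the concept of feature extraction and, on this basis, proposes a genetic-algorithm-based feature extraction algorithm for web text, which further improves the efficiency of web text processing.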
16.
This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in semantic relation extraction by modeling the commonality among related classes. For each class in the hierarchy, either manually predefined or automatically clustered, a discriminative function is determined in a top-down way. As an upper-level class normally has many more positive training examples than a lower-level class, the corresponding discriminative function can be determined more reliably and can guide the learning of the discriminative function at the lower level more effectively, which otherwise might suffer from limited training data. In this paper, two classifier learning approaches, i.e. the simple perceptron algorithm and state-of-the-art Support Vector Machines, are applied using the hierarchical learning strategy. Moreover, several kinds of class hierarchies, either manually predefined or automatically clustered, are explored and compared. Evaluation on the ACE RDC 2003 and 2004 corpora shows that the hierarchical learning strategy substantially improves the performance on least- and medium-frequent relations.
17.
This paper surveys the current state of development of Cross-Language Text Categorization (CLTC) technologies. It describes the social demand for CLTC technologies from the perspective of their application background, describes the core issues of CLTC technologies and their solutions from the perspective of key technologies, and shows the development status of CLTC technologies through the conclusions drawn from existing research results. As an important means of multilingual information organization, CLTC technologies have broad development prospects.
18.
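Traditional methods for extracting valuable information from big data use data fusion based on fuzzy learning theory: a predetermined learning sequence is fed into a neural network and, through fuzzy heuristics, is mapped through multiple models. Such models are complex and have a low heuristic rate. This paper proposes a value-information extraction method based on forgetting heuristics over big-data subset features. The big data are normalized by nonlinear mapping so that every subset can be processed in parallel; subset features are extracted with a chaos method, and a forgetting-heuristic chain of subset features is built under a chaotic model; for the valuable information in different subsets, heuristics are applied along the forgetting-heuristic chain to extract that information. An extraction test was run on a set of pseudo-random valuable information in big data. Simulation experiments show that the extraction rate of the proposed method reaches 98%, which offers useful guidance for value-information extraction from big data.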
19.
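Current mainstream face recognition algorithms all convert the original color image to grayscale and then classify it with grayscale-based feature extraction and recognition algorithms. In practice, only a simple set of weighting coefficients is used for the color-to-grayscale conversion, which does not reflect well the relative importance of the R, G and B color components. Based on the color composition of face images, this paper extracts and analyzes features from the color information of the R, G and B components of color face images, derives from them a three-primary-color coefficient representation of the discriminative features, and uses it to convert the color image to grayscale. Finally, extensive experiments on the internationally used AR standard color face database verify the effectiveness of the proposed algorithm.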
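The "simple set of weighting coefficients" that this abstract criticizes can be illustrated with a short sketch of a weighted color-to-grayscale conversion. The abstract does not say which coefficients are meant; the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are used here as a common default, which is an assumption. The coefficients learned by the paper are not given in the abstract, so the `weights` parameter merely shows where a different set of coefficients would plug in.

```python
def to_gray(pixel, weights=(0.299, 0.587, 0.114)):
    """Convert one (R, G, B) pixel to a gray level by a weighted sum.

    Default weights are the ITU-R BT.601 luma coefficients (an assumed baseline).
    """
    r, g, b = pixel
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def image_to_gray(image, weights=(0.299, 0.587, 0.114)):
    """Apply the weighted conversion to an image stored as rows of (R, G, B) tuples."""
    return [[to_gray(pixel, weights) for pixel in row] for row in image]


# Hypothetical 1x2 image: one pure-red pixel and one white pixel.
image = [[(255, 0, 0), (255, 255, 255)]]
print(image_to_gray(image))
# Passing different weights shows where task-specific coefficients would go.
print(image_to_gray(image, weights=(0.5, 0.3, 0.2)))
```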
20.
Within the context of Information Extraction (IE), relation extraction is oriented towards identifying a variety of relation phrases and their arguments in arbitrary sentences. In this paper, we present a clause-based framework for information extraction in textual documents. Our framework focuses on two important challenges in information extraction: 1) Open Information Extraction (OIE), and 2) Relation Extraction (RE). In the plethora of research that focuses on the use of syntactic and dependency parsing for the purposes of detecting relations, there has been increasing evidence of incoherent and uninformative extractions. The extracted relations may even be erroneous at times and fail to provide a meaningful interpretation. In our work, we use the English clause structure and clause types in an effort to generate propositions that can be deemed extractable relations. Moreover, we propose refinements to the grammatical structure of syntactic and dependency parsing that help reduce the number of incoherent and uninformative extractions from clauses. In our experiments in both the open information extraction and relation extraction domains, we carefully evaluate our system on various benchmark datasets and compare its performance against existing state-of-the-art information extraction systems. Our work shows improved performance compared to the state-of-the-art techniques.