Similar Literature
 20 similar documents found (search time: 15 ms)
1.
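赵洪, 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 2020, (3): 330-344
Automatic summarization is one of the main tasks of text mining. Compared with extractive summarization, abstractive summarization is conceptually closer to the way humans write summaries and is therefore of considerable research significance. With the development of deep learning in recent years, abstractive summarization based on deep neural network models has made remarkable progress. To give a more complete picture of the ideas behind these methods and the current state of research, this paper starts from a description of the abstractive summarization task and reviews five categories of deep learning methods: models based on RNNs (recurrent neural networks), models based on CNNs (convolutional neural networks), models combining RNNs and CNNs, models incorporating attention mechanisms, and models incorporating reinforcement learning. These methods show that training deep neural networks, especially with attention mechanisms and reinforcement learning, clearly improves summary quality. For future research on abstractive summarization, beyond the continued application and refinement of deep learning methods themselves, attention should be paid to summarization grounded in discourse-level semantic understanding, summarization tailored to the characteristics of different kinds of text, and automatic evaluation of summaries. How to combine mature methods from traditional summarization research to further improve summary quality is also a valuable research direction.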

2.
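Taking title recognition for web text resources as the entry point, this work considers not only the features most studies focus on, such as text formatting information (e.g., font) and position information, but also the relevance between a candidate title and the body text of the web page. Using a large volume of historical data collected by a science and technology monitoring project as the basis for statistical analysis, it builds a rule-based method for fast title recognition in web text resources from the possible sources and features of candidate titles, and reports evaluation results for the method's time efficiency and recognition accuracy.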

3.
A review of text and image retrieval approaches for broadcast news video   Cited by: 1 (self-citations: 0, citations by others: 1)
The effectiveness of a video retrieval system largely depends on the choice of underlying text and image retrieval components. The unique properties of video collections (e.g., multiple sources, noisy features and temporal relations) suggest we examine the performance of these retrieval methods in such a multimodal environment, and identify the relative importance of the underlying retrieval components. In this paper, we review a variety of text/image retrieval approaches as well as their individual components in the context of broadcast news video. Numerous components of text/image retrieval have been discussed in detail, including retrieval models, text sources, temporal expansion methods, query expansion methods, image features, and similarity measures. For each component, we conduct a series of retrieval experiments on TRECVID video collections to identify their advantages and disadvantages. To provide a more complete coverage of video retrieval, we briefly discuss an emerging approach called concept-based video retrieval, and review strategies for combining multiple retrieval outputs.

4.
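This article analyzes the current state of research on the co-construction and sharing of library information resources, proposes a management model and management mechanism for the co-construction and sharing of university information resources, and examines in depth several aspects, including the establishment of a shared network information platform, an information retrieval system, a mobile phone SMS service system, and an automatic library visitor-counting system. 8 references.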

5.
A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.

6.
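This study uses XML text fragments and image content features (color) to retrieve images. On WHU-XML, an XML-based multimedia digital library retrieval platform, indexes are built for XML text and images; on this basis, a linear merging method combines image retrieval based on XML text fragments with retrieval based on image content features (color). The results show that when the weight of text retrieval is greater than the weight of image content retrieval, retrieval performance is better than with either retrieval method alone.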
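
The linear merging of text-based and content-based scores described in this abstract can be sketched roughly as follows. This is only an illustration, not the WHU-XML implementation: the function names, the min-max normalization step, and the default `text_weight` of 0.7 are all assumptions, since the abstract states only that the two result lists are merged linearly and that a larger text weight worked better.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two modalities are comparable (assumed step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_merge(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.7,  # hypothetical default; the abstract gives no value
) -> List[Tuple[str, float]]:
    """Fuse two score lists as w * text + (1 - w) * image and rank the images."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    text_n = min_max_normalize(text_scores)
    image_n = min_max_normalize(image_scores)
    fused = {
        doc: text_weight * text_n.get(doc, 0.0)
        + (1.0 - text_weight) * image_n.get(doc, 0.0)
        for doc in set(text_n) | set(image_n)
    }
    # Highest fused score first; ties broken by identifier for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

An image missing from one result list is treated here as scoring 0 in that modality, which is one of several reasonable conventions.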

7.
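[Purpose/Significance] Reference documents are key documents for judging whether a patent can be granted or should be invalidated. To address the shortcomings of traditional information retrieval methods and the scarcity of research applying machine learning to reference document retrieval, this study builds a patent relevance judgment model that incorporates reference document information. [Method/Process] Experiments use the target patents and reference documents in patent invalidation decisions as the dataset; text similarity, co-occurring vocabulary, and co-word count features are extracted, and a GBDT model is used to ...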

8.
《The Reference Librarian》2013,54(27-28):177-183
Classification of periodicals can be done for either of two reasons: to place bound periodical volumes in the stacks close to monographs on the same subject or to organize the volumes in a separate periodicals area. Either reason provides a key to locating periodicals that does not depend on the patron's ability to interpret spine titles, title changes, or changing cataloging codes. If the choice is to integrate bound periodical volumes with the monographs, the same classification system must be used. For a separate bound periodicals area, many libraries have developed schemes to organize their titles on the shelves in alphabetical order by entry. Which choice is best depends on where a library chooses to shelve its periodicals and how the library staff believes patrons approach the task of finding titles.

9.
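[Purpose/Significance] To measure the structure of the factors driving the evolution of literature database users' mental models. [Method/Process] A questionnaire survey collected 483 questionnaires on literature database users' perceptions of the factors driving the evolution of their mental models, and the data were analyzed using second-order confirmatory factor analysis. [Results/Conclusions] The study finds that the driving factors of literature database users' mental models are: interface guidance and prompts in the literature database, self-exploration, communication with classmates, literature database information service products, learning transfer from search engines, simple information retrieval tasks, complex information retrieval tasks, information retrieval courses, consulting teachers, library information retrieval training, and learning transfer from shopping websites. The importance of these factors to the evolution of users' mental models increases in that order. In addition, because the dimensions of users' mental models are composite, each driving factor affects users' understanding of literature database content, understanding of information retrieval methods, and filtering of retrieval results to a different degree. The findings can guide interface optimization for literature databases and user information literacy training.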

10.
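In the Gnutella system, nodes forward messages by flooding, which inevitably leads to network congestion. Based on small-world theory, a P2P network with small-world properties is built on top of an unstructured P2P network: each node maintains a certain number of neighbor nodes as short-range links, and also maintains some long-range links to improve text retrieval efficiency and reduce communication overhead between nodes.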
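
As a rough illustration of the short-range/long-range link idea in this abstract, the sketch below builds a toy overlay and measures how far a query must travel from one node. It is not the paper's construction: the ring layout, the uniformly random choice of long-range peers, and the names `build_overlay` and `average_hops` are all assumptions made for the example.

```python
import random
from collections import deque
from typing import Dict, Set


def build_overlay(
    n: int, short_links: int = 2, long_links: int = 1, seed: int = 0
) -> Dict[int, Set[int]]:
    """Ring of n peers: short-range links to nearby peers plus random long-range links."""
    if n <= 2 * short_links:
        raise ValueError("need more peers than short-range neighbors")
    rng = random.Random(seed)
    neighbors: Dict[int, Set[int]] = {node: set() for node in range(n)}
    # Short-range links: each peer connects to its nearest peers on the ring.
    for node in range(n):
        for offset in range(1, short_links + 1):
            peer = (node + offset) % n
            neighbors[node].add(peer)
            neighbors[peer].add(node)
    # Long-range links: a few uniformly random shortcuts per peer (assumed policy).
    for node in range(n):
        candidates = [p for p in range(n) if p != node and p not in neighbors[node]]
        for peer in rng.sample(candidates, min(long_links, len(candidates))):
            neighbors[node].add(peer)
            neighbors[peer].add(node)
    return neighbors


def average_hops(neighbors: Dict[int, Set[int]], source: int = 0) -> float:
    """Mean breadth-first hop count from source to every reachable peer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    reached = len(dist) - 1
    return sum(dist.values()) / reached if reached else 0.0
```

Comparing `average_hops(build_overlay(100, 2, 0))` with `average_hops(build_overlay(100, 2, 1))` shows the effect of the shortcuts: fewer hops per query means fewer forwarded messages than plain flooding would need to reach the same peers.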

11.
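This paper proposes a new data structure and new index-building and retrieval algorithms for a Chinese character full-text retrieval system, and implements them in a program used to search the titles and abstracts of the China Chemical Literature Database. Index-building time, space consumption, and retrieval response time are measured; the number of high-frequency characters and the index space consumption are calculated for documents whose lengths fall in different ranges; and the relationship between the index expansion ratio and document length is discussed.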

12.
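陈悦, 宋凯, 刘安蓉, 曹晓阳, 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 2021, 40(3): 286-296
Disruptive technology is a technology cluster with a complex internal structure. From a spatial perspective, it is a complex cluster comprising dominant technologies, auxiliary technologies, and supporting technologies, spanning multiple disciplines and fields. Against this background, using scientometric methods to evaluate disruptive technologies and explore how science and technology evolve faces challenges that are, in essence, a matter of data retrieval. This paper explores a new machine-learning-based strategy for building patent datasets: the patent retrieval task is treated as a binary classification task in machine learning, similar to the idea of active-learning-based query classification in information retrieval, and an improved text classification method is proposed that combines the F-measure feature maximization method with a CNN (convolutional neural network) model. Taking the artificial intelligence (AI) technology domain as an example for training experiments, the accuracy, recall, and F1 score reach 98.01%, 97.04%, and 97.89%, respectively. This indicates that the proposed strategy can accurately identify AI patents and improves the accuracy and recall of patent retrieval, which helps build a precise, accurate, and complete patent dataset for the AI technology domain.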

13.
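Analysis and Research on Internet Information Retrieval   Cited by: 7 (self-citations: 0, citations by others: 7)
This paper surveys the main methods currently used for information retrieval on the Internet and their problems, and analyzes and compares the retrieval techniques in depth. It introduces the application of new techniques such as machine learning, intelligent agents, and information filtering to information retrieval, and uses the Hopfield neural network model and algorithm for term expansion to improve the expression of users' search queries, thereby improving Internet information retrieval capability.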

14.
The ability to find tables and extract information from them is a necessary component of many information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional retrieval techniques. This paper describes techniques for extracting tables from text and retrieving answers from the extracted information. We compare machine learning (especially, Conditional Random Fields) and heuristic methods for table extraction. To retrieve answers, our approach creates a cell document, which contains the cell and its metadata (headers, titles) for each table cell, and the retrieval model ranks the cells of the extracted tables using a language-modeling approach. Performance is tested using government statistical Web sites and news articles, and errors are analyzed in order to improve the system.
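
The cell-document idea in this abstract can be illustrated with a small sketch. The abstract does not specify the retrieval model beyond "a language-modeling approach", so the Dirichlet-smoothed query-likelihood scoring below is an assumption, as are the function names, the smoothing value `mu`, the use of the first column as a row header, and the toy table used in the usage note.

```python
import math
from collections import Counter
from typing import Dict, List


def build_cell_documents(
    title: str, headers: List[str], rows: List[List[str]]
) -> List[Dict]:
    """Create one small document per table cell: the cell value plus its metadata."""
    docs = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # Assumption: the first column acts as the row header.
            row_header = row[0] if c > 0 else ""
            text = f"{title} {headers[c]} {row_header} {value}"
            docs.append({"row": r, "col": c, "value": value,
                         "tokens": text.lower().split()})
    return docs


def rank_cells(query: str, docs: List[Dict], mu: float = 2.0) -> List[Dict]:
    """Rank cell documents by Dirichlet-smoothed query log-likelihood (assumed model)."""
    collection = Counter(t for d in docs for t in d["tokens"])
    total = sum(collection.values())
    # Query terms never seen in the collection cannot discriminate; skip them.
    terms = [t for t in query.lower().split() if collection[t] > 0]
    scored = []
    for d in docs:
        tf = Counter(d["tokens"])
        length = len(d["tokens"])
        score = sum(
            math.log((tf[t] + mu * collection[t] / total) / (length + mu))
            for t in terms
        )
        scored.append((score, d))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored]
```

For a toy table titled "widget sales" with headers `["region", "2019", "2020"]`, the query "north 2020" should rank first the cell whose row header is "north" and whose column header is "2020", because only that cell document contains both query terms.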

15.
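Feature representation of academic literature is a key step in academic big data services such as literature search, classification and organization, and personalized recommendation. Research shows that graph neural networks can effectively learn feature representations of documents; however, current research focuses mainly on supervised learning methods, which not only place high demands on dataset size and quality but also produce representations tightly coupled to specific tasks. This paper therefore introduces four unsupervised graph neural network methods into academic literature representation learning, learning document representation vectors from the citation networks, co-citation networks, and bibliographic coupling networks of the Cora, CiteSeer, and DBLP (database systems and logic programming) datasets, and applies them to two downstream tasks: document classification and paper recommendation. The results show that (1) the deep mutual-information graph neural network is suited to document classification, while the adversarially regularized variational graph autoencoder performs better on paper recommendation; and (2) results on the Cora dataset indicate that, compared with co-citation and bibliographic coupling networks, the citation network is better suited to learning general-purpose document representation vectors.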

16.
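This paper analyzes the reasons for building a Web full-text retrieval system on Microsoft Search Service for a library's self-built databases, describes the indexing and retrieval mechanisms of Microsoft Search Service, and presents a concrete implementation using ASP.NET.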

17.
In the Dewey Decimal Classification (DDC) Online Project, subject searching and browsing of DDC Schedules and Relative Index were featured in an experimental online catalog. The effectiveness of this DDC in an online catalog was tested in online retrieval experiments at four participating libraries. These experiments provided data for analyses of subject searchers' use of a library classification in the information retrieval environment of an online catalog. Recommendations were provided for the enhancement of bibliographic records, online catalogs, and online cataloging systems with a library classification. In this paper, subject searchers' use of the subject outline search capability of the experimental online catalog is described. This capability was unique to the experimental online catalog and all other online catalogs, because it referred searchers to online displays of the classification schedules based on their entry of subject terms. Failure analyses of subject outline searches demonstrated its specific strengths and weaknesses. Users' postsearch interview comments highlighted their experiences and their satisfaction with this search. Based on the failure analyses and users' interview comments, recommendations are provided for the improvement of the subject outline search in online catalogs.

18.
Probabilistic topic models have recently attracted much attention because of their successful applications in many text mining tasks such as retrieval, summarization, categorization, and clustering. Although many existing studies have reported promising performance of these topic models, none of the work has systematically investigated the task performance of topic models; as a result, some critical questions that may affect the performance of all applications of topic models are mostly unanswered, particularly how to choose between competing models, how multiple local maxima affect task performance, and how to set parameters in topic models. In this paper, we address these questions by conducting a systematic investigation of two representative probabilistic topic models, probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), using three representative text mining tasks, including document clustering, text categorization, and ad-hoc retrieval. The analysis of our experimental results provides deeper understanding of topic models and many useful insights about how to optimize the performance of topic models for these typical tasks. The task-based evaluation framework is generalizable to other topic models in the family of either PLSA or LDA.

19.
The Health Science Library at University of Tennessee (UT), Memphis has taken advantage of a campuswide network for the purpose of providing enhanced access to library services. With a terminal or microcomputer, members of the UT Memphis community can use an electronic menu system to complete photocopy, interlibrary loan, and computer literature search request forms; leave messages or sign up for library workshops; use electronic mail to receive citations and abstracts from computer literature searches; use an electronic bulletin board to scan the library's new acquisitions lists, library hours, services, and policies; and use bibliographic retrieval software to search the library's locally mounted databases. Remote access to library services and electronic resources, which is available twenty-four hours a day, could potentially save users time and the institution money. Remote access, however, is intended to supplement, not to supplant or discourage, in-house library use.

20.
In multicomputer networks, adaptive routing is expected to be a promising way to improve network performance by utilizing available network bandwidth. Previous adaptive routing algorithms in wormhole-routed multicomputer networks restrict the routing of messages to prevent deadlocks, and the routing restriction results in a low degree of adaptiveness and low utilization of communication channels. In this paper, we examine the possibility of performing restriction-free, nonminimal adaptive routing in wormhole-routed networks as an approach to further improving the performance of these networks. A new flow control policy, called message cutting, is proposed, and two adaptive routing strategies are presented. Freedom from communication deadlock is achieved by the proposed flow control policy. The proposed adaptive routing strategies do not restrict routing and maximally utilize the physical and virtual channels. Simulation results show that the restriction-free adaptive routing approach is promising: it achieves the lowest latency and highest throughput, depending on the number of virtual channels per physical channel and the pattern of message traffic.
