Similar literature
 20 similar documents found
1.
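New Technologies in Information Retrieval   Total citations: 1, self-citations: 0, citations by others: 1
Information retrieval usually refers to text information retrieval, covering the storage, organization, representation, querying, and access of information; its core is the indexing and retrieval of text. Historically, information retrieval has progressed from manual retrieval through computer-based retrieval to today's networked, intelligent retrieval. Its objects have expanded from relatively closed, stable, consistent content managed centrally by independent databases to Web content that is open, dynamic, rapidly updated, widely distributed, and loosely managed. To meet the demands of networking, intelligence, and personalization, a number of new technologies have emerged. These mainly include parallel retrieval, distributed retrieval, knowledge-based intelligent retrieval, heterogeneous-platform retrieval, natural-language retrieval, holographic retrieval, concept space, and information fusion.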

2.
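In text analysis and information retrieval, a central methodological question is how to analyze and construct semantic representations of text so as to improve text classification and retrieval performance. On the application side, attention centers on vertical-domain retrieval systems, such as book search and recommendation in networked environments and biomedical literature retrieval and question answering. Here, social book search and recommendation means using search-engine and recommendation techniques to analyze and retrieve the massive number of books found on social media and the Internet, and to give precise recommendations based on users' semantic queries and the books' social information. Biomedical literature retrieval and question answering takes natural-language questions specified by biomedical domain experts and uses information retrieval and natural language processing techniques to search a massive biomedical literature collection, locating the documents and sentences relevant to each question and thereby providing a basis for generating accurate answers.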

3.
熊文新 《图书情报工作》2012,56(17):115-121
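This study examines word usage in the natural-language query statements that users formulate during information retrieval. Using a corpus of query statements whose information needs are described at varying granularity, with a general-purpose Chinese corpus as a control, it draws on statistics such as word frequency and the text coverage rate of words. According to whether a word must appear in the target text, directly or in some other form, the words in query statements are divided into general stop words that apply to Chinese text processing broadly, retrieval-specific stop words that serve only to phrase the request, and content words tied to the specific information need. Distinguishing these roles adds a stripping step to natural-language query processing at the front end of an information system, preventing the loss of efficiency and precision that results from using every segmented token of the query as a search term.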
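The three-way split of query words in the abstract above can be illustrated with a minimal sketch. The two word lists and the function name `split_query_terms` are hypothetical and chosen only for illustration; the study derives its categories from corpus statistics on Chinese queries, which this English toy example does not reproduce.

```python
# Hypothetical word lists for illustration only; the study builds its
# categories from word-frequency and text-coverage statistics.
GENERAL_STOPWORDS = {"the", "a", "of", "in", "on", "about", "and", "to"}
RETRIEVAL_STOPWORDS = {"please", "find", "search", "articles", "documents",
                       "information", "related"}


def split_query_terms(tokens):
    """Partition query tokens into general stop words, retrieval-specific
    stop words, and content words (the only group worth sending to the index)."""
    groups = {"general_stop": [], "retrieval_stop": [], "content": []}
    for token in tokens:
        term = token.lower()
        if term in GENERAL_STOPWORDS:
            groups["general_stop"].append(term)
        elif term in RETRIEVAL_STOPWORDS:
            groups["retrieval_stop"].append(term)
        else:
            groups["content"].append(term)
    return groups


query = "Please find articles about the history of porcelain".split()
groups = split_query_terms(query)
print(groups["content"])  # only these terms would be used as search terms
```

In this sketch only the content words reach the retrieval engine, which is the stripping step the abstract argues for.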

4.
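The Future of Web Information Retrieval   Total citations: 8, self-citations: 0, citations by others: 8
The future development of Web information retrieval is reflected in the following aspects: retrieval tools becoming both more comprehensive and more specialized; retrieval tools becoming more intelligent; polarization of retrieval languages; improved ability to retrieve non-text information; human participation in the information organization of retrieval tools; and the rise of fee-based Web retrieval tools.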

5.
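Visualization of Text Information Retrieval Based on a Knowledge Model   Total citations: 5, self-citations: 0, citations by others: 5
Information retrieval visualization is a technique that converts document information, user queries, the various retrieval models, and the otherwise invisible internal semantic relationships that arise when those models are used for retrieval into graphics, displays them in a two- or three-dimensional visual space, and offers users retrieval through that space. Knowledge-model-based visualization of text information retrieval uses the metadata of information resources to perform visual retrieval. 1 figure. 29 references.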

6.
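Retrieving information in a digital library is tedious work, and because systems cannot recognize a user's individual retrieval preferences, results are often unsatisfactory. In the retrieval system of a digital library, we focus on query personalization; in particular, we handle structured retrieval in which the stored original data is added to the relevant database, and, by analyzing the user preferences recorded in user profiles, we describe the role of query rewriting rules in building personalized retrieval.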

7.
付雅慧 《兰台内外》2020,(10):51-53
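Digital information services are an important part of library services, and optimizing retrieval technology to raise the level of digital services in public libraries has long been discussed in the library field. Grid information retrieval uses the powerful computing capacity and resource-sharing strengths of grid technology to provide query scheduling and resource management services for information retrieval. Under grid technology, a topical crawler gathers information related to a target topic, processes and analyzes it intelligently, and meets users' retrieval needs. With its precise information gathering, intelligent processing, efficient retrieval, and shared information and knowledge, it helps improve recall, precision, specialization, and query speed in the information access that library digital services provide.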

8.
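To address the problems that traditional retrieval methods face in today's networked information environment, this paper proposes a domain-ontology-based model for retrieving specialized literature, studies its information organization, query processing, and semantic retrieval processes, and develops a prototype system. Comparative tests show that domain-ontology-based retrieval of specialized literature is not only feasible to implement but also outperforms the traditional retrieval mode, and so has application prospects.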

9.
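I. Ontology Methods in Archival Information Retrieval. Archival information retrieval currently relies mainly on keywords or classification identifiers. Because keywords are natural language, they cannot reflect semantic relationships or establish accurate correspondences with related concepts; a query may therefore return large amounts of irrelevant information and also miss important information. Anyone who searches for archival information by keyword on an Internet search engine finds that the information actually needed is buried in a mass of irrelevant results,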

10.
Multimedia Information Retrieval Based on Fuzzy Semantic Distance   Total citations: 4, self-citations: 1, citations by others: 3
张李义 《情报学报》2003,22(2):131-135
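Unlike traditional exact database queries, the query conditions in multimedia information retrieval are incomplete. This paper describes the principle and algorithm for retrieving information from a multimedia database using fuzzy semantic distance, takes a fuzzy similarity test as the criterion for judging retrieval results, and closes with an example illustrating the method.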

11.
An information retrieval (IR) system can often fail to retrieve relevant documents due to the incomplete specification of information need in the user’s query. Pseudo-relevance feedback (PRF) aims to improve IR effectiveness by exploiting potentially relevant aspects of the information need present in the documents retrieved in an initial search. Standard PRF approaches utilize the information contained in these top ranked documents from the initial search with the assumption that documents as a whole are relevant to the information need. However, in practice, documents are often multi-topical where only a portion of the documents may be relevant to the query. In this situation, exploitation of the topical composition of the top ranked documents, estimated with statistical topic modeling based approaches, can potentially be a useful cue to improve PRF effectiveness. The key idea behind our PRF method is to use the term-topic and the document-topic distributions obtained from topic modeling over the set of top ranked documents to re-rank the initially retrieved documents. The objective is to improve the ranks of documents that are primarily composed of the relevant topics expressed in the information need of the query. Our RF model can further be improved by making use of non-parametric topic modeling, where the number of topics can grow according to the document contents, thus giving the RF model the capability to adjust the number of topics based on the content of the top ranked documents. We empirically validate our topic model based RF approach on two document collections of diverse length and topical composition characteristics: (1) ad-hoc retrieval using the TREC 6-8 and the TREC Robust ’04 dataset, and (2) tweet retrieval using the TREC Microblog ’11 dataset. 
Results indicate that our proposed approach increases MAP by up to 9% in comparison to the results obtained with an LDA based language model (for initial retrieval) coupled with the relevance model (for feedback). Moreover, the non-parametric version of our proposed approach is shown to be more effective than its parametric counterpart due to its advantage of adapting the number of topics, improving results by up to 5.6% of MAP compared to the parametric version.

12.
As the volume and variety of information sources continue to grow, it becomes increasingly difficult to obtain information that accurately matches user information needs. A number of factors affect information retrieval effectiveness (the accuracy of matching user information needs against the retrieved information). First, users often do not present search queries in the form that optimally represents their information need. Second, the measure of a document's relevance is often highly subjective and differs between users. Third, information sources might contain heterogeneous documents in multiple formats, and the representation of documents is not unified. This paper discusses an approach for improving information retrieval effectiveness from document databases. It is proposed that retrieval effectiveness can be improved by applying computational intelligence techniques for modelling information needs, through interactive reinforcement learning. The method combines qualitative (subjective) user relevance feedback with quantitative (algorithmic) measures of the relevance of retrieved documents. An information retrieval system is developed whose retrieval effectiveness is evaluated using traditional precision and recall.
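For readers unfamiliar with the two evaluation measures named at the end of this abstract, the sketch below computes set-based precision and recall for one query. The function name and the document identifiers are made up for illustration and are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents returned, 3 documents truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant, and two of the three relevant documents were found.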

13.
In this paper we look at some of the problems in interacting with best-match retrieval systems. In particular, we examine the areas of interaction, some investigations of the complexity and breadth of interaction, and attempts to categorise users' information-seeking behaviour. We suggest that one of the difficulties traditional IR systems have in supporting information seeking is the way the information content of documents is represented. We discuss an alternative representation, based on how information is used within documents.

14.
Genetic Approach to Query Space Exploration   Total citations: 2, self-citations: 0, citations by others: 2
This paper describes a genetic algorithm approach for intelligent information retrieval. The goal is to find an optimal set of documents which best matches the user's needs by exploring and exploiting the document space. More precisely, we define a specific genetic algorithm for information retrieval based on knowledge based operators and guided by a heuristic for relevance multi-modality problem solving. Experiments with TREC-6 French data and queries show the effectiveness of our approach.

15.
Efficient information searching and retrieval methods are needed to navigate the ever increasing volumes of digital information. Traditional lexical information retrieval methods can be inefficient and often return inaccurate results. To overcome problems such as polysemy and synonymy, concept-based retrieval methods have been developed. One such method is Latent Semantic Indexing (LSI), a vector-space model, which uses the singular value decomposition (SVD) of a term-by-document matrix to represent terms and documents in k-dimensional space. As with other vector-space models, LSI is an attempt to exploit the underlying semantic structure of word usage in documents. During the query matching phase of LSI, a user's query is first projected into the term-document space, and then compared to all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query matching method requires that the similarity measure be computed between the query and every term and document in the vector space. In this paper, the kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching. The kd-tree data structure stores the term and document vectors in such a way that only those terms and documents that are most likely to qualify as nearest neighbors to the query will be examined and retrieved.
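The pruning idea behind kd-tree query matching can be sketched in a few lines of pure Python. This is a generic textbook kd-tree, not the implementation evaluated in the paper; the 2-dimensional vectors stand in for hypothetical LSI coordinates, and Euclidean distance is an assumption (it gives the same ranking as cosine similarity when vectors are normalized to unit length).

```python
import math


def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])           # cycle through dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                     # median becomes the split node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, label) of the nearest stored vector to `query`."""
    if node is None:
        return best
    vec, label = node["point"]
    dist = math.dist(vec, query)
    if best is None or dist < best[0]:
        best = (dist, label)
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match so far; otherwise that whole subtree is skipped.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2-D document vectors (real LSI spaces have k dimensions).
docs = [((0.9, 0.1), "doc_a"), ((0.8, 0.3), "doc_b"), ((0.1, 0.9), "doc_c"),
        ((0.2, 0.7), "doc_d"), ((0.5, 0.5), "doc_e")]
tree = build_kdtree(docs)
query_vec = (0.12, 0.85)
print(nearest(tree, query_vec)[1])
```

The saving comes from the `abs(diff) < best[0]` test: subtrees that cannot contain a closer vector are never examined, in contrast to the exhaustive comparison the abstract describes.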

16.
Automatic detection of source code plagiarism is an important research field for both the commercial software industry and within the research community. Existing methods of plagiarism detection primarily involve exhaustive pairwise document comparison, which does not scale well for large software collections. To achieve scalability, we approach the problem from an information retrieval (IR) perspective. We retrieve a ranked list of candidate documents in response to a pseudo-query representation constructed from each source code document in the collection. The challenge in source code document retrieval is that the standard bag-of-words (BoW) representation model for such documents is likely to result in many false positives being retrieved, because of the use of identical programming language specific constructs and keywords. To address this problem, we make use of an abstract syntax tree (AST) representation of the source code documents. While the IR approach is efficient, it is essentially unsupervised in nature. To further improve its effectiveness, we apply a supervised classifier (pre-trained with features extracted from sample plagiarized source code pairs) on the top ranked retrieved documents. We report experiments on the SOCO-2014 dataset comprising 12K Java source files with almost 1M lines of code. Our experiments confirm that the AST based approach produces significantly better retrieval effectiveness than a standard BoW representation, i.e., the AST based approach is able to identify a higher number of plagiarized source code documents at top ranks in response to a query source code document. The supervised classifier, trained on features extracted from sample plagiarized source code pairs, is shown to effectively filter and thus further improve the ranked list of retrieved candidate plagiarized documents.
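To give a feel for why a structural representation helps, the sketch below compares programs by the counts of their AST node types. It is a deliberately simplified stand-in: it uses Python's standard `ast` module rather than a Java parser, and a plain node-type count rather than the features used in the paper. The function names `ast_profile` and `cosine` and the three snippets are hypothetical.

```python
import ast
import math
from collections import Counter


def ast_profile(source):
    """Count AST node types, ignoring identifier names and literal values."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(c1, c2):
    """Cosine similarity between two node-type count vectors."""
    dot = sum(c1[key] * c2[key] for key in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


original = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# Same structure with every identifier renamed (a common disguise).
renamed = """
def add_up(items):
    acc = 0
    for x in items:
        acc += x
    return acc
"""

different = """
def greet(name):
    return "hello " + name
"""

same_structure = ast_profile(original) == ast_profile(renamed)
score_different = cosine(ast_profile(original), ast_profile(different))
print(same_structure, round(score_different, 3))
```

Renaming identifiers leaves the node-type profile unchanged, so the renamed copy still matches the original, while an unrelated function scores lower.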

17.
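A Semantics-Oriented Information Retrieval Method   Total citations: 1, self-citations: 0, citations by others: 1
Traditional information retrieval techniques neglect the influence of semantics on the retrieval process, which is an important cause of low precision. This paper proposes a semantics-oriented retrieval method that uses HowNet-based semantic processing to annotate both user queries and target documents semantically, and HowNet-based lexical chains to filter the feature words of documents. This enables matching at the semantic level while reducing the interference of large numbers of irrelevant words with the results. The paper describes SOIRS, a retrieval system implementing the method, and reports comparative experiments against a traditional retrieval system. The results show that the semantics-oriented method is clearly better than traditional retrieval methods in precision.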

18.
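Latent Semantic Indexing and Its Applications   Total citations: 5, self-citations: 1, citations by others: 4
Latent Semantic Indexing (LSI) is a concept-based approach to document retrieval. Unlike traditional methods that match the words of the user's query against the words of documents, it returns results according to the semantic association between documents and the query. This paper presents one way of implementing LSI for document retrieval, offering a new route to document retrieval.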

19.
Information Retrieval from Documents: A Survey   Total citations: 4, self-citations: 0, citations by others: 4
Given the phenomenal growth in the variety and quantity of data available to users through electronic media, there is a great demand for efficient and effective ways to organize and search through all this information. Besides speech, our principal means of communication is through visual media, and in particular, through documents. In this paper, we provide an update on Doermann's comprehensive survey (1998) of research results in the broad area of document-based information retrieval. The scope of this survey is also somewhat broader, and there is a greater emphasis on relating document image analysis methods to conventional IR methods. Documents are available in a wide variety of formats. Technical papers are often available as ASCII files of clean, correct text. Other documents may only be available as hardcopies. These documents have to be scanned and stored as images so that they may be processed by a computer. The textual content of these documents may also be extracted and recognized using OCR methods. Our survey covers the broad spectrum of methods that are required to handle different formats like text and images. The core of the paper focuses on methods that manipulate document images directly, and perform various information processing tasks such as retrieval, categorization, and summarization, without attempting to completely recognize the textual content of the document. We start, however, with a brief overview of traditional IR techniques that operate on clean text. We also discuss research dealing with text that is generated by running OCR on document images. Finally, we also briefly touch on the related problem of content-based image retrieval.

20.
A basic notion of probability theory is the event space, on which the probability measure is defined. A probabilistic model needs an event space. However, some classes of events (which we may want to model probabilistically) exhibit structure which does not fit well into the traditional event space notion. A simple one-to-many example is discussed at length. The information retrieval case, involving queries, documents and relevance, is analysed. The event space issue makes for some difficulty in comparing different probabilistic models in IR. Revised version of a paper presented at the MF/IR Workshop, SIGIR 2002, Tampere, Finland, under the title "On Bayesian models and event spaces in information retrieval".
