Similar Literature
20 similar documents found (search time: 15 ms)
1.
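Natural Language Retrieval Based on Latent Semantic Indexing   Total citations: 3, self-citations: 0, citations by others: 3
In information retrieval, the vector space model is one of the most effective mathematical tools. Owing to the particular nature of natural-language retrieval, and because traditional information retrieval models are affected by synonymy and polysemy, retrieval precision is low. To improve the precision of natural-language retrieval, we examine a concept-based information retrieval model, the latent semantic indexing (LSI) model, and analyze two LSI-based examples.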

2.
Information Retrieval from Documents: A Survey   Total citations: 4, self-citations: 0, citations by others: 4
Given the phenomenal growth in the variety and quantity of data available to users through electronic media, there is a great demand for efficient and effective ways to organize and search through all this information. Besides speech, our principal means of communication is through visual media, and in particular, through documents. In this paper, we provide an update on Doermann's comprehensive survey (1998) of research results in the broad area of document-based information retrieval. The scope of this survey is also somewhat broader, and there is a greater emphasis on relating document image analysis methods to conventional IR methods. Documents are available in a wide variety of formats. Technical papers are often available as ASCII files of clean, correct text. Other documents may only be available as hardcopies. These documents have to be scanned and stored as images so that they may be processed by a computer. The textual content of these documents may also be extracted and recognized using OCR methods. Our survey covers the broad spectrum of methods that are required to handle different formats like text and images. The core of the paper focuses on methods that manipulate document images directly, and perform various information processing tasks such as retrieval, categorization, and summarization, without attempting to completely recognize the textual content of the document. We start, however, with a brief overview of traditional IR techniques that operate on clean text. We also discuss research dealing with text that is generated by running OCR on document images. Finally, we also briefly touch on the related problem of content-based image retrieval.

3.
In this paper the problem of indexing heterogeneous structured documents and of retrieving semi-structured documents is considered. We propose a flexible paradigm for both indexing such documents and formulating user queries specifying soft constraints on both document structure and content. At the indexing level we propose a model that achieves flexibility by constructing personalised document representations based on users' views of the documents. This is obtained by allowing users to specify their preferences on the document sections that they estimate to bear the most interesting information, as well as to linguistically quantify the number of sections which determine the global potential interest of the documents. At the query language level, a flexible query language for expressing soft selection conditions on both the document structure and content is proposed.

4.
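The introduction of latent semantic indexing (LSI) moved information retrieval from traditional keyword-based retrieval toward concept-based semantic retrieval and effectively improved the performance of information retrieval systems. After reviewing the results of LSI research in China, this paper analyzes and summarizes the shortcomings of that research and identifies directions for further work.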

5.
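Multimedia Retrieval
Analyzes the types and characteristics of multimedia, discusses the principles and methods of form-based and content-based multimedia retrieval, introduces its application areas, and looks ahead to its technical prospects.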

6.
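Research on a Chinese Full-Text Retrieval System Based on a Hybrid Index   Total citations: 1, self-citations: 0, citations by others: 1
A hybrid index is introduced into a Chinese full-text retrieval system. The hash index component of the hybrid index is built, its in-memory storage structure is given, and a retrieval algorithm for this index is presented. The index both keeps indexing comprehensive and improves retrieval efficiency. The implementation of a Chinese full-text retrieval system based on the hybrid index is examined by building a working system.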

7.
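Li Yi, Pang Jing'an. Journal of the China Society for Scientific and Technical Information (《情报学报》), 2003, 22(4): 403-411
To improve the efficiency of Chinese medical information retrieval, this paper applies results from semantics research, analyzes the Unified Medical Language System (UMLS) in depth, and examines multi-level concept semantic network structures from a theoretical standpoint. On this basis it designs a three-layer concept semantic network structure suited to Chinese medical information, determines the semantic types and semantic relations for each layer, and thereby further refines the medical information semantic network. Drawing on the cognitive theory of information retrieval, it establishes a semantic indexing scheme and a semantic retrieval model for Chinese medical information based on the three-layer structure. A Kappa test comparing expanded retrieval and semantic retrieval found the agreement between the two methods to be highly significant (p<0.01); compared with any of the expanded retrieval methods, semantic retrieval achieved higher retrieval efficiency.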

8.
Summarizing Similarities and Differences Among Related Documents   Total citations: 10, self-citations: 0, citations by others: 10
In many modern information retrieval applications, a common problem which arises is the existence of multiple documents covering similar information, as in the case of multiple news stories about an event or a sequence of events. A particular challenge for text summarization is to be able to summarize the similarities and differences in information content among these documents. The approach described here exploits the results of recent progress in information extraction to represent salient units of text and their relationships. By exploiting meaningful relations between units based on an analysis of text cohesion and the context in which the comparison is desired, the summarizer can pinpoint similarities and differences, and align text segments. In evaluation experiments, these techniques for exploiting cohesion relations result in summaries which (i) help users more quickly complete a retrieval task, (ii) result in improved alignment accuracy over baselines, and (iii) improve identification of topic-relevant similarities and differences.

9.
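Development of Information Storage and Retrieval Technology in the Network Environment   Total citations: 7, self-citations: 0, citations by others: 7
Information storage and retrieval technology is an important link in information transfer. The retrieval language is closely tied to retrieval efficiency and provides the linguistic underpinning of the retrieval process. So that different users can retrieve the information they need, retrieval languages will inevitably evolve toward natural language and user-friendly interfaces.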

10.
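This study uses XML text fragments and an image content feature (color) to retrieve images. On WHU-XML, an XML-based multimedia digital library retrieval platform, indexes are built for XML text and images; on that basis, a linear merging method combines image retrieval based on XML text fragments with retrieval based on the image content feature (color). The results show that when text retrieval is weighted more heavily than image content retrieval, retrieval performs better than either method alone.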
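
One way to picture the linear merging of text-based and color-based retrieval is a weighted sum of two normalized score lists. The sketch below is only an illustration under stated assumptions: the function names, the min-max normalization, and the default weight of 0.7 are hypothetical and are not taken from WHU-XML or from the study.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(text_scores, image_scores, text_weight=0.7):
    """Rank documents by text_weight * text + (1 - text_weight) * image.

    text_weight is a hypothetical parameter; a value above 0.5 weights
    text retrieval more heavily than image content retrieval.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    text_n = min_max_normalize(text_scores)
    image_n = min_max_normalize(image_scores)
    fused = {
        doc: text_weight * text_n.get(doc, 0.0)
        + (1.0 - text_weight) * image_n.get(doc, 0.0)
        for doc in set(text_n) | set(image_n)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up scores for three images, purely for illustration.
    text = {"img_a": 2.0, "img_b": 1.0, "img_c": 0.0}
    color = {"img_a": 0.0, "img_b": 0.5, "img_c": 1.0}
    for doc, score in linear_fusion(text, color):
        print(doc, round(score, 3))
```

A document missing from one list simply contributes zero from that side, which is one of several reasonable conventions for merging.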

11.
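Cao Mei. Library and Information Service (《图书情报工作》), 2012, 56(9): 120-119
To understand how image retrieval needs are expressed in Chinese-language web search, a user image-search experiment was designed to collect the queries issued during web image retrieval for a small-scale empirical study. The study identifies general features of query construction and of query language and syntax and, through a dedicated analysis of queries from efficient image searches, reveals their distinctive features. Finally, based on the results, it discusses patterns in the expression of image retrieval needs and proposes image retrieval strategies.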

12.
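Social tagging is increasingly widely used on the web and offers a new model for indexing, organizing, and retrieving information resources. Researchers outside China have studied the indexing function and indexing methods of social tagging, the role of social tagging systems in information retrieval, and tag-based information retrieval techniques, and have produced a body of results, though shortcomings remain. Research in this area is trending toward standardizing how social tags are expressed, removing tag noise and spam, and ordering tags and arranging them into hierarchies.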

13.
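Proposes a product information retrieval model that combines semantic retrieval with a multi-attribute decision-making method. A semantic vector space is constructed to compute semantic similarity, so that retrieval results match the customer's query keywords semantically; the model also uses the TOPSIS multi-attribute decision-making method to compute a utility value for each retrieved product, establishing a mechanism for comparing product content. Finally, experiments on metrics such as accuracy and customer acceptance confirm that the model is effective and improves the precision of product information retrieval.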
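
The TOPSIS step can be sketched as follows. This is the textbook form of TOPSIS (vector normalization, weighted ideal and anti-ideal points, relative closeness), not necessarily the variant used in the paper; the criteria, weights, and product values are made-up assumptions.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  -- rows are alternatives, columns are criteria
    weights -- one weight per criterion
    benefit -- True if larger is better for that criterion, False for a cost
    """
    n = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    columns = list(zip(*weighted))
    # Ideal point takes the best value per criterion, anti-ideal the worst.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        total = d_ideal + d_anti
        # Closeness is 1 at the ideal point and 0 at the anti-ideal point.
        scores.append(d_anti / total if total else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical products described by (price, rating).
    products = [[250.0, 4.5], [200.0, 4.0], [300.0, 4.8]]
    scores = topsis(products, weights=[0.5, 0.5], benefit=[False, True])
    for name, score in zip(["p1", "p2", "p3"], scores):
        print(name, round(score, 3))
```

In a retrieval setting, the rows would be the products returned by the semantic matching step, and the closeness score would serve as the utility value used to compare them.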

14.
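Surveys the composition and construction of UMLS, focusing on examples of its use in retrieval, and analyzes and summarizes the functional design, implementation methods, and practical results of UMLS in semantic and intelligent retrieval, in order to inform the scenario and functional design, technical development, and implementation of intelligent retrieval applications built on integrated knowledge organization systems. The applications of UMLS in intelligent retrieval mainly include: (1) expanded retrieval, chiefly through synonym expansion, hierarchical expansion, and phrase segmentation expansion; (2) semantic retrieval, which retrieves and presents results based on concepts and the relations between them; (3) question-answering retrieval, comprising question analysis, document retrieval, sentence extraction, answer generation, and semantic clustering.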

15.
Efficient information searching and retrieval methods are needed to navigate the ever-increasing volumes of digital information. Traditional lexical information retrieval methods can be inefficient and often return inaccurate results. To overcome problems such as polysemy and synonymy, concept-based retrieval methods have been developed. One such method is Latent Semantic Indexing (LSI), a vector-space model, which uses the singular value decomposition (SVD) of a term-by-document matrix to represent terms and documents in k-dimensional space. As with other vector-space models, LSI is an attempt to exploit the underlying semantic structure of word usage in documents. During the query matching phase of LSI, a user's query is first projected into the term-document space, and then compared to all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query matching method requires that the similarity measure be computed between the query and every term and document in the vector space. In this paper, the kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching. The kd-tree data structure stores the term and document vectors in such a way that only those terms and documents that are most likely to qualify as nearest neighbors to the query will be examined and retrieved.
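
The idea of examining only likely nearest neighbors can be sketched with a small kd-tree. This is a generic illustration, not the paper's implementation: it assumes the term and document vectors have already been reduced to k dimensions and unit-normalized, so that the Euclidean nearest neighbor is also the one with the highest cosine similarity. The names `Node`, `build`, and `nearest` and the sample vectors are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    point: Sequence[float]
    label: str
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build(items, depth=0):
    """Build a kd-tree from a list of (label, vector) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][1])          # cycle through the dimensions
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2                    # median splits the space in two
    label, point = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match so far; otherwise that whole subtree is skipped.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


if __name__ == "__main__":
    # Hypothetical unit vectors standing in for documents in a 2-D LSI space.
    docs = [("d1", (1.0, 0.0)), ("d2", (0.0, 1.0)),
            ("d3", (0.6, 0.8)), ("d4", (-1.0, 0.0))]
    tree = build(docs)
    print(nearest(tree, (0.8, 0.6)))
```

The pruning test in `nearest` is what saves work relative to comparing the query against every vector; how much it saves depends on the dimensionality k and on how the vectors are distributed.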

16.
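Content-based image retrieval is a retrieval technique that operates on the physical content of images; its main approaches are based on color, texture, shape, spatial position, and semantics. Color-based image retrieval is the most mature, whereas semantics-based retrieval is still at the exploratory research stage. Content-based and text-based retrieval can be combined to provide retrieval services together in digital libraries. Google offers a working example of such a combination at the post-control stage, but true integration of the two takes place at the preprocessing stage. Applying the combination in digital libraries is feasible and is expected to provide better image retrieval services.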

17.
The effects of query structures and query expansion (QE) on retrieval performance were tested with a best match retrieval system (InQuery). Query structure means the use of operators to express the relations between search keys. Six different structures were tested, representing strong structures (e.g., queries with facets or concepts identified) and weak structures (no concepts identified, a query is a bag of search keys). QE was based on concepts, which were first selected from a searching thesaurus, and then expanded by semantic relationships given in the thesaurus. The expansion levels were (a) no expansion, (b) a synonym expansion, (c) a narrower concept expansion, (d) an associative concept expansion, and (e) a cumulative expansion of all other expansions. With weak structures and Boolean structured queries, QE was not very effective. The best performance was achieved with a combination of a facet structure, where search keys within a facet were treated as instances of one search key (the SYN operator), and the largest expansion.

18.
Vocabulary incompatibilities arise when the terms used to index a document collection are largely unknown, or at least not well-known to the users who eventually search the collection. No matter how comprehensive or well-structured the indexing vocabulary, it is of little use if it is not used effectively in query formulation. This paper demonstrates that techniques for mapping user queries into the controlled indexing vocabulary have the potential to radically improve document retrieval performance. We also show how the use of controlled indexing vocabulary can be employed to achieve performance gains for collection selection. Finally, we demonstrate the potential benefit of combining these two techniques in an interactive retrieval environment. Given a user query, our evaluation approach simulates the human user's choice of terms for query augmentation given a list of controlled vocabulary terms suggested by a system. This strategy lets us evaluate interactive strategies without the need for human subjects.

19.
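Automatic indexing of Chinese has been studied for many years in library and information science and has produced results; it is indispensable to research on information retrieval databases. With the spread of full-text retrieval and Chinese search engines, Chinese information processing now involves several disciplines. The relationship among Chinese automatic indexing, full-text retrieval, and Chinese search engines needs to be clarified in order to establish the place of automatic indexing within Chinese information processing. The discussion concludes that full-text retrieval draws on the various forms of Chinese automatic indexing and that search engines draw on full-text retrieval, so search engines also draw on Chinese automatic indexing. The three are thus related in that Chinese automatic indexing is used by the other two and the technologies advance one another.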

20.
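Summarizing recent developments in computer science, artificial intelligence, patent document processing, and related fields, this paper introduces the latest advances in computer-based patent document retrieval from five angles: multilingual mixed retrieval, classification-based retrieval, semantic retrieval, image retrieval, and supporting technologies. Improvements in machine translation and in the multilateral common classification system help raise retrieval efficiency and remove language barriers, while advances in semantic retrieval, image retrieval, and automatic document processing are expected to make possible intelligent retrieval systems that serve users at different levels.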
