Similar Literature
Found 19 similar documents (search time: 205 ms)
1.
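A Discussion on Parallelizing Sequential-File Retrieval and Inverted-File Retrieval   (total citations: 1; self-citations: 1; cited by others: 0)
The emergence of parallel information retrieval as a research field makes it possible to search the large databases behind fast query systems and to shorten retrieval response time effectively. Realizing parallel retrieval effectively, however, requires solving many problems that are specific to information retrieval. This paper discusses parallel implementations of sequential-file retrieval and inverted-file retrieval, and offers some preliminary ideas about the parallel computer architectures, parallel languages and compilers, parallel operating systems, and parallel retrieval algorithms on which they depend.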

2.
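Through a comparative analysis of weighted retrieval and Boolean (logical) retrieval, and by introducing the concept of negative weights, this paper argues that weighted query formulations and Boolean query expressions can be made equivalent in meaning in computerized information retrieval, and proposes a simple method for choosing weights reasonably. It also focuses on how weighted retrieval is implemented over sequential files and inverted files.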
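The equivalence described in this abstract can be illustrated with a small sketch. It assumes one common formulation of weighted retrieval (sum the weights of the query terms present in a document and retrieve it when the sum reaches a threshold); the abstract does not give the paper's own weighting rule, so `weighted_match`, `search`, the sample documents, and the chosen weights are all hypothetical.

```python
def weighted_match(doc_terms, weights, threshold):
    """Retrieve a document when the summed weights of its query terms reach the threshold."""
    score = sum(w for term, w in weights.items() if term in doc_terms)
    return score >= threshold


# Hypothetical toy collection: document id -> set of index terms.
docs = {
    1: {"parallel", "retrieval"},
    2: {"parallel"},
    3: {"retrieval"},
    4: {"parallel", "retrieval", "survey"},
}


def search(weights, threshold):
    """Return the sorted ids of all documents that pass the weighted test."""
    return sorted(d for d, terms in docs.items()
                  if weighted_match(terms, weights, threshold))


# parallel AND retrieval: both terms are needed to reach the threshold.
and_hits = search({"parallel": 1, "retrieval": 1}, threshold=2)
# parallel OR retrieval: either term alone reaches the threshold.
or_hits = search({"parallel": 1, "retrieval": 1}, threshold=1)
# (parallel AND retrieval) NOT survey: a negative weight vetoes the match.
not_hits = search({"parallel": 1, "retrieval": 1, "survey": -2}, threshold=2)
```

In this sketch the negative weight plays the role of Boolean NOT: any document containing the vetoed term falls below the threshold even when it matches every positive term.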

3.
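A Study of Data Parallelism in Information Retrieval   (total citations: 1; self-citations: 0; cited by others: 1)
Research on parallelism in information retrieval covers data parallelism and functional parallelism. Data parallelism can appear as data-level parallelism on SIMD systems or as data-set parallelism on distributed or MIMD systems. This paper discusses two approaches, data-level parallel retrieval and distributed parallel retrieval over databases, and briefly compares them.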

4.
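Addressing structured XML retrieval from an information retrieval perspective, this paper uses an inverted-file approach with NEXI as the query language, implements it on WHU-XML, an XML-based experimental digital library retrieval system, and analyzes in detail how the query language is parsed and which structured retrieval algorithm is used.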

5.
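This paper proposes a method for separating repeating fields and achieving fast retrieval, discusses the time and space efficiency of two kinds of inverted files and the conversion between them, and implements retrieval over inverted files of repeating fields on dBASE III using a Polish-notation transformation and a horizontal set algorithm.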

6.
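This paper investigates certain limitations of dBASE III, namely that it cannot search multi-subject fields effectively and lacks combined (Boolean) search, and presents an algorithm for Boolean retrieval under dBASE III. The algorithm is based on the reverse Polish method and, taking the strengths and weaknesses of dBASE III itself into account, modifies the original reverse Polish algorithm in many ways so that dBASE III can implement it more easily.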
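As a rough illustration of Boolean retrieval built on reverse Polish notation, the sketch below converts an infix query to postfix with the textbook shunting-yard procedure and then evaluates it with set operations over an inverted index. It is not the paper's modified algorithm, which the abstract does not detail, and it has nothing dBASE-specific in it; the function names, the sample index, and the query are hypothetical.

```python
PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}


def to_postfix(tokens):
    """Convert an infix Boolean query (list of tokens) to reverse Polish order."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # Binary operators are left-associative; unary NOT never pops an equal operator.
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok] and tok != "NOT")
            ):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # a search term
    while stack:
        output.append(stack.pop())
    return output


def evaluate(postfix, index, all_docs):
    """Evaluate a postfix query against an inverted index of term -> set of doc ids."""
    stack = []
    for tok in postfix:
        if tok == "NOT":
            stack.append(all_docs - stack.pop())
        elif tok == "AND":
            right, left = stack.pop(), stack.pop()
            stack.append(left & right)
        elif tok == "OR":
            right, left = stack.pop(), stack.pop()
            stack.append(left | right)
        else:
            stack.append(index.get(tok, set()))
    return stack.pop()


# Hypothetical inverted index over four documents.
index = {"parallel": {1, 2, 4}, "inverted": {2, 3}, "compression": {3, 4}}
all_docs = {1, 2, 3, 4}

postfix = to_postfix("parallel AND ( inverted OR NOT compression )".split())
hits = sorted(evaluate(postfix, index, all_docs))
```

Once the query is in postfix order, evaluation needs only a single stack and one pass, which is the property that makes the approach attractive in a constrained environment.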

7.
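A Discussion of Optimized Algorithms for Inverted-File Retrieval   (total citations: 1; self-citations: 0; cited by others: 1)
This paper proposes a novel algorithm for inverted-file retrieval, the two-term splitting method. The method searches directly in the order in which the operands are evaluated, bypassing the commonly used Fukushima algorithm, and represents a fresh attempt at improving and optimizing inverted-file retrieval algorithms.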

8.
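Research and Implementation of Ontology-Based Knowledge Retrieval   (total citations: 10; self-citations: 0; cited by others: 10)
张佩云  孙亚民  吴江 《情报学报》2006,25(5):553-558
This paper proposes an ontology-based knowledge retrieval framework in which knowledge retrieval consists mainly of two parts: semantic retrieval and rule-based inference retrieval. Based on an analysis of the retrieval methods and an implementation of the algorithms, an ontology-based document knowledge management system was developed, and the performance of ontology-based knowledge retrieval was verified with examples. The results show that the retrieval system exhibits a degree of intelligence and handles knowledge reuse and sharing reasonably well.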

9.
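The authors argue that querying structured data by keyword, and thereby merging information retrieval with database querying, has become a hot research topic. Schema-graph-based search algorithms are one of the current techniques in database keyword search research, but existing schema-graph algorithms still suffer from low retrieval efficiency and low query accuracy. Building on improvements to the existing algorithms, a system based on the improved algorithm is designed and implemented; experiments show that it achieves better retrieval performance and efficiency.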

10.
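Current cross-media retrieval algorithms do not fully exploit the latent semantic associations between the features of different media, and cannot overcome the curse of dimensionality and the semantic gap in cross-media retrieval. To address this, the paper studies and designs a cross-media retrieval algorithm based on semantic association mining. The algorithm consists of three main parts: semantic association mining, dynamic construction of a cross-media ontology, and cross-media semantic similarity computation. The study shows that the algorithm effectively improves the accuracy and efficiency of cross-media retrieval and can, to some extent, meet users' cross-media retrieval needs.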

11.
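The inverted file is the most widely used indexing mechanism in information retrieval systems, and compressing the index file can greatly increase retrieval speed and save disk space. The traditional approach to inverted-file compression is to store the gaps between document identifiers (d-gaps). However, gap values that vary sharply cannot be encoded and compressed efficiently by the well-known prefix-free codes. To make the gaps compress well, this paper designs a document-identifier reassignment method. Simulation experiments show that it compresses d-gap inverted files more effectively.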
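The effect of gap size on compression can be seen in a minimal sketch. It uses the Elias gamma code as one example of a prefix-free code (the abstract does not say which codes the paper evaluates), and it does not implement the paper's reassignment method; the two posting lists below are hypothetical stand-ins for the same list before and after some identifier reassignment.

```python
def to_gaps(doc_ids):
    """Turn a strictly increasing list of positive doc ids into d-gaps."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def from_gaps(gaps):
    """Rebuild the original doc ids from their d-gaps."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def gamma_encode(n):
    """Elias gamma code of a positive integer, returned as a string of bits."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress(doc_ids):
    """Gamma-encode the d-gaps of a posting list."""
    return "".join(gamma_encode(gap) for gap in to_gaps(doc_ids))


# Hypothetical posting lists: scattered ids versus ids clustered by reassignment.
scattered = [3, 70, 71, 400, 1000]
clustered = [1, 2, 3, 4, 5]

scattered_bits = compress(scattered)
clustered_bits = compress(clustered)
```

Because the gamma code spends roughly twice the logarithm of each gap in bits, a posting list whose identifiers sit close together encodes far more compactly than one with large, irregular gaps.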

12.
As computer applications have become more sophisticated, they have become rather data intensive. Such applications suffer from inadequate use of parallelism for processing data stored on secondary storage devices. Devices such as database machines are useful in some applications, but many applications are too small or specialized to make use of database machine technology. To bridge this gap, we have introduced a parallel file system. The parallel file system is capable of acting as either an SIMD machine or an MIMD machine depending on the file type. In the present work, we describe two approaches to enhance performance in the parallel file system. First, we examine a strategy of initiating partial searches when full searches for concurrent file usage are not possible. Second, a relocation algorithm has been designed to move selected portions of files (subfiles) to improve the degree of parallelism in the multiprogramming environment. The relocation algorithm makes use of a cost function based on the level of sharing between files to determine the best place to relocate the subfiles.

13.
Adding Compression to Block Addressing Inverted Indexes   (total citations: 8; self-citations: 1; cited by others: 7)
Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its size and allow searching it directly and faster than the uncompressed text. Inverted index compression obtains significant reduction of its original size at the same processing speed. Block addressing makes the inverted lists point to text blocks instead of exact positions and pays for the reduction in space with some sequential text scanning. In this work we combine the three ideas in a single scheme. We present a compressed inverted file that indexes compressed text and uses block addressing. We consider different techniques to compress the index and study their performance with respect to the block size. We compare the index against three separate techniques for varying block sizes, showing that our index is superior to each isolated approach. For instance, with just 4% of extra space overhead the index has to scan less than 12% of the text for exact searches and about 20% allowing one error in the matches.
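The block addressing idea on its own can be sketched as follows. This is a simplified illustration that leaves out both index compression and text compression, so it is not the combined scheme the abstract describes; `build_block_index`, `search`, the sample text, and the block size are hypothetical.

```python
def build_block_index(text, block_size):
    """Split the text into fixed-size word blocks and map each word to its block numbers."""
    words = text.split()
    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    index = {}
    for block_no, block in enumerate(blocks):
        for word in block:
            index.setdefault(word, set()).add(block_no)
    return blocks, index


def search(word, blocks, index, block_size):
    """Find exact word positions by scanning only the blocks the index points to."""
    positions = []
    for block_no in sorted(index.get(word, ())):
        base = block_no * block_size
        # Sequential scan inside the candidate block: the price of the smaller index.
        for offset, candidate in enumerate(blocks[block_no]):
            if candidate == word:
                positions.append(base + offset)
    return positions


text = "to be or not to be that is the question"
blocks, index = build_block_index(text, block_size=4)

to_positions = search("to", blocks, index, block_size=4)
question_positions = search("question", blocks, index, block_size=4)
```

Larger blocks shrink the index, since each word stores fewer distinct block numbers, but lengthen the sequential scan per candidate block, which is the trade-off the abstract studies with respect to block size.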

14.
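In electronic records management metadata, subject elements have three semantic structures: non-hierarchical, hierarchical, and multi-level hierarchical. These give rise to three different XML syntactic structures. When designing the semantic structure of subject elements for XML-based electronic records management metadata, the requirements of XML retrieval must not be overlooked. 8 tables. 5 references.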

15.
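Winisis is an advanced information storage and retrieval software package developed, maintained, and distributed free of charge worldwide by UNESCO. Thanks to key techniques such as its distinctive database structure and inverted files, it is highly capable at handling variable-length records and at fast retrieval. This paper studies and analyzes the software's key techniques in depth.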

16.
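In information technology, the searchable Chinese terms, or the title and content, of each record are specially processed into a "holographic compression code" that serves as the record's unique identifier. Applied in computerized literature retrieval systems, the technique can greatly increase retrieval speed and optimize space allocation, and it is widely used in inverted files, duplicate checking, record comparison, and other areas.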

17.
Various parallel logical inference algorithms based on the resolution principle are studied. An experimental study was performed on shared-memory computer systems and on a cluster. The results we describe show how the architecture and features of computer systems, the granularity of parallelism, and heuristics influence the efficiency of parallel inference.

18.
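Research on a Vector Space Retrieval Model Based on Document Structure   (total citations: 9; self-citations: 0; cited by others: 9)
韩毅 《情报学报》2004,23(2):158-162
The paper analyzes the shortcomings of the traditional vector space retrieval model in Web information retrieval and presents a vector space retrieval model based on document structure. The model logically divides a document into N segments. According to how well each feature term represents the document's content, it selects a limited number of the terms that best represent each logical segment to build that segment's term vector and weight vector. On this basis it computes the matching similarity between document and query, which determines which documents are retrieved and in what order. The time complexity of the two models' algorithms is compared, and the possible applications and open problems of the improved model are discussed.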
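A minimal sketch of the segment-based idea is given below. Several details are assumptions rather than the paper's method: segments are equal-sized word runs, term weights are raw frequencies, the "most representative" terms are simply the top-k most frequent, and a document's score is the best cosine similarity over its segments. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def segment_vectors(words, n_segments, top_k):
    """Split a document into logical segments and keep the top_k terms of each as a weight vector."""
    size = max(1, math.ceil(len(words) / n_segments))
    vectors = []
    for start in range(0, len(words), size):
        counts = Counter(words[start:start + size])
        vectors.append(dict(counts.most_common(top_k)))
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(doc_words, query, n_segments=2, top_k=3):
    """Score a document by its best-matching segment (an assumed combination rule)."""
    segments = segment_vectors(doc_words, n_segments, top_k)
    return max((cosine(seg, query) for seg in segments), default=0.0)


# Hypothetical documents and query.
doc_a = "index index compression gap gap gap".split()
doc_b = "ontology rule rule inference".split()
query = {"gap": 1.0, "compression": 1.0}

ranking = sorted(["doc_a", "doc_b"],
                 key=lambda name: score({"doc_a": doc_a, "doc_b": doc_b}[name], query),
                 reverse=True)
```

Keeping only a few terms per segment bounds the size of each vector, which is where a time-complexity advantage over a full-document vector would come from under these assumptions.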

19.
Distributed memory information retrieval systems have been used as a means of managing the vast volume of documents in an information retrieval system, and to improve query response time. However, proper allocation of documents plays an important role in improving the performance of such systems. Maximising the amount of parallelism can be achieved by distributing the documents, while the inter-node communication cost is minimised by avoiding document distribution. Unfortunately, these two factors contradict each other. Finding an optimal allocation satisfying the above objectives is referred to as the distributed memory document allocation problem (DDAP), and it is an NP-complete problem. Heuristic algorithms are usually employed to find an optimal solution to this problem. The genetic algorithm is one such algorithm. In this paper, a genetic algorithm is developed to find an optimal document allocation for DDAP. Several well-known network topologies are investigated to evaluate the performance of the algorithm. The approach relies on the fact that documents of an information retrieval system are clustered by some arbitrary method. The advantages of a clustered document approach, especially in a distributed memory information retrieval system, are well known. Since genetic algorithms work with a set of candidate solutions, parallelisation based on a Single Instruction Multiple Data (SIMD) paradigm seems to be the natural way to obtain a speedup. Using this approach, the population of strings is distributed among the processing elements. Each string is processed independently. The performance gain comes from the parallel execution of the strings, and hence it is heavily dependent on the population size. The approach is favoured for genetic algorithm applications where the parameter set for a particular run is well known in advance, and where such applications require a big population size to solve the problem. DDAP fits nicely into the above requirements.
The aim of the parallelisation is two-fold: first, to speed up the allocation process in DDAP, which usually involves thousands of documents and has to use a big population size; second, to serve as an attempt to port the genetic algorithm's processes to SIMD machines.
