1.
Inverted Index Compression Using Word-Aligned Binary Codes  Total citations: 3, self-citations: 1, citations by others: 3
We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
2.
3.
Compressing Inverted Files  Total citations: 2, self-citations: 0, citations by others: 2
Andrew Trotman, Information Retrieval, 2003, 6(1): 5-19
Research into inverted file compression has focused on compression ratio, that is, how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read that the smaller the index, the faster the search. The premise "smaller is better" may not be true. To truly build faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences, compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size, but decrease load/decompress time, thereby increasing throughput. Examined here are five compression techniques: Golomb, Elias gamma, Elias delta, Variable Byte Encoding, and Binary Interpolative Coding. The effects on file size, file seek time, and file read time are all measured, as is decompression time. A quantitative measure of throughput is developed and the performance of each method is determined.
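The sector arithmetic in this abstract can be checked with a small sketch. The code below is only an illustration and is not taken from the paper: the posting list is hypothetical, and the variable-byte layout (7 payload bits per byte, high bit set on the last byte of each number) is one common convention assumed here. The 512-byte sector and 4-byte word come from the abstract.

```python
import math

SECTOR_BYTES = 512  # minimum disk read size, as stated in the abstract
WORD_BYTES = 4      # word size, as stated in the abstract


def vbyte_encode(numbers):
    """Encode non-negative integers with 7 payload bits per byte.

    Assumed convention: the high bit marks the final byte of each number.
    """
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()     # most significant 7-bit group first
        chunk[-1] |= 0x80   # terminator flag on the last byte
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:        # last byte of this number
            numbers.append(n)
            n = 0
    return numbers


def sectors_needed(num_bytes):
    """Number of whole sectors that must be read to fetch num_bytes."""
    return math.ceil(num_bytes / SECTOR_BYTES)


# Hypothetical posting list: 128 ascending document ids, stored as gaps.
doc_ids = list(range(3, 3 + 128 * 5, 5))
gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
encoded = vbyte_encode(gaps)

raw_bytes = len(doc_ids) * WORD_BYTES
print("raw:", raw_bytes, "bytes,", sectors_needed(raw_bytes), "sector(s)")
print("vbyte:", len(encoded), "bytes,", sectors_needed(len(encoded)), "sector(s)")
```

With these made-up gaps, every value fits in one byte, so the list shrinks from 512 bytes to 128 bytes, yet both forms still cost a single sector read. That is the situation the abstract describes, where compression only adds decoding work for short lists.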
4.
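A User Satisfaction Index Model for University Library Databases: Hypotheses and Testing  Total citations: 1, self-citations: 0, citations by others: 1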
5.
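Research on Information Resource Index Databases  Total citations: 5, self-citations: 0, citations by others: 5
Information resource indexing is the foundation of information use, and index databases emerged to meet that need. The progression from standalone systems to international online retrieval opened a new era of rapid development and wide application of information indexing technology. The rapid growth of the Internet has made search engines an emerging industry on the Web. This paper focuses on the design, construction, and use of index databases for networked information resources, and looks ahead to future trends in index database development.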
6.
Simple Bayesian Model for Bitmap Compression  Total citations: 1, self-citations: 1, citations by others: 0
Bitmaps are a useful, but storage voracious, component of many information retrieval systems. Earlier efforts to compress bitmaps were based on models of bit generation, particularly Markov models. While these permitted considerable reduction in storage, the short memory of Markov models may limit their compression efficiency. In this paper we accept the state orientation of Markov models, but introduce a Bayesian approach to assess the state; the analysis is based on data accumulating in a growing window. The paper describes the details of the probabilistic assumptions governing the Bayesian analysis, as well as the protocol for controlling the window that receives the data. We find slight improvement over the best performing strictly Markov models.
7.
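A Preliminary Exploration of Theoretical Issues in User Mental Models of Comprehensive Literature Databases  Total citations: 1, self-citations: 0, citations by others: 1
First, the concept of the mental model is analyzed from multiple disciplinary perspectives. Second, the paper analyzes why mental models were introduced into information behavior research in information science, and explores the relationship between mental models and several core paradigms of information behavior research, thereby establishing a theoretical basis for analyzing users' mental models in the context of information behavior. Finally, it analyzes the formation and composition of the mental models of users of comprehensive literature databases in China, to provide theoretical support for future quantitative measurement of mental models.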
8.
9.
10.
Matching Index Expressions for Information Retrieval  Total citations: 6, self-citations: 0, citations by others: 6
The INN system is a dynamic hypertext tool for searching and exploring the WWW. It uses a dynamically built ancillary layer to support easy interaction. This layer features the subexpressions of index expressions that are extracted from rendered documents. Currently, the INN system uses keyword based matching. The effectiveness of the INN system may be increased by using matching functions for index expressions. In the design of such functions, several constraints stemming from the INN must be taken into account. Important constraints are a limited response time and storage space, a focus on discriminating (different notions of) subexpressions for index expressions, and domain independency. With these contextual constraints in mind, several matching functions are designed and both theoretically and practically evaluated.
11.
Public Services Quarterly, 2013, 9(2): 15-22
Well-chosen keywords in titles are significant in enabling optimal document retrieval. Title keyword searches employing the natural language of the researcher augment controlled vocabulary searches. Authors and researchers interested in a particular topic share a vocabulary that contains keywords useful in database searching. It is important for authors to incorporate such keywords in their titles. Both author and researcher will benefit if titles facilitate electronic access. Librarians can assist in educating authors on the benefits of using distinctive and selective keywords in titles by making guidelines available.
12.
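Design and Implementation of an XML-Based Citation Index Model  Total citations: 5, self-citations: 1, citations by others: 5
This paper outlines the distinctive value of citation indexes in scientific research, information retrieval, and technology development. Drawing on the strengths of XML, the new standard for data representation and data exchange on the Web, it proposes a citation index model based on the XML markup language and, on the basis of this model, presents implementation methods for citation index construction, retrieval, and citation analysis.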
13.
Nadia Zavorotna, Slavic & East European Information Resources, 2013, 14(2-3): 192-198
In this article, the author discusses the creation of an electronic index of scholarly Slavic periodicals in the humanities, which will launch in early 2012. The aim of this project is to create a standard electronic reference tool in the field of Central, Eastern, and Southeastern European Studies, which will help professors, students, and researchers. The index contains important scholarly journals from Belarus, Bulgaria, Croatia, the Czech Republic, Macedonia, Poland, Serbia, Slovakia, Slovenia, and Ukraine. Indexing begins with 1994 issues and is ongoing. In the future, significant retrospective journals will be indexed in order to improve the tool's research capabilities. The index contains not only articles, but also all book reviews and information on conferences, workshops, organizations, and foundations. The index currently contains citations of over 125,000 articles from more than 143 Slavic journals in the humanities. The use of the Library of Congress transliteration scheme and subject headings will assist users to perform effective searches.
14.
15.
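Digital Library System Architecture with a Centralized Index  Total citations: 8, self-citations: 0, citations by others: 8
A digital library is an Internet-based distributed information system, and architectural design is one of the key factors affecting system reliability and performance. This paper describes the basic architecture of a digital library system and, through examples, presents and analyzes two architectural designs: the "fully distributed architecture" and the "distributed architecture with a centralized index". The latter can provide better system reliability and performance.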
16.
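After briefly explaining the necessity and urgency of establishing quality evaluation standards for index databases, and drawing on existing index quality standards in China and abroad, the paper proposes a quality evaluation benchmark for index databases in China. Taking the quality evaluation system for the specialized subject databases of Chinese universities as a reference, it sets out the content of the quality evaluation standards for index databases in China (including general standards and specialized standards), and finally offers the author's further thoughts on index database quality evaluation.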
17.
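Research on Evaluation Indicators for Online Academic Resources  Total citations: 3, self-citations: 0, citations by others: 3
This paper comprehensively and systematically reviews and examines evaluation indicators for online academic information resources in terms of content, use, and design, and analyzes the existing problems. On this basis, it proposes five guiding principles for selecting evaluation indicators: consistency, objectivity, measurability, practicality, and applicability.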
18.
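Indicators and mathematical models of scientific and technological activity are issues that have been discussed for many years without resolution. From the input-output perspective of scientific and technological activity, this paper proposes a corresponding series of indicators. It then proposes mathematical models, definitions, and theorems covering single-cause single-effect, multiple-cause single-effect, and multiple-cause multiple-effect relationships, as well as linear and nonlinear relationships, between scientific and technological inputs and outputs. Finally, based on these definitions and theorems, it derives corollaries about scientific and technological development in normal periods (periods of inheritance and development) and extraordinary periods (periods of transformation and revolution).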
19.
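On an Indicator System for Reviewing the Innovation Factors of Academic Papers  Total citations: 8, self-citations: 1, citations by others: 7
Innovation in an academic paper means that the knowledge the paper provides in its field differs in a valuable way from the existing literature. Taking this as the logical starting point, the paper constructs an indicator system for reviewing the innovation factors of academic papers and designs a workflow for applying it in practice.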