Similar literature
 Found 20 similar documents (search time: 8 ms)
1.
Inverted Index Compression Using Word-Aligned Binary Codes   Total citations: 3, self-citations: 1, citations by others: 3
We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
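As an illustration of what a word-aligned binary code can look like, here is a minimal sketch. It assumes the commonly described Simple-9 arrangement, in which the top 4 bits of each 32-bit word select how the remaining 28 bits are divided among equal-width integers. The abstract does not say which layout the paper uses, so the `LAYOUTS` table and the function names `encode` and `decode` are assumptions made for this example.

```python
# (count, bits) layouts for the 28 data bits of one 32-bit word.
# Assumption: the Simple-9 style table; the top 4 bits store the layout index.
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]


def encode(values):
    """Greedily pack non-negative integers (< 2**28) into 32-bit words."""
    words = []
    i = 0
    while i < len(values):
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[i:i + count]
            # Use a layout only if a full group is available and every value fits.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for k, v in enumerate(chunk):
                    word |= v << (k * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value outside the range 0 .. 2**28 - 1")
    return words


def decode(words):
    """Unpack words produced by encode() back into the original integers."""
    values = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for k in range(count):
            values.append((word >> (k * bits)) & mask)
    return values
```

Decoding needs only a table lookup, shifts, and masks per word, which is the property that makes this family of codes fast. In an inverted list the input would typically be the gaps between successive document numbers rather than the numbers themselves.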

2.
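The Role of Subject Indexing in Literature Retrieval and Measures for Improving Indexing Quality   Total citations: 4, self-citations: 0, citations by others: 4
孙风梅  曹高芳  李艳芝 《图书馆论坛》2004,24(5):148-149,144
Describes the relationship between literature retrieval and subject indexing, and examines, in terms of indexing quality, indexing principles, and indexing depth, the role of subject indexing in literature retrieval and measures for improving indexing quality.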

3.
Compressing Inverted Files   Total citations: 2, self-citations: 0, citations by others: 2
Research into inverted file compression has focused on compression ratio, that is, how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read that the smaller the index, the faster the search. The premise that smaller is better may not be true. To truly build faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences, compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size but decrease load/decompress time, thereby increasing throughput. Examined here are five compression techniques: Golomb, Elias gamma, Elias delta, Variable Byte Encoding, and Binary Interpolative Coding. The effects on file size, file seek time, and file read time are all measured, as is decompression time. A quantitative measure of throughput is developed and the performance of each method is determined.
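The sector arithmetic in this abstract can be checked with a short sketch. It uses variable byte encoding, one of the five techniques named, applied to the gaps between document ids. The byte convention (7 data bits per byte, low-order group first, high bit set on the final byte) is one common variant and is an assumption here, as are all function names and the sample list.

```python
SECTOR_BYTES = 512  # minimum disk read size assumed in the abstract
WORD_BYTES = 4      # size of one uncompressed posting


def to_gaps(doc_ids):
    """Replace sorted document ids by the differences between neighbours."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low-order group first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)  # high bit marks the final byte of a number
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode()."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def sectors_needed(num_bytes):
    """Number of whole sectors that must be read to fetch num_bytes."""
    return -(-num_bytes // SECTOR_BYTES)  # ceiling division


# A hypothetical list of 128 postings with small gaps.
doc_ids = list(range(5, 5 + 128 * 3, 3))
compressed = vbyte_encode(to_gaps(doc_ids))
raw_sectors = sectors_needed(len(doc_ids) * WORD_BYTES)
compressed_sectors = sectors_needed(len(compressed))
```

For this list the raw postings occupy 512 bytes and the compressed postings 128 bytes, yet both fit in a single sector, so compression saves no disk reads and adds decoding work.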

4.
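A User Satisfaction Index Model for University Library Databases: Hypotheses and Testing   Total citations: 1, self-citations: 0, citations by others: 1
By comparing how customer satisfaction forms in market activity with how database user satisfaction forms, the paper argues that it is feasible for university libraries to adapt the customer satisfaction index model to measure database user satisfaction. Drawing on the characteristics of university library databases and their users, it constructs a user satisfaction index model for university library databases, designs a questionnaire to collect data, and validates the hypothesized model using partial least squares. Finally, it calculates the user satisfaction index for the databases of the South China Normal University Library.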

5.
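Research on Information Resource Index Databases   Total citations: 5, self-citations: 0, citations by others: 5
周宁  林蓉 《情报学报》1999,18(5):639
Information resource indexes are the foundation of information use, and index databases emerged to meet this need. The move from stand-alone systems to international online retrieval opened a new era in which information indexing technology developed rapidly and was widely applied. The rapid growth of the Internet has made search engines an emerging industry on the web. This paper focuses on the design, construction, and use of index databases for networked information resources, and looks ahead to future trends in index databases.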

6.
Simple Bayesian Model for Bitmap Compression   Total citations: 1, self-citations: 1, citations by others: 0
Bitmaps are a useful, but storage-voracious, component of many information retrieval systems. Earlier efforts to compress bitmaps were based on models of bit generation, particularly Markov models. While these permitted considerable reduction in storage, the short memory of Markov models may limit their compression efficiency. In this paper we accept the state orientation of Markov models, but introduce a Bayesian approach to assess the state; the analysis is based on data accumulating in a growing window. The paper describes the details of the probabilistic assumptions governing the Bayesian analysis, as well as the protocol for controlling the window that receives the data. We find a slight improvement over the best-performing strictly Markov models.

7.
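A Preliminary Study of Theoretical Issues in the Mental Models of Users of Comprehensive Literature Databases   Total citations: 1, self-citations: 0, citations by others: 1
First, the concept of the mental model is analyzed from multiple disciplinary perspectives. Second, the paper analyzes why mental models were introduced into information behavior research within information science, and explores the relationships between mental models and some core paradigms of information behavior research, thereby establishing a theoretical basis for analyzing users' mental models in the context of information behavior. Finally, it analyzes the formation and composition of the mental models of users of comprehensive literature databases in China, to provide theoretical support for future quantitative measurement of mental models.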

8.
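Development and Application of a Networked Database Service and Management System for Interlibrary Loan   Total citations: 3, self-citations: 1, citations by others: 2
The traditional interlibrary loan model can no longer meet the demands of interlibrary loan and document delivery in the information age. Drawing on years of practical experience in interlibrary loan work and taking advantage of the rapid development of the Internet, the Tsinghua University Library developed the Web-based "Tsinghua University Library Interlibrary Loan Database Service and Management System". Use of the system has not only greatly improved the efficiency, quality, and service level of interlibrary loan work, but has also made that work more standardized and modern.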

9.
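Strengthening Quality Control of Literature Databases and Retrieval Periodicals   Total citations: 2, self-citations: 0, citations by others: 2
This paper analyzes the quality problems currently found in literature databases and retrieval periodicals, and describes the quality control measures used for the China Aerospace Literature Database and 《中国导弹与航天文摘》. These include fully exploiting software functions to achieve computer-aided quality control, cross-checking related fields, solving the sorting of second-level Chinese characters and polyphonic characters, sorting indexes automatically, and automating the layout of retrieval periodicals. It also points out the importance of building authority databases such as an institution code database and a subject heading database.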

10.
Matching Index Expressions for Information Retrieval   Total citations: 6, self-citations: 0, citations by others: 6
The INN system is a dynamic hypertext tool for searching and exploring the WWW. It uses a dynamically built ancillary layer to support easy interaction. This layer features the subexpressions of index expressions that are extracted from rendered documents. Currently, the INN system uses keyword-based matching. The effectiveness of the INN system may be increased by using matching functions for index expressions. In the design of such functions, several constraints stemming from the INN must be taken into account. Important constraints are a limited response time and storage space, a focus on discriminating (different notions of) subexpressions for index expressions, and domain independence. With these contextual constraints in mind, several matching functions are designed and evaluated both theoretically and practically.

11.
Well-chosen keywords in titles are significant in enabling optimal document retrieval. Title keyword searches employing the natural language of the researcher augment controlled vocabulary searches. Authors and researchers interested in a particular topic share a vocabulary that contains keywords useful in database searching. It is important for authors to incorporate such keywords in their titles. Both author and researcher will benefit if titles facilitate electronic access. Librarians can assist in educating authors on the benefits of using distinctive and selective keywords in titles by making guidelines available.

12.
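Design and Implementation of an XML-Based Citation Index Model   Total citations: 5, self-citations: 1, citations by others: 5
黄文  耿继秀 《情报学报》2003,22(2):142-147
This paper outlines the distinctive value of citation indexes in scientific research, information retrieval, and technology development. Building on the strengths of XML, the new standard for data representation and data exchange on the Web, it proposes a citation index model based on the XML markup language and, on the basis of this model, presents methods for implementing citation index construction, retrieval, and citation analysis.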

13.
In this article, the author discusses the creation of an electronic index of scholarly Slavic periodicals in the humanities, which will launch in early 2012. The aim of this project is to create a standard electronic reference tool in the field of Central, Eastern, and Southeastern European Studies, which will help professors, students, and researchers. The index contains important scholarly journals from Belarus, Bulgaria, Croatia, the Czech Republic, Macedonia, Poland, Serbia, Slovakia, Slovenia, and Ukraine. Indexing begins with 1994 issues and is ongoing. In the future, significant retrospective journals will be indexed in order to improve the tool's research capabilities. The index contains not only articles, but also all book reviews and information on conferences, workshops, organizations, and foundations. The index currently contains citations of over 125,000 articles from more than 143 Slavic journals in the humanities. The use of the Library of Congress transliteration scheme and subject headings will help users perform effective searches.

14.
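The h-Index and Its Use in Evaluating Academic Journals   Total citations: 29, self-citations: 0, citations by others: 29
The h-index proposed by J. E. Hirsch is regarded as a good indicator for evaluating the scientific achievements of researchers; it can also be applied well to the evaluation of academic journals, where it complements the journal impact factor. As an example, the h-index of 《中华医学杂志》 is calculated, and the influence of various factors on the value of the h-index is emphasized.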
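For readers unfamiliar with the indicator: a set of papers has h-index h when h of its papers have each been cited at least h times, and h is the largest number for which this holds. The sketch below computes it from a list of per-paper citation counts. The function name and the sample figures are illustrative assumptions, not data from the article.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five papers.
example = h_index([10, 8, 5, 4, 3])
```

For a journal, the same calculation is applied to the citation counts of the papers the journal published in the period under study.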

15.
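A Digital Library System Architecture with a Centralized Index   Total citations: 8, self-citations: 0, citations by others: 8
郑彦宁 《情报学报》2001,20(6):642-647
A digital library is a distributed information system based on the Internet, and architectural design is one of the key factors affecting system reliability and performance. This paper describes the basic structure of a digital library system and, through examples, explains and analyzes two architectural designs: the "fully distributed architecture" and the "distributed architecture with a centralized index". The latter can provide better system reliability and performance.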

16.
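After briefly explaining the necessity and urgency of establishing quality evaluation standards for index databases, and drawing on existing index quality standards in China and abroad, the paper proposes a quality evaluation benchmark for index databases in China. Using the quality evaluation system for special-topic featured databases of Chinese universities as a reference, it constructs the content of the quality evaluation standards for China's index databases (comprising general standards and specific standards), and finally offers the author's own ideas on the quality evaluation of index databases.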

17.
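A Study of Evaluation Indicators for Online Academic Resources   Total citations: 3, self-citations: 0, citations by others: 3
洪颖 《津图学刊》2003,(2):16-19
This paper systematically reviews and studies evaluation indicators for online academic information resources in terms of content, use, design, and other aspects, and analyzes the problems that exist. On this basis, it proposes five guiding principles for selecting evaluation indicators: consistency, objectivity, measurability, practicality, and applicability.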

18.
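Indicators and mathematical models for scientific and technological activities have been discussed for many years without resolution. From the perspective of the inputs and outputs of scientific and technological activities, this paper proposes a corresponding series of indicators. It then proposes mathematical models, definitions, and theorems covering one-cause-one-effect, multiple-cause-one-effect, and multiple-cause-multiple-effect relationships between inputs and outputs, as well as linear and nonlinear relationships. Finally, based on these definitions and theorems, it derives corollaries about the development of science and technology in normal periods (periods of inheritance and development) and in extraordinary periods (periods of transformation and revolution).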

19.
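On an Indicator System for Reviewing the Innovative Elements of Academic Papers   Total citations: 8, self-citations: 1, citations by others: 7
周露阳 《编辑学报》2006,18(1):68-70
Innovation in an academic paper means that the knowledge the paper provides in its field differs in a valuable way from the existing literature. Taking this as the logical starting point, the paper constructs an indicator system for reviewing the innovative elements of academic papers and designs a workflow for applying the system in practice.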

20.
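This paper uses Web mining methods to study and measure the destructiveness of emergency-event topics. It first defines emergency-event topics, topic destructiveness, and the dimensions of destructive features, builds a database of destructive words, and describes in detail the procedure for measuring the destructiveness index of an emergency-event topic. It then gives methods for measuring the destructiveness index of a single Web document and of an emergency-event topic. Finally, it conducts experimental analysis on the Urumqi July 5 riot, the SARS outbreak, and the Wenchuan earthquake; the results show that the proposed method is broadly consistent with the degree of destruction exhibited by the events themselves.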
