Similar Documents
1.
This paper presents a size-reduction method for the inverted file, the most suitable indexing structure for an information retrieval system (IRS). We observe that in an inverted file the document identifiers for a given word are usually clustered. While this clustering property can be exploited to reduce the size of the inverted file, both good compression and fast decompression must be provided. In this paper, we present a method that simplifies the coding and decoding processes of interpolative coding by using recursion elimination and loop unwinding. We call this method unique-order interpolative coding. It can calculate the lower and upper bounds of every document identifier for a binary code without using a recursive process, so the decompression time can be greatly reduced. Moreover, it can also exploit document identifier clustering to compress the inverted file efficiently. Compared with other well-known compression methods, our method provides fast decoding speed and excellent compression. It can also be used to support a self-indexing strategy. Our work therefore provides a feasible way to build a fast and space-economical IRS.
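The paper's unique-order variant is not specified in the abstract, but the recursive baseline it eliminates, binary interpolative coding, codes the middle document identifier within the bounds implied by its neighbours and then recurses on both halves. A minimal Python sketch follows (the fixed-width bit code and the example list are illustrative assumptions):

def interpolative_encode(ids, lo, hi, out):
    # Encode a strictly increasing list of docIDs known to lie in [lo, hi].
    n = len(ids)
    if n == 0:
        return
    m = n // 2
    x = ids[m]
    # The middle value is confined to [lo + m, hi - (n - 1 - m)].
    low, high = lo + m, hi - (n - 1 - m)
    if high > low:
        width = (high - low).bit_length()   # bits needed for this narrowed range
        out.append(format(x - low, f'0{width}b'))
    # Recurse on both halves with tightened bounds (the recursion the paper removes).
    interpolative_encode(ids[:m], lo, x - 1, out)
    interpolative_encode(ids[m + 1:], x + 1, hi, out)

bits = []
interpolative_encode([3, 8, 9, 11, 15], 1, 20, bits)   # e.g. a posting list in universe [1, 20]
print(''.join(bits))

Clustered identifiers narrow the bounds quickly, which is why interpolative coding compresses clustered posting lists so well; the unique-order method described above obtains the same bounds without the recursive calls.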

2.
    
Let X = x1, x2, …, xn be a sequence of non-decreasing integer values. Storing a compressed representation of X that supports access and search is a problem that occurs in many domains. The most common solution to this problem uses a linear list and encodes the differences between consecutive values with encodings that favor small numbers. This solution includes additional information (i.e. samples) to support efficient searching on the encoded values. We introduce a completely different alternative that achieves compression by encoding the differences in a search tree. Our proposal has many applications, such as the representation of posting lists, geographic data, sparse bitmaps, and compressed suffix arrays, to name just a few. The structure is practical and we provide an experimental evaluation to show that it is competitive with the existing techniques.
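The "most common solution" mentioned above, gap encoding plus periodic samples to support search, can be sketched as follows; the variable-byte code and the sample rate are illustrative choices, and the paper's tree-based alternative is not reproduced here:

import bisect

def vbyte_encode(gaps):
    out = bytearray()
    for g in gaps:
        while g >= 128:
            out.append(g & 0x7F)
            g >>= 7
        out.append(g | 0x80)          # high bit marks the final byte of each value
    return bytes(out)

def vbyte_decode(data):
    g, shift = 0, 0
    for b in data:
        if b & 0x80:
            yield g | ((b & 0x7F) << shift)
            g, shift = 0, 0
        else:
            g |= b << shift
            shift += 7

class SampledGapList:
    def __init__(self, values, sample_rate=128):
        self.samples, self.blocks = [], []
        for i in range(0, len(values), sample_rate):
            block = values[i:i + sample_rate]
            self.samples.append(block[0])                 # absolute value heads each block
            gaps = [b - a for a, b in zip(block, block[1:])]
            self.blocks.append(vbyte_encode(gaps))

    def contains(self, v):
        # Binary search the samples, then decode one block sequentially.
        j = bisect.bisect_right(self.samples, v) - 1
        if j < 0:
            return False
        x = self.samples[j]
        for g in vbyte_decode(self.blocks[j]):
            if x >= v:
                break
            x += g
        return x == v

The structure proposed in the paper instead encodes the differences along the nodes of a search tree, so the search path itself determines which values are decoded.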

3.
Analysis of arithmetic coding for data compression
Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize, in an elegant way, the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.

4.
    
We present a new variable-length encoding scheme for sequences of integers, Directly Addressable Codes (DACs), which enables direct access to any element of the encoded sequence without the need for any sampling method. Our proposal is a kind of implicit data structure that introduces synchronism in the encoded sequence without using, asymptotically, any extra space. We show some experiments demonstrating that the technique is not only simple but also competitive in time and space with existing solutions in several applications, such as the representation of LCP arrays or high-order entropy-compressed sequences.
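A minimal sketch of the DAC idea: each integer is split into b-bit chunks stored level by level, and a per-level bitmap marks which values continue to the next level, so the i-th value can be reassembled directly. Plain Python lists stand in for the succinct rank/select bitmaps of the real structure, and the chunk width is an illustrative choice:

class DAC:
    def __init__(self, values, b=4):
        self.b = b
        self.chunks, self.bits = [], []               # one chunk list + bitmap per level
        level = list(values)
        while level:
            self.chunks.append([v & ((1 << b) - 1) for v in level])
            rest = [v >> b for v in level]
            self.bits.append([1 if r > 0 else 0 for r in rest])
            level = [r for r in rest if r > 0]

    def access(self, i):
        # Return the i-th value without decoding any of its neighbours.
        value, shift, level = 0, 0, 0
        while True:
            value |= self.chunks[level][i] << shift
            if not self.bits[level][i]:
                return value
            # rank(i): position of this value among those that continue upward;
            # a real implementation answers this in O(1) with a rank structure.
            i = sum(self.bits[level][:i])
            shift += self.b
            level += 1

For example, DAC([5, 300, 7]).access(1) reassembles 300 from three 4-bit chunks; short values pay only for the levels they actually use.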

5.
    
In-memory nearest neighbor computation is a typical collaborative filtering approach for high recommendation accuracy. However, this approach is not scalable given the huge number of customers and items in typical commercial applications. Cluster-based collaborative filtering techniques can be a remedy for the efficiency problem, but they usually provide relatively lower accuracy, since they may become over-generalized and produce less-personalized recommendations. Our research explores an individualistic strategy that initially clusters the users and then, during the recommendation generation stage, exploits the individual members within clusters rather than just the cluster representatives. We provide an efficient implementation of this strategy by adapting a specifically tailored cluster-skipping inverted index structure. Experimental results reveal that the individualistic strategy with the cluster-skipping index is a good compromise that yields high accuracy and reasonable scalability.
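The cluster-skipping inverted index itself is not reproduced here; the dense-matrix sketch below only illustrates the individualistic strategy, in which neighbours are drawn from the members of the target user's own cluster rather than from cluster representatives. The clustering method, cosine similarity, and all parameter values are assumptions for illustration:

import numpy as np
from sklearn.cluster import KMeans

def recommend_within_cluster(ratings, user, n_clusters=50, n_neighbors=20, top_n=10):
    # ratings: users x items matrix; zeros mean "not rated".
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(ratings)
    members = np.where(labels == labels[user])[0]
    members = members[members != user]            # candidates come from the user's cluster only
    u = ratings[user]
    sims = ratings[members] @ u / (
        np.linalg.norm(ratings[members], axis=1) * np.linalg.norm(u) + 1e-9)
    order = np.argsort(-sims)[:n_neighbors]
    neighbors = members[order]
    scores = sims[order] @ ratings[neighbors]     # similarity-weighted rating aggregation
    scores[u > 0] = -np.inf                       # never re-recommend items already rated
    return np.argsort(-scores)[:top_n]

The index described in the paper organizes posting lists by cluster so that, at query time, postings belonging to other clusters can be skipped instead of being scored and discarded.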

6.
ASP.NET is Microsoft's latest web programming framework and has gradually become the web programming language of choice for website builders. Within ASP.NET, database connectivity and manipulation are handled by ADO.NET, the .NET technology for interacting with data sources. This paper analyzes ADO.NET and studies its application in website construction.

7.
This paper reports a new moving-image coding algorithm, called motion-compensated multi-level quantization block truncation coding. Simulation results show that the new algorithm performs better than existing algorithms of the same kind, achieving a compression ratio of 75 and a signal-to-noise ratio of 35 dB, and that it can be implemented in real time in software on current personal computers.
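For reference, the classic two-level block truncation coding on which the multi-level, motion-compensated variant builds can be sketched as below; the multi-level quantization and the motion compensation of the paper are not reproduced:

import numpy as np

def btc_encode_block(block):
    # Classic BTC of one small block (e.g. 4x4): one bit per pixel plus two levels.
    mean, std = block.mean(), block.std()
    bitmap = block >= mean
    q, n = int(bitmap.sum()), block.size
    if q in (0, n):
        return bitmap, float(mean), float(mean)
    # Reconstruction levels chosen to preserve the block mean and variance.
    low = mean - std * np.sqrt(q / (n - q))
    high = mean + std * np.sqrt((n - q) / q)
    return bitmap, float(low), float(high)

def btc_decode_block(bitmap, low, high):
    return np.where(bitmap, high, low)

The algorithm in the abstract extends this baseline with multiple quantization levels and with motion compensation between frames.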

8.
朱少强 《情报科学》2003,21(6):670-671
This paper introduces the application of metadata in computer software and the support for the metadata concept in Microsoft's ADO data access technology, and uses a programming example to demonstrate techniques for extending ADO data access with metadata in MIS development.

9.
In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with modest extra storage cost. Further improvement in efficiency can be attained by implementing our index according to our observation of the dynamic nature of the common word set. In our experimental evaluation, a common phrase index using 255 common words improves query time by about 11% and 62% for the overall and large queries (queries of long phrases), respectively, over an auxiliary nextword index, while requiring only about 19% extra storage. Compared with an inverted index, our improvement is about 72% and 87% for the overall and large queries, respectively. We also propose to implement a common phrase index with a dynamic update feature. Our experiments show that further improvement in time efficiency can be achieved.
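A word-pair (nextword-style) index and the corresponding phrase lookup can be sketched as follows. This shows only the general mechanism: the common phrase index described above additionally restricts the firstwords to a small set of common words, which is not modelled here, and phrases of at least two words are assumed:

from collections import defaultdict

def build_pair_index(docs):
    # For every adjacent word pair, record the documents and positions where it occurs.
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos, (w, nw) in enumerate(zip(words, words[1:])):
            index[(w, nw)].append((doc_id, pos))
    return index

def phrase_query(index, phrase):
    # Intersect the postings of consecutive word pairs, aligning their positions.
    words = phrase.lower().split()
    pairs = list(zip(words, words[1:]))
    result = set(index[pairs[0]])
    for offset, pair in enumerate(pairs[1:], start=1):
        shifted = {(d, p - offset) for d, p in index[pair]}
        result &= shifted
    return sorted({d for d, _ in result})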

10.
    
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because different terminology is used to describe the same concepts. Semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not improve retrieval performance over keyword-based search, it enables the retrieval of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process of building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work has proposed either to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but these approaches face limitations such as increased query processing time. In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types and documents within the same embedding space to facilitate the development of a unified search index consisting of these four information types. We perform experiments on standard and widely used document collections, including Clueweb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement that the posting-list key be explicitly observed in the indexed document, (a) retrieval efficiency increases compared to a standard inverted index, reducing index size and query processing time, and (b) while retrieval efficiency, the main objective of an efficient indexing mechanism, improves with our method, retrieval effectiveness remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.

11.
Julia curves are quantized on a square grid in several ways, and the quantized Julia curves are used for fractal image compression coding, overcoming the drawback of fractal coding schemes that encode with a varying codebook. In addition, a small dictionary of commonly used patterns is built to accelerate fractal image compression. Experimental results show that Julia curves tile the image to be encoded well while preserving the decoding advantages of fractal images.
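Rasterising ("quantising") a Julia set onto a square grid, as described above, can be sketched as follows; the constant c, grid size, and iteration budget are illustrative choices, and the tiling and dictionary construction of the paper are not reproduced:

import numpy as np

def quantized_julia(c=-0.4 + 0.6j, n=256, max_iter=64, bound=2.0):
    # Boolean n-by-n mask of the filled Julia set of z -> z**2 + c.
    xs = np.linspace(-1.5, 1.5, n)
    z = xs[None, :] + 1j * xs[:, None]
    mask = np.ones_like(z, dtype=bool)        # True = has not escaped yet
    for _ in range(max_iter):
        z[mask] = z[mask] ** 2 + c
        mask &= np.abs(z) <= bound
    return mask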

12.
Though many compression methods are based on the use of variable-length codes, there has recently been a trend to search for alternatives in which the lengths of the codewords are more restricted, which can have useful applications such as easier processing and fast decoding. This paper explores the construction of variable-to-fixed length codes, which were suggested long ago by Tunstall. Using new heuristics based on suffix trees, the performance of Tunstall codes can in some cases be improved by more than 40%.
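The standard Tunstall construction, which repeatedly expands the most probable dictionary entry until the fixed-length codeword budget is filled, can be sketched as follows; the suffix-tree heuristics that yield the reported improvement are not shown:

import heapq

def tunstall_dictionary(probs, codeword_bits):
    # probs: symbol -> probability. Returns the source strings of the parse dictionary;
    # each one is then assigned a distinct codeword of `codeword_bits` bits.
    max_size = 2 ** codeword_bits
    heap = [(-p, s) for s, p in probs.items()]    # max-heap on probability
    heapq.heapify(heap)
    leaves = dict(probs)
    while len(leaves) + len(probs) - 1 <= max_size:
        neg_p, s = heapq.heappop(heap)            # most probable leaf
        del leaves[s]
        for sym, p in probs.items():              # expand it by every source symbol
            child, cp = s + sym, -neg_p * p
            leaves[child] = cp
            heapq.heappush(heap, (-cp, child))
    return sorted(leaves)

For example, tunstall_dictionary({'a': 0.7, 'b': 0.2, 'c': 0.1}, 3) yields a seven-entry dictionary dominated by runs of 'a', so frequent symbols are parsed in long chunks and each chunk is emitted as one 3-bit codeword.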

13.
[Objective] To better understand the citation indicator databases for Chinese scientific and technical journals. [Methods] Five citation indicator databases were compared and analyzed: the CSCD Journal Citation Reports (CSCD-JCR), the Chinese S&T Journal Citation Reports (core edition) (CJCR), Wanfang Data's journal statistical analysis and evaluation, VIP Information's Chinese S&T journal evaluation report, and CNKI's individual-journal impact statistics database; the indicators each provides and their functions for journal evaluation were analyzed and compared in detail. [Results] Because the underlying statistical data sources differ, the citation indicator values reported by the databases differ considerably. [Conclusions] Ranking journals by a composite evaluation indicator reflects their academic level and influence in the field more accurately than ranking by a single indicator; citation indicators based on large databases reflect how journals are cited and disseminated more accurately and comprehensively, but the source-journal-based CSCD-JCR and CJCR indicators are more representative and better reflect a journal's academic influence and standing within its discipline.

14.
The wavelet transform and the Set Partitioning In Hierarchical Trees (SPIHT) algorithm have achieved good results in synthetic aperture radar (SAR) image compression, but the complexity of SPIHT coding limits the achievable compression speed. To address SPIHT's slow coding and large memory footprint, an improved listless SPIHT algorithm is proposed to increase coding speed and reduce resource usage, making it suitable for hardware implementation. Experimental results show that the method achieves the same compression performance as the original algorithm while running much faster, making it suitable for real-time implementation.

15.
    
Nowadays, access to information requires managing multimedia databases effectively, and so multi-modal retrieval techniques (particularly image retrieval) have become an active research direction. In the past few years, many content-based image retrieval (CBIR) systems have been developed. However, despite the progress achieved in CBIR, the retrieval accuracy of current systems is still limited and often worse than that of purely textual information retrieval systems. In this paper, we propose to combine content-based and text-based approaches to multi-modal retrieval in order to achieve better results and overcome the shortcomings of these techniques when they are used separately. For this purpose, we use a medical collection that includes both images and unstructured text. We retrieve images with a CBIR system and textual information with a traditional information retrieval system, and then combine the results obtained from both systems to improve the final performance. Furthermore, we use the information gain (IG) measure to reduce and improve the textual information included in multi-modal information retrieval systems. We have carried out several experiments that combine this reduction technique with a merger of visual and textual information. The results obtained are highly promising and show the benefit obtained when textual information is exploited to improve conventional multi-modal systems.
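As a rough illustration of IG-based reduction of textual features (the exact formulation used in the paper is not given in the abstract), the gain of a term with respect to a binary relevance split can be computed from document counts and used to keep only the highest-scoring terms:

import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(with_term, without_term):
    # with_term / without_term: (relevant, non-relevant) document counts for the
    # documents that do / do not contain the term (hypothetical class split).
    n1, n0 = sum(with_term), sum(without_term)
    if n1 == 0 or n0 == 0:
        return 0.0
    total = n1 + n0
    prior = entropy([with_term[0] + without_term[0], with_term[1] + without_term[1]])
    conditional = n1 / total * entropy(with_term) + n0 / total * entropy(without_term)
    return prior - conditional

# e.g. rank the vocabulary and keep the top-k terms:
# kept = sorted(vocab, key=lambda t: information_gain(*counts[t]), reverse=True)[:k]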

16.
Similarity calculations and document ranking form the computationally expensive parts of query processing in ranking-based text retrieval. In this work, 11 alternative implementation techniques for these calculations are presented under four different categories, and their asymptotic time and space complexities are investigated. To our knowledge, six of these techniques have not been discussed in any previous publication. Furthermore, experiments are carried out on a 30 GB document collection to evaluate the practical performance of the different implementations in terms of query processing time and space consumption. The advantages and disadvantages of each technique are illustrated under different querying scenarios, and several experiments that investigate the scalability of the implementations are presented.
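As one example of the kind of implementation choice compared in such a study, a term-at-a-time ranking loop with a hash-based accumulator and a heap for final top-k selection can be sketched as follows (a generic baseline, not one of the paper's eleven techniques specifically):

import heapq
from collections import defaultdict

def rank_term_at_a_time(index, doc_lengths, query_terms, k=10):
    # index: term -> list of (doc_id, weight) postings; doc_lengths: doc_id -> norm.
    acc = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in index.get(term, []):
            acc[doc_id] += weight                 # accumulate partial similarities
    scored = ((score / doc_lengths[d], d) for d, score in acc.items())
    return heapq.nlargest(k, scored)              # (similarity, doc_id) pairs, best first

Variants differ mainly in how the accumulators are stored, whether postings are traversed term-at-a-time or document-at-a-time, and how the top-k selection is maintained, which is the kind of design space such comparisons explore.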

17.
The methods and techniques provided by ADO.NET are used to implement the database access and connection functionality of a medical welfare website, and this functionality is then optimized according to the actual requirements. ADO.NET technology is thus explored in greater depth from the perspective of practical application.

18.
[Objective] To study the recent development of Chinese scientific and technical journals. [Methods] Based on data from the 2008-2014 editions of the Chinese S&T Journal Citation Reports (expanded edition), changes in the mean values of the main bibliometric indicators of Chinese S&T journals were studied and compared with earlier findings on changes in the main indicator means of core journals during 2008-2013. [Results] In the expanded edition, the mean proportion of overseas papers, total citations, and number of contributing institutions grew markedly, while the impact factor, non-self-citation rate, number of authors, and regional distribution changed little overall. The growth of some indicators was unstable; for example, the impact factor, proportion of overseas papers, immediacy index, and proportion of funded papers peaked in 2010 and then declined considerably. Compared with the indicator means of core journals, the impact factor and immediacy index grew at similar rates, while the volume of source documents in the expanded edition grew far faster than in core journals, meaning that the great majority of newly published Chinese scientific papers appeared in non-core journals. [Conclusions] Chinese S&T journals as a whole are developing and becoming increasingly attractive to overseas authors, but the quality of this development is not high.

19.
    
Industry 4.0 and the associated IoT and data applications are evolving rapidly and expanding into various fields. Industry 4.0 also manifests in the farming sector, where the wave of Agriculture 4.0 provides multiple opportunities for farmers, consumers and the associated stakeholders. Our study presents the concept of Data Sharing Agreements (DSAs) as an essential path and a template for AI applications of data management among various actors. The approach we introduce adopts design science principles and develops role-based access control based on AI techniques. The application is presented through a smart farm scenario while we incrementally explore the data sharing challenges in Agriculture 4.0. Data management and sharing practices should enforce defined contextual policies for access control. The approach could inform policymaking decisions for role-based data management, and specifically data-sharing agreements, in the context of Industry 4.0 broadly and Agriculture 4.0 specifically.
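As a toy illustration of a role-based access check driven by a data sharing agreement, the sketch below uses entirely hypothetical roles, resources, and permissions for a smart-farm scenario; none of these names come from the paper, and the AI-based policy derivation it describes is not modelled:

from dataclasses import dataclass

# Hypothetical agreement: which role may perform which action on which data resource.
POLICY = {
    "farmer":     {"sensor_readings": {"read", "write"}, "yield_reports": {"read"}},
    "agronomist": {"sensor_readings": {"read"},          "yield_reports": {"read", "write"}},
    "retailer":   {"yield_reports": {"read"}},
}

@dataclass
class AccessRequest:
    role: str
    resource: str
    action: str

def is_permitted(req: AccessRequest, policy=POLICY) -> bool:
    # Allowed only if the agreement grants this role this action on this resource.
    return req.action in policy.get(req.role, {}).get(req.resource, set())

# is_permitted(AccessRequest("retailer", "sensor_readings", "read"))  ->  False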

20.
This paper evaluates the short-run impact of different implementation strategies for electronic lab order entry management (eLAB) systems on hospitals' productivity. Using the American Hospital Association's Annual Surveys for 2005-2008, we developed hospital productivity measures to assess facilities' relative performance upon implementing eLAB systems. The results indicate that different eLAB system implementation strategies were systematically related to changes in hospitals' relative productivity levels over the years studied. Hospitals that partially implemented an eLAB system without completing the roll-out experienced negative impacts on productivity. The greatest loss in short-term productivity was experienced by facilities that moved from having no eLAB system to a complete implementation in one year, a strategy called the "Big Bang". The hybrid approach of a limited introduction in one period followed by a complete roll-out in the next year was the only eLAB system implementation strategy associated with significant productivity gains. Our findings support a very specific strategy for eLAB system implementation, in which facilities begin with a one-year pilot program immediately followed by an organization-wide implementation effort in the next period.
