首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到4条相似文献,搜索用时 0 毫秒
1.
This paper presents a size reduction method for the inverted file, the most suitable indexing structure for an information retrieval system (IRS). We notice that in an inverted file the document identifiers for a given word are usually clustered. While this clustering property can be used in reducing the size of the inverted file, good compression as well as fast decompression must both be available. In this paper, we present a method that can facilitate coding and decoding processes for interpolative coding using recursion elimination and loop unwinding. We call this method the unique-order interpolative coding. It can calculate the lower and upper bounds of every document identifier for a binary code without using a recursive process, hence the decompression time can be greatly reduced. Moreover, it also can exploit document identifier clustering to compress the inverted file efficiently. Compared with the other well-known compression methods, our method provides fast decoding speed and excellent compression. This method can also be used to support a self-indexing strategy. Therefore our research work in this paper provides a feasible way to build a fast and space-economical IRS.  相似文献   

2.
Analysis of arithmetic coding for data compression   总被引:1,自引:0,他引:1  
Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements, it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.  相似文献   

3.
Let X=x1,x2,…,xnX=x1,x2,,xn be a sequence of non-decreasing integer values. Storing a compressed representation of X that supports access and search is a problem that occurs in many domains. The most common solution to this problem uses a linear list and encodes the differences between consecutive values with encodings that favor small numbers. This solution includes additional information (i.e. samples) to support efficient searching on the encoded values. We introduce a completely different alternative that achieves compression by encoding the differences in a search tree. Our proposal has many applications, such as the representation of posting lists, geographic data, sparse bitmaps, and compressed suffix arrays, to name just a few. The structure is practical and we provide an experimental evaluation to show that it is competitive with the existing techniques.  相似文献   

4.
Industry 4.0 and the associated IoT and data applications are evolving rapidly and expand in various fields. Industry 4.0 also manifests in the farming sector, where the wave of Agriculture 4.0 provides multiple opportunities for farmers, consumers and the associated stakeholders. Our study presents the concept of Data Sharing Agreements (DSAs) as an essential path and a template for AI applications of data management among various actors. The approach we introduce adopts design science principles and develops role-based access control based on AI techniques. The application is presented through a smart farm scenario while we incrementally explore the data sharing challenges in Agriculture 4.0. Data management and sharing practices should enforce defined contextual policies for access control. The approach could inform policymaking decisions for role-based data management, specifically the data-sharing agreements in the context of Industry 4.0 in broad terms and Agriculture 4.0 in specific.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号