Similar literature
 20 similar documents found
1.
2.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based presentation of retrieval results rests on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results are major features of a heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by differences in the representation and quality of retrieval results, the cluster hypothesis still holds, and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.

3.
Online information retrieval systems continued to reach wider audiences. The authors discuss a particular text retrieval system and its techniques for helping the common unsophisticated user through both the search for and understanding of information based on the vocabular file concept. In addition, methods for easy construction and maintenance of a suitable data base organization are described.

4.
Hierarchical naming schemes have been adopted in a number of distributed filestore designs. These schemes usually fall into one of two categories. In the first, there is a single namespace, with a fixed root, spread over all the machines in the distributed system. The user is unaware of any connection between the logical name and the physical location of a file. In the second, the total namespace is composed of some aggregation of the individual namespaces of the filestores of each component system. In this paper, it is argued that the two categories can coexist in a single distributed computer system. A scheme is described for providing highly available files, using replicated copies, within a UNIX United namespace, which is indefinitely extensible. The reliable files are accessed in exactly the same way as normal files.

5.
6.
7.
8.
Search patterns of documents and information requests represent them only more or less well, so it is important to examine the possibilities of designing self-learning information retrieval systems. Another important question is how to organize the set of document search patterns so that the information system responds to a given information request within an acceptable time. The self-learning process of the proposed information system consists in determining, on a set of document and information request search patterns, the similarity relation according to L. A. Zadeh. The organization of the set of document search patterns proposed in the paper ensures that, when retrieving a response to a given information request, the search is limited to one (or several) of a number of previously determined subsets. This makes the response time of the information system acceptable. The proposed information retrieval strategy is discussed in terms of fuzzy sets.
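As a rough illustration of the idea in this abstract, the sketch below builds a fuzzy similarity relation in Zadeh's sense (reflexive, symmetric, max-min transitive) from a matrix of pairwise resemblance degrees and then cuts it at a threshold to obtain disjoint subsets of search patterns. The function names, the example matrix, and the threshold values are assumptions made for illustration; the abstract does not say how the paper itself computes the relation or the subsets.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations given as nested lists."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Smallest max-min transitive relation containing r.

    For a reflexive, symmetric r the result is a similarity relation.
    """
    closure = r
    while True:
        composed = max_min_compose(closure, closure)
        merged = [[max(a, b) for a, b in zip(row_c, row_m)]
                  for row_c, row_m in zip(closure, composed)]
        if merged == closure:
            return closure
        closure = merged


def alpha_cut_classes(similarity, alpha):
    """Partition indices into classes whose mutual similarity is >= alpha."""
    classes, seen = [], set()
    for i in range(len(similarity)):
        if i in seen:
            continue
        members = [j for j in range(len(similarity)) if similarity[i][j] >= alpha]
        seen.update(members)
        classes.append(members)
    return classes


# Hypothetical resemblance degrees between three document search patterns.
resemblance = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.6],
    [0.1, 0.6, 1.0],
]
similarity = transitive_closure(resemblance)
subsets = alpha_cut_classes(similarity, 0.7)
```

A query would then be compared against one representative per subset, so that only the best-matching subset (or a few of them) has to be searched in full.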

9.
Users of search engines express their needs as queries, typically consisting of a small number of terms. The resulting search engine query logs are valuable resources that can be used to predict how people interact with the search system. In this paper, we introduce two novel applications of query logs, in the context of distributed information retrieval. First, we use query log terms to guide sampling from uncooperative distributed collections. We show that while our sampling strategy is at least as efficient as current methods, it consistently performs better. Second, we propose and evaluate a pruning strategy that uses query log information to eliminate terms. Our experiments show that our proposed pruning method maintains the accuracy achieved by complete indexes, while decreasing the index size by up to 60%. While such pruning may not always be desirable in practice, it provides a useful benchmark against which other pruning strategies can be measured.

10.
In this work the applicability of magnetic bubble memories to the processing of inverted files is discussed. Four novel models of magnetic bubble memories are presented to demonstrate the storage structures and the data processing. The first model employs an organization of major-minor loops. On the basis of this organization a uniform ladder is formed so that the data can be rearranged by using four operations (global shift, detached shift, exchange and delta exchange). The second model makes use of the on-chip decoder (also known as the self-contained magnetic bubble-domain memory chip). For this model a hashing scheme is relied upon to perform the required data operations. The third and fourth models are different combinations of the first two models. The latter two models may provide relatively high-speed performance as well as reasonable system complexity. For each model the algorithms for data retrieval, sorting, deletion, insertion and updating are given. A comparison of the four models has also been carried out in order to determine the most convenient magnetic bubble memory structure for the processing of inverted files.

11.
Among the problems associated with modern information retrieval systems is the lack of any systematic approach to the design of query language interfaces. In this paper we attempt to show how a relationally organised data base is well suited to bibliographic data management, and how, given such a relational organisation, it is possible to construct an interface which separates the query language from the physical representation of the data base. It is also shown how such a query language organisation may be usefully interfaced to existing retrieval systems. Finally, a query language for retrieval applications is proposed.

12.
This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall’s rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values.
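For readers unfamiliar with the metric named above, the following is a minimal sketch of Normalised Discounted Cumulative Gain at a document cut-off l. It assumes the common log2(rank + 1) discount and made-up gain values; the paper may define the discount and the gain values differently, so this should not be read as the authors' formulation.

```python
import math


def dcg(gains, cutoff):
    """Discounted cumulative gain over the top `cutoff` ranks.

    Assumes a log2(rank + 1) discount, which is one common variant.
    """
    return sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(gains[:cutoff], start=1))


def ndcg(ranked_gains, all_relevant_gains, cutoff):
    """nDCG at `cutoff`: DCG of the system ranking divided by the ideal DCG.

    `ranked_gains` holds the gain of each retrieved document in rank order;
    `all_relevant_gains` holds the gains of every relevant document for the topic.
    """
    ideal = sorted(all_relevant_gains, reverse=True)
    ideal_dcg = dcg(ideal, cutoff)
    return dcg(ranked_gains, cutoff) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical gain values: 3 = highly relevant, 1 = partially relevant, 0 = not relevant.
score = ndcg([1, 3, 0], [3, 1], cutoff=3)
```

A ranking that places the highly relevant document first would score 1.0 here; swapping it to rank 2, as in the example, lowers the score because its gain is discounted.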

13.
Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of each other. However, this conditional independence assumption is widely acknowledged to be wrong, so we present a new method of incorporating term dependence into a probabilistic retrieval model by adapting a dependency-structured indexing system that uses a dependency parse tree and the Chow Expansion to compensate for the weakness of the assumption. In this paper, we describe a theoretical process for applying the Chow Expansion to the general probabilistic models and the state-of-the-art 2-Poisson model. Through experiments on document collections in English and Korean, we demonstrate that incorporating term dependences using the Chow Expansion improves the performance of probabilistic IR systems.

14.
The increasing number of documents that have to be indexed in different environments, particularly on the Web, and the lack of scalability of a single centralised index lead to the use of distributed information retrieval systems to effectively search for and locate the required information. In this study, we present several improvements over the two main bottlenecks in a distributed information retrieval system (the network and the brokers). We extend a simulation network model in order to represent a switched network. The new simulation model is validated by comparing the estimated response times with those obtained using a real system. We show that the use of a switched network reduces the saturation of the interconnection network, especially in a replicated system, and that some improvements may be achieved using multicast messages and faster connections with the brokers. We also demonstrate that reducing the partial result sets will improve the response time of a distributed system by 53%, with a negligible probability of changing the system’s precision and recall values. Finally, we present a simple hierarchical distributed broker model that will reduce the response times for a distributed system by 55%.
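To make the broker's role concrete, here is a small sketch of a broker merging partial result sets that query servers return already sorted by descending score. The function name, the (score, document id) tuple layout, and the assumption that scores are comparable across servers are all hypothetical and are not taken from the paper.

```python
import heapq
from itertools import islice


def merge_partial_results(partial_lists, k):
    """Merge per-server result lists (each sorted by descending score) and keep the top k.

    Each hit is a (score, doc_id) tuple. Scores are assumed to be comparable
    across servers, which a real system would have to ensure.
    """
    merged = heapq.merge(*partial_lists, key=lambda hit: -hit[0])
    return list(islice(merged, k))


# Two hypothetical query servers, each sending only its best two hits.
server_a = [(0.9, "a1"), (0.4, "a2")]
server_b = [(0.8, "b1"), (0.1, "b2")]
top_hits = merge_partial_results([server_a, server_b], k=3)
```

Because the merge is lazy, the broker touches only as many hits as it needs, and the shorter the lists each server sends, the less data crosses the network and the less work the broker does.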

15.
The present-day guidelines for thesaurus design recommend two different strategies, the committee and empirical approaches, for identifying candidate terms. An argument is made that the basis for the recommendation is the assumption that the knowledge based on the consensus of experts of a field is different from the knowledge expressed in the literature of that field. An experiment was conducted to test the validity of this assumption. The finding that the two strategies failed to generate two significantly different lists of terms challenges the validity of the assumption and raises several important questions for the theorists who write the guidelines for thesaurus design and for those who must put the guidelines into practice when designing a thesaurus.

16.
17.
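The inverted pendulum is an ideal test object for automatic control. Using a fuzzy control method, the stabilization control problem of a triple inverted pendulum system is studied. An LQR optimal-control feedback weighting matrix is designed for the linearized model of the system, and the fuzzy control parameters are selected on the basis of the feedback parameters of the optimal linear control. Simulation results show that the method achieves stable control of the triple inverted pendulum system, with simple parameter selection and good dynamic performance.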

18.
Despite the importance of personalization in information retrieval, there is a marked lack of standard datasets and methodologies for evaluating personalized information retrieval (PIR) systems, due to the costly process of producing such datasets. As a result, a group of evaluation frameworks (EFs) have been proposed that use surrogates of the PIR evaluation problem, instead of addressing it directly, to make PIR evaluation more feasible. We call this group of EFs indirect evaluation frameworks. Indirect frameworks are designed to be more flexible than the classic (direct) ones and much cheaper to employ. However, since there are many different settings and methods for PIR, e.g., social-network-based vs. profile-based PIR, and each needs its own kind of data on which to base the personalization, not all the evaluation frameworks are applicable to all the PIR methods. In this paper, we first review and categorize the frameworks that have already been introduced for evaluating PIR. We further propose a novel indirect EF based on citation networks (called PERSON), which allows repeatable, large-scale, and low-cost PIR experiments. It is also more information-rich than the existing EFs and can be employed in many different scenarios. The fundamental idea behind PERSON is that in each document (paper) d, the cited documents are generally related to d from the perspective of d’s author(s). To investigate the effectiveness of the proposed EF, we use a large collection of scientific papers. We conduct several sets of experiments and demonstrate that PERSON is a reliable and valid EF. In the experiments, we show that PERSON is consistent with the traditional Cranfield-based evaluation in comparing non-personalized IR methods. In addition, we show that PERSON can correctly capture the improvements made by personalization. We also demonstrate that its results are highly correlated with those of another salient EF. Additional experiments on issues that could affect the validity of PERSON further support it. It is also shown that PERSON is robust w.r.t. its parameter settings.

19.
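Design of a Rotary Inverted Pendulum System Based on the PID Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
王红 (Wang Hong), 《大众科技》, 2014, (10): 25-27
The inverted pendulum system is itself a nonlinear control system, characterized by multiple variables, high order, strong coupling, and severe instability. The main task is to design a rotary inverted pendulum control system based on the PID algorithm, in which a microcontroller applies the PID algorithm to control the system so that the pendulum rod quickly reaches the inverted equilibrium state and has a certain ability to reject disturbances.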
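The PID algorithm mentioned in this abstract can be sketched as a discrete-time controller. The class name, gains, and time step below are illustrative assumptions, not values from the paper, and the paper's microcontroller implementation is not described in the abstract.

```python
class PID:
    """Discrete PID controller: output = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, setpoint, measurement):
        """Return the control output for one sampling period."""
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and a 10 ms sampling period; the setpoint 0.0 stands for the upright angle.
controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
command = controller.update(setpoint=0.0, measurement=0.05)
```

On real hardware the measurement would come from the pendulum's angle sensor each period and the output would drive the motor; practical designs usually also clamp the integral term and the output.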

20.
The increasing number of documents to be indexed in many environments (Web, intranets, digital libraries) and the limitations of a single centralised index (lack of scalability, server overloading and failures) lead to the use of distributed information retrieval systems to efficiently search and locate the desired information. This work is a case study of different architectures for a distributed information retrieval system, in order to provide a guide to approximate the optimal architecture with a specific set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture simulating a variable number of workstations (from 1 up to 4096). A collection of approximately 94 million documents and 1 terabyte (TB) of text is used to test the performance of the different architectures. In a purely distributed information retrieval system, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a high number of query servers is used, essentially due to the reduction of the network load. However, a change in the distribution of the users’ queries could reduce the performance of a clustered system.
