Networked information retrieval aims at the interoperability of heterogeneous information retrieval (IR) systems. In this paper, we show how differences concerning search operators and database schemas can be handled by applying data abstraction concepts in combination with uncertain inference. Different data types with vague predicates are required to allow for queries referring to arbitrary attributes of documents. Physical data independence separates search operators from access paths, thus solving text search problems related to noun phrases, compound words and proper nouns. Projection and inheritance on attributes support the creation of unified views on a set of IR databases. Uncertain inference allows for query processing even on incompatible database schemas.

The Multilevel Information System (MLIS), an extension of the typical information retrieval system towards more complete data processing, is discussed. MLIS integrates functions typical of database management systems and retrieval-oriented systems. Several levels of data access are provided, each developed for a different class of users. The end-user level is based on a simple query language, the trained-user level on a relational model, and the application-programmer level on a Data Manipulation Language nested in a high-level programming language. The last two levels are discussed in detail.

The data fusion technique has been investigated by many researchers and has been used in implementing several information retrieval systems. However, the results from data fusion vary in different situations. Finding out under which conditions data fusion leads to performance improvement is an important issue. In this paper, we present an analysis of the behaviour of several well-known methods, such as CombSum and CombMNZ, for fusion of multiple information retrieval results. Based on this analysis, we predict the performance of the data fusion methods. Experiments are conducted with three groups of results submitted to TREC 6, TREC 2001, and TREC 2004. The experiments show that the prediction of the performance of data fusion is quite accurate, and that it can be used in situations very different from the training examples. Compared with previous work, our result is more accurate and better suited to applications, since a variable number of component systems can be supported, whereas only two were used previously.
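As an illustration of the two fusion methods named in this abstract, here is a minimal sketch of CombSum (sum of a document's normalized scores across systems) and CombMNZ (that sum multiplied by the number of systems that retrieved the document). The min-max normalization step, the function names, and the toy input format are assumptions made for this example; the abstract does not say how scores were normalized in the experiments.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one system's scores to [0, 1] (normalization choice is an assumption)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as equally good.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """CombSum: add up each document's normalized scores over all runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSum score times the number of runs that retrieved the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def ranked(fused):
    """Order documents by fused score, breaking ties by document id."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical toy runs: each maps document id -> raw retrieval score.
runs = [
    {"d1": 3.0, "d2": 1.0, "d3": 2.0},
    {"d2": 10.0, "d4": 0.0},
]
print(ranked(comb_sum(runs)))
print(ranked(comb_mnz(runs)))
```

In this toy input, CombMNZ rewards `d2` for being retrieved by both systems, which is the behaviour that distinguishes it from CombSum.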

The primary aim of this study is to suggest a formalized definition (“explication”) of the “relevance relationship” between texts, including the explication of the concept of “degree of relevance”. The concept of an information language (IL), its vocabulary and syntax, and the notion of the “semantic power” of an information language are defined. The concept of ideally functioning information retrieval systems (IRS) is suggested, and different kinds of deviations from such IRS are considered.

In this paper we present an Information Retrieval System (IRS) which is able to work with structured document collections. The model is based on the influence diagram formalism: a generalization of Bayesian Networks that provides a visual representation of a decision problem. Influence diagrams offer an intuitive way to identify and display the essential elements of the domain (the structured document components and their usefulness) and also how these are related to each other. They also have associated quantitative knowledge that measures the strength of the interactions. By means of this approach, we present structured retrieval as a decision-making problem. Two different models have been designed: SID (Simple Influence Diagram) and CID (Context-based Influence Diagram). The main difference between these two models is that the latter also takes into account influences provided by the context in which each structural component is located.

Measuring effectiveness of information retrieval (IR) systems is essential for research and development and for monitoring search quality in dynamic environments. In this study, we employ new methods for automatic ranking of retrieval systems. In these methods, we merge the retrieval results of multiple systems using various data fusion algorithms, use the top-ranked documents in the merged result as the “(pseudo) relevant documents,” and employ these documents to evaluate and rank the systems. Experiments using Text REtrieval Conference (TREC) data provide statistically significant strong correlations with human-based assessments of the same systems. We hypothesize that the selection of systems that would return documents different from the majority could eliminate the ordinary systems from data fusion and provide better discrimination among the documents and systems. This could improve the effectiveness of automatic ranking. Based on this intuition, we introduce a new method for the selection of systems to be used for data fusion. For this purpose, we use the bias concept that measures the deviation of a system from the norm or majority and employ the systems with higher bias in the data fusion process. This approach provides even higher correlations with the human-based results. We demonstrate that our approach outperforms the previously proposed automatic ranking methods.
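The pipeline described in this abstract (fuse the systems' results, treat the top of the fused list as pseudo-relevant, then score each system against that set) can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the reciprocal-rank fusion rule, the precision-at-cutoff measure, and the names `pool_depth` and `cutoff` are not taken from the abstract, and the bias-based system selection step is not shown.

```python
from collections import defaultdict


def fuse_by_reciprocal_rank(rankings):
    """Merge ranked lists by summing 1/rank per document (an assumed fusion rule)."""
    scores = defaultdict(float)
    for ranking in rankings.values():
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / rank
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def rank_systems(rankings, pool_depth, cutoff):
    """Rank systems by precision at `cutoff` against pseudo-relevant documents."""
    # The top `pool_depth` fused documents stand in for human relevance judgments.
    pseudo_relevant = set(fuse_by_reciprocal_rank(rankings)[:pool_depth])
    precision = {
        name: sum(doc in pseudo_relevant for doc in ranking[:cutoff]) / cutoff
        for name, ranking in rankings.items()
    }
    order = sorted(precision, key=lambda name: (-precision[name], name))
    return order, precision


# Hypothetical toy rankings from three systems.
rankings = {
    "A": ["d1", "d2", "d3"],
    "B": ["d2", "d1", "d4"],
    "C": ["d9", "d8", "d7"],
}
order, precision = rank_systems(rankings, pool_depth=2, cutoff=2)
print(order, precision)
```

Note that in this toy case the outlier system `C` scores lowest simply because it disagrees with the majority, which is the kind of effect the abstract's bias-based selection is meant to address.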

The success of information retrieval depends on the ability to measure the effective relationship between a query and its response. If both are posed in natural language, one might expect that understanding the meaning of that language could not be avoided. The aim of this research is to demonstrate that it is perhaps unnecessary to be able to determine the meaning in the absolute sense; it may be sufficient to measure how far there is a conformity in meaning, and then only in the context of the set of documents in which the answer to a query is sought. Handling a particular language using a computer is made possible through replacing certain texts by special sets. A given text has a ‘syntactic trace’, the set of all the overlapping trigrams forming part of the text. When determining the effective relationship between a query and its answer, not only do their syntactic traces play a role, but so do the traces of all other documents in the set. This is known as the ‘information trace method’.
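To make the notion of a syntactic trace concrete, the sketch below builds the set of overlapping trigrams of a text and compares two traces. It assumes character trigrams over lower-cased, whitespace-normalized text, and it uses Jaccard overlap as a stand-in similarity; the abstract specifies neither, and the actual information trace method also draws on the traces of all other documents in the collection, which is not modelled here.

```python
def syntactic_trace(text, n=3):
    """Return the set of overlapping character n-grams (trigrams by default)."""
    # Normalization (lower-casing, collapsing whitespace) is an assumption.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(trace_a, trace_b):
    """Overlap of two traces; a stand-in measure, not the paper's own."""
    if not trace_a and not trace_b:
        return 0.0
    return len(trace_a & trace_b) / len(trace_a | trace_b)


query = syntactic_trace("information retrieval")
answer = syntactic_trace("retrieval of information")
print(sorted(syntactic_trace("abcd")))
print(jaccard(query, answer))
```

Because the trace is a set, repeated trigrams count once and word order matters only where trigrams span word boundaries.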

The fundamental idea of the work reported here is to extract index phrases from texts with the help of a single-word concept dictionary and a thesaurus containing relations among concepts. The work is based on the fact that, within every phrase, the single words of which the phrase is composed are related in a certain well-defined manner, the type of relations holding between concepts depending only on the concepts themselves. Therefore relations can be stored in a semantic network. The algorithm described extracts single-word concepts from texts and combines them into phrases using the semantic relations between these concepts, which are stored in the network. The results obtained show that phrase extraction from texts by this semantic method is possible and offers many advantages over other (purely syntactic or statistical) methods concerning preciseness and completeness of the meaning representation of the text. But the results show, too, that some syntactic and morphological “filtering” should be included for effectivity reasons.

This paper describes how the operations on local inverted files are to be modified in order to use them in a distributed information retrieval system based on thesauri. The global system consists of n local retrieval systems. The presented retrieval rules may be viewed as a logical approach to implementing a physically distributed retrieval system.

The paper presents, firstly, a brief review of the long history of information ethics beginning with the Greek concept of parrhesia or freedom of speech as analyzed by Michel Foucault. The recent concept of information ethics is related particularly to problems which arose in the last century with the development of computer technology and the internet. A broader concept of information ethics as dealing with the digital reconstruction of all possible phenomena leads to questions relating to digital ontology. Following Heidegger’s conception of the relation between ontology and metaphysics, the author argues that ontology has to do with Being itself and not just with the Being of beings which is the matter of metaphysics. The primary aim of an ontological foundation of information ethics is to question the metaphysical ambitions of digital ontology understood as today’s pervading understanding of Being. The author analyzes some challenges of digital technology, particularly with regard to the moral status of digital agents. The author argues that information ethics does not only deal with ethical questions relating to the infosphere. This view is contrasted with arguments presented by Luciano Floridi on the foundation of information ethics as well as on the moral status of digital agents. It is argued that a reductionist view of the human body as digital data overlooks the limits of digital ontology and gives up one basis for ethical orientation. Finally, issues related to the digital divide as well as to intercultural aspects of information ethics are explored, and long- and short-term agendas for appropriate responses are presented.

Despite the importance of personalization in information retrieval, there is a severe lack of standard datasets and methodologies for evaluating personalized information retrieval (PIR) systems, due to the costly process of producing such datasets. Consequently, a group of evaluation frameworks (EFs) has been proposed that use surrogates of the PIR evaluation problem, instead of addressing it directly, to make PIR evaluation more feasible. We call this group of EFs indirect evaluation frameworks. Indirect frameworks are designed to be more flexible than the classic (direct) ones and much cheaper to employ. However, since there are many different settings and methods for PIR, e.g., social-network-based vs. profile-based PIR, and each needs some special kind of data on which to base the personalization, not all the evaluation frameworks are applicable to all the PIR methods. In this paper, we first review and categorize the frameworks that have already been introduced for evaluating PIR. We further propose a novel indirect EF based on citation networks (called PERSON), which allows repeatable, large-scale, and low-cost PIR experiments. It is also more information-rich compared to the existing EFs and can be employed in many different scenarios. The fundamental idea behind PERSON is that in each document (paper) d, the cited documents are generally related to d from the perspective of d’s author(s). To investigate the effectiveness of the proposed EF, we use a large collection of scientific papers. We conduct several sets of experiments and demonstrate that PERSON is a reliable and valid EF. In the experiments, we show that PERSON is consistent with the traditional Cranfield-based evaluation in comparing non-personalized IR methods. In addition, we show that PERSON can correctly capture the improvements made by personalization. We also demonstrate that its results are highly correlated with those of another salient EF. Our experiments on several issues concerning the validity of PERSON also support its validity. It is also shown that PERSON is robust w.r.t. its parameter settings.

This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall’s rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values.
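For readers unfamiliar with the metric, here is a minimal sketch of Normalised Discounted Cumulative Gain at a document cut-off l. It uses the common log2(rank + 1) discount, which is an assumption: the paper may use a different discount (for example a configurable logarithm base), and the gain values below are made up for illustration.

```python
import math


def dcg(gains, l):
    """Discounted cumulative gain over the top-l ranks, discount log2(rank + 1)."""
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains[:l], start=1)
    )


def ndcg(ranked_gains, all_gains, l):
    """nDCG at cut-off l: DCG of the system ranking divided by the ideal DCG.

    ranked_gains: gain of each retrieved document, in ranked order.
    all_gains: gains of all judged relevant documents for the topic.
    """
    ideal = dcg(sorted(all_gains, reverse=True), l)
    return dcg(ranked_gains, l) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments: 3 = highly, 2 = fairly, 1 = partially relevant.
system_ranking = [1, 3, 0, 2]
judged = [3, 2, 1]
print(ndcg(system_ranking, judged, l=4))
```

Averaging this value over cut-offs or over topics, as the "(Average)" variant in the abstract suggests, is left out of the sketch.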

Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.

In this paper, we lay out a relational approach for indexing and retrieving photographs from a collection. The increase of digital image acquisition devices, combined with the growth of the World Wide Web, requires the development of information retrieval (IR) models and systems that provide fast access to images searched by users in databases. The aim of our work is to develop an IR model suited to images, integrating rich semantics for representing this visual data and user queries, which can also be applied to large corpora.

Currently a wide range of information system development methods exists, each method supporting different sets of concepts and being supported by different techniques and tools. For many large and complex systems a multi-method approach would be preferable but the fragmentation of contemporary methods does not cater at present for a unified representation of system specifications. The AMADEUS project attempts to redress this situation by enabling the integration of different contemporary development methods at the semantic level. This paper addresses the problems involved in the current multiplicity of development methods and proposes a means of enabling system developers to view information system development in a unified way.
