1.

Relation of resemblance in information retrieval

Pirkko Pietiläinen 《Information processing & management》1982,18(2):55-59

A method using the amount of semantic information of query terms as weight in a fuzzy relation of resemblance is presented. The relation can be used to partially order documents in decreasing order of resemblance with the query. Large operational bibliographic data bases are used to test the validity of the approach.

2.

Artificial intelligence in information retrieval systems

Linda C. Smith 《Information processing & management》1976,12(3):189-222

A survey is given of the potential role of artificial intelligence in retrieval systems. Papers by Bush and Turing are used to introduce early ideas in the two fields and definitions for artificial intelligence and information retrieval for the purposes of this paper are given. A simple model of an information retrieval system provides a framework for subsequent discussion of artificial intelligence concepts and their applicability in information retrieval. Concepts surveyed include pattern recognition, representation, problem solving and planning, heuristics, and learning. The paper concludes with an outline of areas for further research on artificial intelligence in information retrieval systems.

3.

Library and information retrieval software

D. Ellis 《International Journal of Information Management》1986,6(4)

4.

Verbosity normalized pseudo-relevance feedback in information retrieval

Seung-Hoon Na Kangil Kim 《Information processing & management》2018,54(2):219-239

Document length normalization is one of the fundamental components in a retrieval model because term frequencies can readily be increased in long documents. The key hypotheses in the literature regarding document length normalization are the verbosity and scope hypotheses, which imply that document length normalization should consider the distinguishing effects of verbosity and scope on term frequencies. In this article, we extend these hypotheses in a pseudo-relevance feedback setting by assuming the verbosity hypothesis on the feedback query model, which states that the verbosity of an expanded query should not be high. Furthermore, we postulate the following two effects of document verbosity on a feedback query model that easily and typically hold in modern pseudo-relevance feedback methods: 1) the verbosity-preserving effect: the query verbosity of a feedback query model is determined by feedback document verbosities; 2) the verbosity-sensitive effect: highly verbose documents more significantly and unfairly affect the resulting query model than normal documents do. By considering these effects, we propose verbosity normalized pseudo-relevance feedback, which is straightforwardly obtained by replacing original term frequencies with their verbosity-normalized term frequencies in the pseudo-relevance feedback method. The results of the experiments performed on three standard TREC collections show that the proposed verbosity normalized pseudo-relevance feedback consistently provides statistically significant improvements over conventional methods, under the settings of the relevance model and latent concept expansion.

5.

Phrase structure rewrite systems in information retrieval

Paul H. Klingbiel 《Information processing & management》1985,21(2):113-126

Operational level automatic indexing requires an efficient means of normalizing natural language phrases. Subject switching requires an efficient means of translating one set of authorized terms to another. A phrase structure rewrite system called a Lexical Dictionary is explained that performs these functions. Background, operational use, other applications and ongoing research are explained.

6.

Term relevance weights in on-line information retrieval

G. Salton R.K. Waldstein 《Information processing & management》1978,14(1):29-35

Considerable evidence exists to show that the use of term relevance weights is beneficial in interactive information retrieval. Various term weighting systems are reviewed. An experiment is then described in which information retrieval users are asked to rank query terms in decreasing order of presumed importance prior to actual search and retrieval. The experimental design is examined, and various relevance ranking systems are evaluated, including fully automatic systems based on inverse document frequency parameters, human rankings performed by the user population, and combinations of the two.
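
As a rough illustration of the inverse document frequency idea mentioned in this abstract, the sketch below ranks query terms by a plain `log(N / df)` weight. The function name `idf_weights`, the toy collection, and the exact formula are assumptions made for this example; the abstract does not state which variant the authors evaluated.

```python
import math


def idf_weights(query_terms, documents):
    """Return {term: idf} using idf = log(N / df).

    Assumption for this sketch: terms that occur in no document get weight 0.0.
    """
    n_docs = len(documents)
    doc_sets = [set(doc) for doc in documents]
    weights = {}
    for term in query_terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in doc_sets if term in doc)
        weights[term] = math.log(n_docs / df) if df else 0.0
    return weights


# Toy collection (hypothetical data): each document is a list of index terms.
docs = [
    ["fuzzy", "retrieval", "query"],
    ["retrieval", "index"],
    ["retrieval", "query", "weights"],
    ["boolean", "retrieval"],
]
weights = idf_weights(["retrieval", "query", "fuzzy"], docs)

# Rank query terms in decreasing order of automatically assigned importance.
ranked_terms = sorted(weights, key=weights.get, reverse=True)
print(ranked_terms)
```

Rarer terms receive higher weights, so a term that appears in every document contributes nothing to the ranking.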

7.

Automatic query formulations in information retrieval

G Salton C Buckley E A Fox 《Journal of the American Society for Information Science》1983,34(4):262-280

Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.

8.

A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase

Georgios Paltoglou Michail Salampasis Maria Satratzemi 《Information processing & management》2008

The problem of results merging in distributed information retrieval environments has gained significant attention in recent years. Two generic approaches have been introduced in research. The first approach aims at estimating the relevance of the documents returned from the remote collections through ad hoc methodologies (such as weighted score merging, regression etc.) while the other is based on downloading all the documents locally, completely or partially, in order to calculate their relevance. Both approaches have advantages and disadvantages. Download methodologies are more effective but they pose a significant overhead on the process in terms of time and bandwidth. Approaches that rely solely on estimation, on the other hand, usually depend on document relevance scores being reported by the remote collections in order to achieve maximum performance. In addition, regression algorithms, which have proved to be more effective than weighted score merging algorithms, need a significant number of overlap documents in order to function effectively, practically requiring multiple interactions with the remote collections. The new algorithm that is introduced is based on adaptively downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies. Thus it reconciles the above two approaches, combining their strengths while minimizing their drawbacks, achieving the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download approaches. The proposed algorithm is tested in a variety of settings and its performance is found to be significantly better than the former, while approximating that of the latter.

9.

The weighted Condorcet fusion in information retrieval

Shengli Wu 《Information processing & management》2013

The Condorcet fusion is a distinctive fusion method and was found useful in information retrieval. Two basic requirements for the Condorcet fusion to improve retrieval effectiveness are: (1) all component systems involved should be more or less equally effective; and (2) each information retrieval system should be developed independently and thus each component result is more or less equally different from the others. When these two requirements are not satisfied, as is often the case, weighted Condorcet becomes a good option. However, how to assign weights for the weighted Condorcet has not been investigated.
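
To make the idea of weighted pairwise voting concrete, here is a minimal sketch of a simplified, win-counting form of weighted Condorcet fusion. It is not the paper's algorithm: the function `weighted_condorcet`, the toy rankings, and the weights are all hypothetical, and the abstract itself notes that how to choose the weights is the open question.

```python
from itertools import combinations


def weighted_condorcet(rankings, weights):
    """Fuse ranked lists by weighted pairwise majority voting.

    Each system votes, with its weight, for whichever document of a pair it
    ranks higher. Documents are then ordered by the number of pairwise
    contests they win (ties broken by document id). Assumption for this
    sketch: a document missing from a list is treated as ranked last there.
    """
    docs = sorted({doc for ranking in rankings for doc in ranking})
    positions = [{doc: i for i, doc in enumerate(r)} for r in rankings]
    wins = {doc: 0 for doc in docs}
    for a, b in combinations(docs, 2):
        margin = 0.0
        for pos, weight in zip(positions, weights):
            rank_a = pos.get(a, len(pos))
            rank_b = pos.get(b, len(pos))
            if rank_a < rank_b:
                margin += weight
            elif rank_b < rank_a:
                margin -= weight
        if margin > 0:
            wins[a] += 1
        elif margin < 0:
            wins[b] += 1
    return sorted(docs, key=lambda doc: (-wins[doc], doc))


# Three hypothetical component systems ranking the same three documents.
rankings = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d3", "d2", "d1"]]

equal_fused = weighted_condorcet(rankings, [1.0, 1.0, 1.0])
weighted_fused = weighted_condorcet(rankings, [3.0, 1.0, 1.0])
print(equal_fused, weighted_fused)
```

With equal weights the sketch reduces to unweighted majority voting; raising the weight of the first system lets its preference for `d1` over `d2` prevail.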

10.

Ontology refinement for improved information retrieval

Antonio Jimeno-Yepes Rafael Berlanga-Llavori Dietrich Rebholz-Schuhmann 《Information processing & management》2010

Ontologies are frequently used in information retrieval, their main applications being the expansion of queries, semantic indexing of documents and the organization of search results. Ontologies provide lexical items, allow conceptual normalization and provide different types of relations. However, how to optimize an ontology for information retrieval tasks is still unclear. In this paper, we use an ontology query model to analyze the usefulness of ontologies in effectively performing document searches. Moreover, we propose an algorithm to refine ontologies for information retrieval tasks with preliminary positive results.

11.

Parsimonious translation models for information retrieval

Seung-Hoon Na In-Su Kang Jong-Hyeok Lee 《Information processing & management》2007

In the KL divergence framework, the extended language modeling approach has a critical problem of estimating a query model, which is the probabilistic model that encodes the user’s information need. For query expansion in initial retrieval, the translation model has been proposed to involve term co-occurrence statistics. However, the translation model was difficult to apply, because the term co-occurrence statistics must be constructed offline. Especially in a large collection, constructing such a large matrix of term co-occurrence statistics prohibitively increases time and space complexity. In addition, reliable retrieval performance cannot be guaranteed because the translation model may comprise noisy non-topical terms in documents. To resolve these problems, this paper investigates an effective method to construct co-occurrence statistics and eliminate noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of a translation model that can reduce the number of terms containing non-zero probabilities by eliminating non-topical terms in documents. Through experimentation on seven different test collections, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language modeling, but also the non-parsimonious models.

12.

Query-level loss functions for information retrieval

Tao Qin Xu-Dong Zhang Ming-Feng Tsai De-Sheng Wang Tie-Yan Liu Hang Li 《Information processing & management》2008

Many machine learning technologies such as support vector machines, boosting, and neural networks have been applied to the ranking problem in information retrieval. However, since originally the methods were not developed for this task, their loss functions do not directly link to the criteria used in the evaluation of ranking. Specifically, the loss functions are defined on the level of documents or document pairs, in contrast to the fact that the evaluation criteria are defined on the level of queries. Therefore, minimizing the loss functions does not necessarily imply enhancing ranking performances. To solve this problem, we propose using query-level loss functions in learning of ranking functions. We discuss the basic properties that a query-level loss function should have and propose a query-level loss function based on the cosine similarity between a ranking list and the corresponding ground truth. We further design a coordinate descent algorithm, referred to as RankCosine, which utilizes the proposed loss function to create a generalized additive ranking model. We also discuss whether the loss functions of existing ranking algorithms can be extended to query-level. Experimental results on the datasets of TREC web track, OHSUMED, and a commercial web search engine show that with the use of the proposed query-level loss function we can significantly improve ranking accuracies. Furthermore, we found that it is difficult to extend the document-level loss functions to query-level loss functions.
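
One plausible reading of a cosine-based query-level loss is sketched below: for each query, the vector of predicted scores is compared with the vector of ground-truth relevance values, and the loss is half of one minus their cosine similarity. The scaling by 0.5, the helper names, and the toy data are assumptions for this illustration; the abstract does not give the exact formula, and the RankCosine training procedure is not reproduced here.

```python
import math


def cosine_loss(scores, truth):
    """Loss for one query: 0.5 * (1 - cos(scores, truth)), a value in [0, 1]."""
    dot = sum(s * t for s, t in zip(scores, truth))
    norm = math.sqrt(sum(s * s for s in scores)) * math.sqrt(sum(t * t for t in truth))
    if norm == 0:
        raise ValueError("score and ground-truth vectors must be non-zero")
    return 0.5 * (1.0 - dot / norm)


def mean_query_loss(per_query):
    """Average the loss over queries, so each query counts once regardless
    of how many documents it has."""
    return sum(cosine_loss(s, t) for s, t in per_query) / len(per_query)


# Hypothetical data: (predicted scores, ground-truth relevance) per query.
queries = [
    ([2.0, 1.0, 0.0], [4.0, 2.0, 0.0]),  # same direction, so loss near 0
    ([1.0, 0.0], [0.0, 1.0]),            # orthogonal, so loss 0.5
]
overall = mean_query_loss(queries)
print(overall)
```

Because the loss is computed per query and then averaged, a query with many documents does not dominate one with few, which is the contrast with document-level and pair-level losses drawn in the abstract.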

13.

Relevance feedback and cross-language information retrieval

Viviane Moreira Orengo Christian Huyck 《Information processing & management》2006

This paper presents a study of relevance feedback in a cross-language information retrieval environment. We have performed an experiment in which Portuguese speakers are asked to judge the relevance of English documents, documents hand-translated to Portuguese, and documents automatically translated to Portuguese. The goals of the experiment were to answer two questions: (i) how well can native Portuguese searchers recognise relevant documents written in English, compared to documents that are hand translated and automatically translated to Portuguese; and (ii) what is the impact of misjudged documents on the performance improvement that can be achieved by relevance feedback. Surprisingly, the results show that machine translation is as effective as hand translation in aiding users to assess relevance in the experiment. In addition, the impact of misjudged documents on the performance of RF is overall just moderate, and varies greatly for different query topics.

14.

Requirements for query evaluation in weighted information retrieval

Martin Bärtschi 《Information processing & management》1985,21(4):291-303

15.

Testing the cluster hypothesis in distributed information retrieval

Fabio Crestani Shengli Wu 《Information processing & management》2006

How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results, are major features of a heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.

16.

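Strategies for improving information retrieval instruction in university libraries

Zhao Yan (赵艳) 《中国科技信息》 (China Science and Technology Information) 2013,(7):186-186

With the rapid development of information technology, online resources have become the main channel through which people obtain information. In university teaching in particular, improving students' ability to obtain information in a sound, systematic way is an important part of raising the quality of education. This article analyzes problems in current literature retrieval instruction in university libraries, along with strategies for addressing them, and proposes some measures for improvement.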

17.

Multilevel information system— Towards more flexible information retrieval systems

Henryk Rybiński Bolesław Szymański 《Information processing & management》1981,17(5):277-290

The Multilevel Information System (MLIS), an extension of the typical Information Retrieval System towards more complete data processing, is discussed. MLIS integrates functions typical of data base management systems and retrieval-oriented systems. Several levels of data access are provided, each level developed for a different class of users. The end-user level is based on a simple query language, the trained user level on a relational model, and the application programmer level on a Data Manipulation Language nested in a high level programming language. The last two levels are discussed in detail.

18.

An agenda for green information retrieval research

Gobinda Chowdhury 《Information processing & management》2012

Nowadays we use information retrieval systems and services as part of our many day-to-day activities ranging from a web and database search to searching various digital libraries, audio and video collections/services, and so on. However, IR systems and services make extensive use of ICT (information and communication technologies) and increasing use of ICT can significantly increase greenhouse gas (GHG, a term used to denote emission of harmful gases in the atmosphere) emissions. Sustainable development, and more importantly environmental sustainability, has become a major area of concern of various national and international bodies and as a result various initiatives and measures are being proposed for reducing the environmental impact of industries, businesses, governments and institutions. Research also shows that appropriate use of ICT can reduce the overall GHG emissions of a business, product or service. Green IT and cloud computing can play a key role in reducing the environmental impact of ICT. This paper proposes the concept of Green IR systems and services that can play a key role in reducing the overall environmental impact of various ICT-based services in education and research, business, government, etc., that are increasingly reliant on access to and use of digital information. However, to date there has not been any systematic research towards building Green IR systems and services. This paper points out the major challenges in building Green IR systems and services, and two different methods are proposed for estimating the energy consumption, and the corresponding GHG emissions, of an IR system or service. This paper also proposes the four key enablers of Green IR, viz. Standardize, Share, Reuse and Green behavior. Further research required to achieve these for building Green IR systems and services is also mentioned.

19.

Adapting information retrieval systems to user queries

Giridhar Kumaran James Allan 《Information processing & management》2008,44(6):1838

Users enter queries that are short as well as long. The aim of this work is to evaluate techniques that can enable information retrieval (IR) systems to automatically adapt to perform better on such queries. By adaptation we refer to (1) modifications to the queries via user interaction, and (2) detecting that the original query is not a good candidate for modification. We show that the former has the potential to improve mean average precision (MAP) of long and short queries by 40% and 30% respectively, and that simple user interaction can help towards this goal. We observed that after inspecting the options presented to them, users frequently did not select any. We present techniques in this paper to determine beforehand the utility of user interaction to avoid this waste of time and effort. We show that our techniques can provide IR systems with the ability to detect and avoid interaction for unpromising queries without a significant drop in overall performance.
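
For readers unfamiliar with the mean average precision (MAP) measure cited in this abstract, the sketch below computes it in its standard form for binary relevance judgments. The function names and the toy runs are hypothetical and are not taken from the paper's experiments.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and relevant sets.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

A relative improvement such as the 40% quoted in the abstract would be measured by comparing this score before and after query modification over the same set of queries.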

20.

A risk minimization framework for information retrieval

ChengXiang Zhai John Lafferty 《Information processing & management》2006

This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.