
1.

Performance prediction of data fusion for information retrieval

Shengli Wu Sally McClean 《Information processing & management》2006

The data fusion technique has been investigated by many researchers and has been used in implementing several information retrieval systems. However, the results from data fusion vary in different situations. Determining the conditions under which data fusion leads to performance improvement is an important issue. In this paper, we present an analysis of the behaviour of several well-known methods such as CombSum and CombMNZ for fusion of multiple information retrieval results. Based on this analysis, we predict the performance of the data fusion methods. Experiments are conducted with three groups of results submitted to TREC 6, TREC 2001, and TREC 2004. The experiments show that the prediction of the performance of data fusion is quite accurate, and it can be used in situations very different from the training examples. Compared with previous work, our result is more accurate and better suited to applications, since a varying number of component systems can be supported, whereas only two were used previously.
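
As a rough illustration of the two fusion methods named in this abstract, here is a minimal sketch of CombSum and CombMNZ. The min-max normalization step, the function names, and the toy runs are assumptions made for the example; the abstract does not say how scores were normalized or how the paper implements either method.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one system's scores to [0, 1]; a constant run maps to 1.0 (assumed convention)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """CombSum: add up the normalized scores a document receives from every system."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSum multiplied by the number of systems that retrieved the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


# Hypothetical runs: each dict maps a document id to one system's raw score.
runs = [{"d1": 3.0, "d2": 1.0}, {"d1": 2.0, "d3": 4.0}]
print(comb_sum(runs))
print(comb_mnz(runs))
```

In this toy input, d1 and d3 tie under CombSum, while CombMNZ ranks d1 first because two systems retrieved it.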

2.

Assigning appropriate weights for the linear combination data fusion method in information retrieval

Shengli Wu Yaxin Bi Xiaoqin Zeng Lixin Han 《Information processing & management》2009

In data fusion, the linear combination method is a very flexible method since different weights can be assigned to different systems. However, it remains an open question which weighting schema should be used. In some previous investigations and experiments, a simple weighting schema was used: a system's weight is its average performance over a group of training queries. However, it is not clear whether this weighting schema is good or not. In some other investigations, different numerical optimisation methods were used to search for appropriate weights for the component systems. One major problem with those numerical optimisation methods is their low efficiency. It might not be feasible to use them in some situations, for example in dynamic environments where system weights need to be updated from time to time to maintain reasonably good performance. In this paper, we investigate the weighting issue by extensive experiments. The key point is to find the relation between the performances of component systems and the corresponding weights that lead to good fusion performance. We demonstrate that a series of power functions of average performance, which can be implemented as efficiently as the simple weighting schema, is more effective than the simple weighting schema for the linear data fusion method. Some other features of the power function weighting schema and the linear combination method are also investigated. The observations obtained from this study can be used directly in fusion applications of component retrieval results. The observations are also very useful for optimisation methods to choose better starting points and therefore to obtain more effective weights more quickly.
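
The weighting idea in this abstract can be sketched as follows: each system's weight is a power of its average training performance, and the fused score of a document is the weighted sum of its scores. The function names, the sample performance values, and the exponent are hypothetical; the abstract does not say which exponents the paper found most effective.

```python
def power_weights(avg_performance, exponent):
    """Weight each system by (average training performance) ** exponent.

    exponent = 1 reproduces the simple weighting schema described in the abstract.
    """
    return {system: p ** exponent for system, p in avg_performance.items()}


def linear_combination(runs, weights):
    """Fuse runs by summing weight * score for every document a system returns."""
    fused = {}
    for system, run in runs.items():
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + weights[system] * score
    return fused


# Hypothetical average performance (e.g. MAP on training queries) and normalized runs.
avg_performance = {"A": 0.5, "B": 0.25}
runs = {"A": {"d1": 1.0, "d2": 0.5}, "B": {"d2": 1.0}}

weights = power_weights(avg_performance, exponent=2)
print(linear_combination(runs, weights))
```

Raising the exponent widens the gap between strong and weak systems. Computing the weights costs no more than the simple schema, consistent with the efficiency claim in the abstract.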

3.

Research on relevance in information retrieval and its development strategies

陈洁《情报探索》2020,(2):114-119

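[Purpose/Significance] This study aims to provide a reference for research on relevance in information retrieval. [Method/Process] Using CNKI as the data source and a qualitative approach, it reviews and summarizes the historical development and research schools of information retrieval, and analyses the factors that influence information retrieval and its development trends. [Results/Conclusions] Relevance in information retrieval is a synthesis of user relevance and system relevance; neither can be separated from the other. Relevance should take the user as the key and the system as the foundation, studying users' interaction with the retrieval system, their cognition, and the description of and feedback on their real needs. As research on relevance in information retrieval deepens, the system view and the user view will merge, and retrieval technology and user needs will become coordinated and unified, jointly advancing the development of retrieval relevance.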

4.

Text retrieval with more realistic concept matching and reinforcement learning

Rohana K. Rajapakse Michael Denham 《Information processing & management》2006

This paper reports our experimental investigation into the use of more realistic concepts as opposed to simple keywords for document retrieval, and reinforcement learning for improving document representations to help the retrieval of useful documents for relevant queries. The framework used for achieving this was based on the theory of Formal Concept Analysis (FCA) and Lattice Theory. Features or concepts of each document (and query), formulated according to FCA, are represented in a separate concept lattice and are weighted separately with respect to the individual documents they present. The document retrieval process is viewed as a continuous conversation between queries and documents, during which documents are allowed to learn a set of significant concepts to help their retrieval. The learning strategy used was based on relevance feedback information that makes the similarity of relevant documents stronger and non-relevant documents weaker. Test results obtained on the Cranfield collection show a significant increase in average precisions as the system learns from experience.

5.

Re-ranking method based on inter-document distances

《Information processing & management》2005,41(4):759-775

Lately there has been intensive research into the possibilities of using additional information about documents (such as hyperlinks) to improve retrieval effectiveness. It is called data fusion, based on the intuitive principle that different document and query representations or different methods lead to a better estimation of the documents' relevance scores. In this paper we propose a new method of document re-ranking that enables us to improve document scores using inter-document relationships. These relationships are expressed by distances and can be obtained from the text, hyperlinks or other information. The method formalizes the intuition that strongly related documents should not be assigned very different weights.

6.

Choosing document structure weights

《Information processing & management》2005,41(2):243-264

Existing ranking schemes assume all term occurrences in a given document are of equal influence. Intuitively, terms occurring in some places should have a greater influence than those elsewhere. An occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure. Vector space, probability, and Okapi BM25 ranking are extended to include structure weighting. Weights are then selected for the TREC WSJ collection using a genetic algorithm. The learned weights are then tested on an evaluation set of queries. Structure-weighted vector space inner product and structure-weighted probabilistic retrieval show an improvement of about 5% in mean average precision over their unstructured counterparts. Structure-weighted BM25 shows almost no improvement. Analysis suggests BM25 cannot be improved using structure weighting.
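
One simple way to picture structure weighting is to replace a term's raw frequency with a weighted sum of its per-structure counts. The sketch below shows only that idea. The structure names, the weight values, and the default weight of 1.0 are assumptions for illustration, and the abstract does not give the paper's actual extensions of vector space, probabilistic, or BM25 ranking.

```python
def weighted_tf(structure_counts, structure_weights):
    """Weighted term frequency: sum over structures of weight * occurrences.

    Structures without an explicit weight default to 1.0 (an assumption),
    so an empty weight table gives the ordinary unstructured frequency.
    """
    return sum(
        structure_weights.get(structure, 1.0) * count
        for structure, count in structure_counts.items()
    )


# Hypothetical counts of one query term in one document, split by structure.
counts = {"abstract": 1, "body": 2}

# Hypothetical weights; the paper selects its weights with a genetic algorithm.
weights = {"abstract": 2.0, "body": 1.0}

print(weighted_tf(counts, weights))  # occurrence in the abstract counts double
print(weighted_tf(counts, {}))       # unstructured baseline
```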

7.

Evaluating the quality of mobile information services in university libraries based on the theory of planned behavior

沈思 王晓文 崔旭 《现代情报》2016,36(2):70

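To study the quality of mobile information services, a focal issue of shared concern to libraries and users, this article builds an evaluation index system for mobile information service quality based on the theory of planned behavior. A questionnaire was designed to survey users of mobile information resources on the importance of each indicator; the index weights were obtained by applying a weighted statistical method to the survey results; and the fuzzy comprehensive evaluation method was used to evaluate the mobile information service quality of the Xi'an University of Science and Technology library. The results show that the library's mobile information service quality is at an upper-middle level, and that the library still needs to strengthen and innovate in service means, mobile resources, retrieval systems, and personalized services in order to raise its mobile information service level and better meet users' mobile information needs.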

8.

Vocabulary mining for information retrieval: rough sets and fuzzy sets

《Information processing & management》2001,37(1):15-38

Vocabulary mining in information retrieval refers to the utilization of the domain vocabulary towards improving the user’s query. Most often queries posed to information retrieval systems are not optimal for retrieval purposes. Vocabulary mining allows one to generalize, specialize or perform other kinds of vocabulary-based transformations on the query in order to improve retrieval performance. This paper investigates a new framework for vocabulary mining that derives from the combination of rough sets and fuzzy sets. The framework allows one to use rough set-based approximations even when the documents and queries are described using weighted, i.e., fuzzy representations. The paper also explores the application of generalized rough sets and the variable precision models. The problem of coordination between multiple vocabulary views is also examined. Finally, a preliminary analysis of issues that arise when applying the proposed vocabulary mining framework to the Unified Medical Language System (a state-of-the-art vocabulary system) is presented. The proposed framework supports the systematic study and application of different vocabulary views in information retrieval.

9.

Information-system structure by communication-technology concepts: A cybernetic model approach

Gerhard H. R. Reisig 《Information processing & management》1978,14(6):405-417

Information-systems are classified into two types, termed “Evidence-of-Existence” and “Presentation” of information. The objective of the evidence-type system lies in the domain of documentation and retrieval of information. The structure of this system-type is developed, with application of cybernetic concepts, as an isomorphic model in analogy to the system-structure of communication technology. The latter postulates three criteria of structuring: (1) Source-Channel-Sink, with input-output characteristics, (2) Filter-type communication-channel, (3) Reversible code. These criteria are applied to the structuring of information-systems of the evidence-of-existence type. For the purpose of two-way communication the information-systems have to be represented by closed-loop models. The selective-retrieval requirements necessitate the system-channel to be a filter of information. These information-filters are implemented by keyword-phrases, being identical with the codewords. They yield a uniquely decodable code which is totally reversible to adequately serve both the documentation and the retrieval of documents. It is proven that hierarchic information-systems, applying categorization or subject-heading objects of information, do not meet the mandatory code-requirements. The inherent coding-deficiencies of hierarchic systems generate intolerable retrieval ambiguities. The same critique applies to the thesaurus concept. The development of a novel species of thesaurus is suggested, realizing a kind of Linnéan encyclopedia of general human knowledge, presenting all relevant interrelations of objects of knowledge. Such a thesaurus would provide the much needed support for formulating efficient search queries. Other relevant features of communication technology, like the information-potential, should be isomorphically transformed into information-system models.

10.

On the role of user-centred evaluation in the advancement of interactive information retrieval

Daniela Petrelli 《Information processing & management》2008

This paper discusses the role of user-centred evaluations as an essential method for researching interactive information retrieval. It draws mainly on the work carried out during the Clarity Project where different user-centred evaluations were run during the lifecycle of a cross-language information retrieval system. The iterative testing was not only instrumental to the development of a usable system, but it enhanced our knowledge of the potential, impact, and actual use of cross-language information retrieval technology. Indeed the role of the user evaluation was dual: by testing a specific prototype it was possible to gain a micro-view and assess the effectiveness of each component of the complex system; by cumulating the result of all the evaluations (in total 43 people were involved) it was possible to build a macro-view of how cross-language retrieval would impact on users and their tasks. By showing the richness of results that can be acquired, this paper aims at stimulating researchers into considering user-centred evaluations as a flexible, adaptable and comprehensive technique for investigating non-traditional information access systems.

11.

QCS: A system for querying, clustering and summarizing documents

Daniel M. Dianne P. John M. Judith D. 《Information processing & management》2007,43(6):1588

Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system (the Query, Cluster, Summarize, or QCS, system) that is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.

12.

Requirements for query evaluation in weighted information retrieval

Martin Bärtschi 《Information processing & management》1985,21(4):291-303


13.

Re-examining the effects of adding relevance information in a relevance feedback environment

W.S. Wong R.W.P. Luk H.V. Leong K.S. Ho D.L. Lee 《Information processing & management》2008

This paper presents an investigation into how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study. This framework is called the idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query sizes for each topic instead of the average best query size where they produce at most five percentage points improvement in the mean average precision (MAP) value. We have also achieved a new level of retrieval effectiveness which is about 55–60% MAP instead of 40+% in the previous findings. This new level of retrieval effectiveness was found to be similar to a level using a TREC ad hoc test collection that is about double the number of documents in the TREC-3 test collection used in previous works.

14.

Automatic ranking of information retrieval systems using data fusion

Rabia Nuray Fazli Can 《Information processing & management》2006

Measuring effectiveness of information retrieval (IR) systems is essential for research and development and for monitoring search quality in dynamic environments. In this study, we employ new methods for automatic ranking of retrieval systems. In these methods, we merge the retrieval results of multiple systems using various data fusion algorithms, use the top-ranked documents in the merged result as the “(pseudo) relevant documents,” and employ these documents to evaluate and rank the systems. Experiments using Text REtrieval Conference (TREC) data provide statistically significant strong correlations with human-based assessments of the same systems. We hypothesize that the selection of systems that would return documents different from the majority could eliminate the ordinary systems from data fusion and provide better discrimination among the documents and systems. This could improve the effectiveness of automatic ranking. Based on this intuition, we introduce a new method for the selection of systems to be used for data fusion. For this purpose, we use the bias concept that measures the deviation of a system from the norm or majority and employ the systems with higher bias in the data fusion process. This approach provides even higher correlations with the human-based results. We demonstrate that our approach outperforms the previously proposed automatic ranking methods.

15.

Estimating Gaussian mixture models in the local neighbourhood of embedded word vectors for query performance prediction

《Information processing & management》2019,56(3):1026-1045

The study of query performance prediction (QPP) in information retrieval (IR) aims to predict retrieval effectiveness. The specificity of the underlying information need of a query often determines how effectively a search engine can retrieve relevant documents at top ranks. The presence of ambiguous terms makes a query less specific to the sought information need, which in turn may degrade IR effectiveness. In this paper, we propose a novel word embedding based pre-retrieval feature which measures the ambiguity of each query term by estimating how many ‘senses’ each word is associated with. Assuming each sense roughly corresponds to a Gaussian mixture component, our proposed generative model first estimates a Gaussian mixture model (GMM) from the word vectors that are most similar to the given query terms. We then use the posterior probabilities of generating the query terms themselves from this estimated GMM in order to quantify the ambiguity of the query. Previous studies have shown that post-retrieval QPP approaches often outperform pre-retrieval ones because they use additional information from the top ranked documents. To achieve the best of both worlds, we formalize a linear combination of our proposed GMM based pre-retrieval predictor with NQC, a state-of-the-art post-retrieval QPP. Our experiments on the TREC benchmark news and web collections demonstrate that our proposed hybrid QPP approach (in linear combination with NQC) significantly outperforms a range of other existing pre-retrieval approaches in combination with NQC used as baselines.

16.

Building an SOA platform for library digital resources

杜治波 曹鹏 《现代情报》2012,32(5):58-61

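University library digital resources currently suffer from serious duplicated construction and prominent information silos. The usual solution is heterogeneous retrieval technology, but traditional heterogeneous retrieval has major shortcomings: its results are difficult to merge, and it cannot integrate resources of the same type. This paper proposes a design for an SOA platform for library digital resources that delivers a one-stop retrieval service based on Web Services. The platform can effectively solve the problem of merging heterogeneous retrieval results, thereby promoting the use of resources and improving how libraries present information.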

17.

Multitasking during Web search sessions

Amanda Spink Minsoo Park Bernard J. Jansen Jan Pedersen 《Information processing & management》2006

A user’s single session with a Web search engine or information retrieval (IR) system may involve seeking information on single or multiple topics and switching between tasks, that is, multitasking information behavior. Most Web search sessions consist of two queries of approximately two words. However, some Web search sessions consist of three or more queries. We present findings from two studies: first, a study of two-query search sessions on the AltaVista Web search engine, and second, a study of three or more query search sessions on the AltaVista Web search engine. We examine the degree of multitasking search and information task switching during these two sets of AltaVista Web search sessions. A sample of two-query and three or more query sessions was filtered from AltaVista transaction logs from 2002 and qualitatively analyzed. Sessions ranged in duration from less than a minute to a few hours. Findings include: (1) 81% of two-query sessions included multiple topics, (2) 91.3% of three or more query sessions included multiple topics, (3) there is a broad variety of topics in multitasking search sessions, and (4) three or more query sessions sometimes contained frequent topic changes. Multitasking is found to be a growing element in Web searching. This paper proposes an approach to interactive information retrieval (IR) contextually within a multitasking framework. The implications of our findings for Web design and further research are discussed.

18.

Topic distillation via sub-site retrieval

Tao Qin Tie-Yan Liu Xu-Dong Zhang Guang Feng De-Sheng Wang Wei-Ying Ma 《Information processing & management》2007

Topic distillation is one of the main information needs when users search the Web. Previous approaches to topic distillation treat a single page as the basic searching unit, which does not fully utilize the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named sub-site retrieval, in which the basic searching unit is a sub-site instead of a single page. A sub-site is a subset of a website, consisting of a structural collection of pages. The keys to sub-site retrieval are (1) extracting effective features for the representation of a sub-site using both the content and structure information, and (2) delivering the sub-site-based retrieval results with a friendly and informative user interface. For the first point, we propose the Punished Integration algorithm, which is based on the modeling of the growth of websites. For the second point, we design a user interface to better illustrate the search results of sub-site retrieval. Testing on the topic distillation task of TREC 2003 and 2004, sub-site retrieval leads to significant improvement of retrieval performance over the previous methods based on single pages. Furthermore, time complexity analysis shows that sub-site retrieval can be integrated into the index component of search engines.

19.

Popular and/or prestigious? Measures of scholarly esteem

Ying Ding Blaise Cronin 《Information processing & management》2011

Citation analysis does not generally take the quality of citations into account: all citations are weighted equally irrespective of source. However, a scholar may be highly cited but not highly regarded: popularity and prestige are not identical measures of esteem. In this study we define popularity as the number of times an author is cited and prestige as the number of times an author is cited by highly cited papers. Information retrieval (IR) is the test field. We compare the 40 leading researchers in terms of their popularity and prestige over time. Some authors are ranked high on prestige but not on popularity, while others are ranked high on popularity but not on prestige. We also relate measures of popularity and prestige to date of Ph.D. award, number of key publications, organizational affiliation, receipt of prizes/honors, and gender.
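
The two definitions in this abstract translate directly into counts. The sketch below assumes a citation list of (citing paper, cited author) pairs and a fixed citation threshold for "highly cited". The data layout, the threshold, and all names are hypothetical, since the abstract does not say how the study operationalizes "highly cited".

```python
from collections import Counter


def popularity(citations):
    """Popularity: number of times each author is cited, regardless of source."""
    return Counter(author for _, author in citations)


def prestige(citations, paper_cites, threshold):
    """Prestige: citations an author receives from highly cited papers only.

    A citing paper counts as highly cited when its own citation count is at
    least `threshold` (an assumed cut-off for this example).
    """
    return Counter(
        author
        for paper, author in citations
        if paper_cites.get(paper, 0) >= threshold
    )


# Hypothetical data: (citing paper, cited author) pairs and citing-paper citation counts.
citations = [("p1", "Ann"), ("p2", "Ann"), ("p3", "Ann"), ("p1", "Bob")]
paper_cites = {"p1": 50, "p2": 3, "p3": 1}

print(popularity(citations))
print(prestige(citations, paper_cites, threshold=10))
```

With this toy data Ann is three times as popular as Bob, yet the two are equally prestigious. This is the kind of divergence between the two measures that the abstract describes.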

20.

Passage feedback with IRIS

《Information processing & management》2001,37(3):521-541

We compare a user-defined passage feedback (pf) system to a document feedback (df) system. Df employed the adaptive linear model for retrieval, while pf used weighted query expansion based on positive and negative feedback. Twenty-four searchers performed the same six tasks in varying search and system-order per TREC-8 guidelines. We hypothesized that pf, which featured interactive query expansion, would outperform df, which relied on automatic query expansion. Initial analysis appeared to reject this hypothesis, as df showed slightly higher overall performance than pf. However, analysis by system-order groups indicates only the first pf use had lower performance. These data suggest that pf was more difficult to learn than df, though the second pf use yielded competitive performance. If performance of pf is indeed affected by learning, an improved pf system with usability enhancements may prove to be an effective mechanism for interactive information retrieval.