1.
On information retrieval metrics designed for evaluation with incomplete relevance assessments (total citations: 1, self-citations: 0, citations by others: 1)
Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments
has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention.
This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance
test collections with submitted runs—the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data
from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation
environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of
Type I Error, and on Kendall’s rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance
data sets. According to these experiments, Q′, nDCG′ and AP′ proposed by Sakai are superior to bpref proposed by Buckley and
Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased
Precision by examining their formal definitions.
Noriko Kando
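The abstract above does not define the primed metrics. As a minimal sketch, assuming AP′ means average precision computed on a "condensed list" (the ranked list with all unjudged documents removed), the difference from standard AP can be illustrated as follows. The function names, document IDs, and relevance labels are hypothetical and are not taken from the article.

```python
from typing import Dict, List


def average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """Standard AP: documents missing from qrels count as non-relevant."""
    num_relevant = sum(1 for rel in qrels.values() if rel > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant


def condensed_average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """Assumed AP': drop unjudged documents from the ranking, then compute AP."""
    condensed = [doc_id for doc_id in ranking if doc_id in qrels]
    return average_precision(condensed, qrels)


# Hypothetical judgments: d1 and d5 are relevant, d3 is judged non-relevant,
# d2 and d4 were never examined by an assessor.
qrels = {"d1": 1, "d3": 0, "d5": 1}
ranking = ["d2", "d1", "d3", "d4", "d5"]

print(average_precision(ranking, qrels))            # unjudged docs penalize the run
print(condensed_average_precision(ranking, qrels))  # unjudged docs are ignored
```

In this toy case the two unjudged documents lower standard AP, while the condensed-list variant scores the run only on documents that were actually assessed.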
2.
Recently direct optimization of information retrieval (IR) measures has become a new trend in learning to rank. In this paper,
we propose a general framework for direct optimization of IR measures, which enjoys several theoretical advantages. The general
framework, which can be used to optimize most IR measures, addresses the task by approximating the IR measures and optimizing
the approximated surrogate functions. Theoretical analysis shows that a high approximation accuracy can be achieved by the
framework. We take average precision (AP) and normalized discounted cumulated gains (NDCG) as examples to demonstrate how
to realize the proposed framework. Experiments on benchmark datasets show that the algorithms deduced from our framework are
very effective when compared to existing methods. The empirical results also agree well with the theoretical results obtained
in the paper.
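The abstract does not say how the measures are approximated. One way to build a smooth surrogate, shown here purely as an assumption and not as the authors' method, is to replace each document's integer rank position with a sum of logistic functions of score differences, and then plug that soft rank into a DCG-style formula. The parameter `alpha` and all function names are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def approx_ranks(scores: List[float], alpha: float = 10.0) -> List[float]:
    """Smooth 1-based rank of each item; a higher score gives a smaller rank.

    rank_i is approximated by 1 + sum over j != i of sigmoid(alpha * (s_j - s_i)).
    A larger alpha brings the value closer to the true integer rank.
    """
    ranks = []
    for i, s_i in enumerate(scores):
        soft_count = sum(
            sigmoid(alpha * (s_j - s_i)) for j, s_j in enumerate(scores) if j != i
        )
        ranks.append(1.0 + soft_count)
    return ranks


def approx_dcg(scores: List[float], gains: List[float], alpha: float = 10.0) -> float:
    """DCG with the discount evaluated at the smooth rank instead of the true rank."""
    return sum(
        gain / math.log2(1.0 + rank)
        for gain, rank in zip(gains, approx_ranks(scores, alpha))
    )


scores = [3.0, 1.0, 2.0]   # hypothetical model scores
gains = [3.0, 0.0, 1.0]    # hypothetical relevance gains
print(approx_ranks(scores, alpha=50.0))
print(approx_dcg(scores, gains, alpha=50.0))
```

Because the surrogate is differentiable in the scores, it can in principle be optimized with gradient-based methods; the trade-off is that a small `alpha` gives a smoother but less accurate approximation.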
3.
Direct optimization of evaluation measures has become an important branch of learning to rank for information retrieval (IR).
Since IR evaluation measures are difficult to optimize due to their non-continuity and non-differentiability, most direct
optimization methods optimize some surrogate functions instead, which we call surrogate measures. A critical issue regarding
these methods is whether the optimization of the surrogate measures can really lead to the optimization of the original IR
evaluation measures. In this work, we perform formal analysis on this issue. We propose a concept named “tendency correlation”
to describe the relationship between a surrogate measure and its corresponding IR evaluation measure. We show that when a
surrogate measure has arbitrarily strong tendency correlation with an IR evaluation measure, the optimization of it will lead
to the effective optimization of the original IR evaluation measure. Then, we analyze the tendency correlations of the surrogate
measures optimized in a number of direct optimization methods. We prove that the surrogate measures in SoftRank and ApproxRank
can have arbitrarily strong tendency correlation with the original IR evaluation measures, regardless of the data distribution,
when some parameters are appropriately set. However, the surrogate measures in SVM^MAP, DORM^NDCG, PermuRank^MAP, and SVM^NDCG
cannot have arbitrarily strong tendency correlation with the original IR evaluation measures on certain distributions of
data. Therefore SoftRank and ApproxRank are theoretically sounder than SVM^MAP, DORM^NDCG, PermuRank^MAP, and SVM^NDCG,
and are expected to result in better ranking performances. Our theoretical findings can explain the experimental results
observed on public benchmark datasets.
4.
Microprocessing and Microprogramming, 1983, 11(2): 127-132
In modern information processing there is a marked tendency to combine microfilm with computer science techniques in order to automate information retrieval systems. Such an automated system is presented here. It consists of a central microprocessor-based computer with an external storage disk, a microfilm reader, a CRT terminal, and the corresponding interfaces. The data handled by the system consists of a societies file and a documents file. The societies file has a hash organization, and the documents file is structured as a linked stack.
5.
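Information retrieval courses play an important role in information literacy education at colleges and universities, and raising the level of information literacy education is also a teaching goal of the course. This paper discusses the reform of information retrieval teaching oriented toward information literacy education from six aspects: establishing new teaching objectives, adjusting teaching content appropriately, integrating a variety of modern teaching methods, strengthening links with subject-specific courses, building an effective evaluation system, and improving the overall quality of the teaching staff.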
6.
7.
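This paper introduces the online examination system of Capital Medical University, compares and analyzes students' examination results, and points out the system's advantages and the problems that still need improvement.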
8.
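Instructional design for information retrieval in a networked environment (total citations: 4, self-citations: 0, citations by others: 4)
Based on an analysis of the current state of teaching networked information retrieval, this paper discusses how to further reform that teaching under new circumstances. It tries new approaches mainly in teaching content, teaching methods, and faculty development, in order to improve the quality of information retrieval teaching and meet the needs of the information age.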
9.
A standard approach to Information Retrieval (IR) is to model text as a bag of words. Alternatively, text can be modelled
as a graph, whose vertices represent words, and whose edges represent relations between the words, defined on the basis of
any meaningful statistical or linguistic relation. Given such a text graph, graph theoretic computations can be applied to measure various properties of the graph, and hence of the text. This work
explores the usefulness of such graph-based text representations for IR. Specifically, we propose a principled graph-theoretic
approach to (1) computing term weights and (2) integrating discourse aspects into retrieval. Given a text graph, whose vertices
denote terms linked by co-occurrence and grammatical modification, we use graph ranking computations (e.g. PageRank; Page et al.,
The PageRank citation ranking: Bringing order to the Web, Technical report, Stanford Digital Library Technologies Project,
1998) to derive weights for each vertex, i.e. term weights, which we use to rank documents against queries. We reason that our
graph-based term weights do not necessarily need to be normalised by document length (unlike existing term weights) because
they are already scaled by their graph-ranking computation. This is a departure from existing IR ranking functions, and we
experimentally show that it performs comparably to a tuned ranking baseline, such as BM25 (Robertson et al. in NIST Special
Publication 500-236: TREC-4, 1995). In addition, we integrate into ranking graph properties, such as the average path length, or clustering coefficient, which
represent different aspects of the topology of the graph, and by extension of the document represented as a graph. Integrating
such properties into ranking allows us to consider issues such as discourse coherence, flow and density during retrieval.
We experimentally show that this type of ranking performs comparably to BM25, and can even outperform it, across different
TREC (Voorhees and Harman in TREC: Experiment and evaluation in information retrieval, MIT Press, 2005) datasets and evaluation measures.
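The idea of deriving term weights from a text graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is undirected and built from co-occurrence within a fixed window only (grammatical modification is left out), and the window size, damping factor, and iteration count are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cooccurrence_graph(tokens: List[str], window: int = 2) -> Dict[str, Set[str]]:
    """Undirected graph linking each term to the terms in the next `window` positions."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return dict(graph)


def pagerank(graph: Dict[str, Set[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain power-iteration PageRank; the resulting scores serve as term weights."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Every node in this graph has at least one neighbour, so len(graph[u]) > 0.
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def score_document(query_terms: List[str], term_weights: Dict[str, float]) -> float:
    """Toy ranking score: sum of graph-based weights of the query terms in the document."""
    return sum(term_weights.get(t, 0.0) for t in query_terms)


tokens = "text graph vertices represent words and graph edges represent relations".split()
weights = pagerank(cooccurrence_graph(tokens, window=2))
print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
print(score_document(["graph", "words"], weights))
```

Note that the toy score applies no document-length normalisation, in line with the abstract's argument that graph-based weights are already scaled by the ranking computation.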
10.
Over the last three decades, research in Information Retrieval (IR) shows performance improvement when many sources of evidence
are combined to produce a ranking of documents. Most current approaches assess document relevance by computing a single score
which aggregates values of some attributes or criteria. They use analytic aggregation operators which either lead to a loss
of valuable information, e.g., the min or lexicographic operators, or allow very bad scores on some criteria to be compensated
with good ones, e.g., the weighted sum operator. Moreover, all these approaches do not handle imprecision of criterion scores.
In this paper, we propose a multiple criteria framework using a new aggregation mechanism based on decision rules identifying
positive and negative reasons for judging whether a document should get a better ranking than another. The resulting procedure
also handles imprecision in criteria design. Experimental results are reported showing that the suggested method performs
better than standard aggregation operators.
11.
12.
Qiang Wu, Christopher J. C. Burges, Krysta M. Svore, Jianfeng Gao. Information Retrieval, 2010, 13(3): 254-270
We present a new ranking algorithm that combines the strengths of two previous methods: boosted tree classification, and LambdaRank,
which has been shown to be empirically optimal for a widely used information retrieval measure. Our algorithm is based on
boosted regression trees, although the ideas apply to any weak learners, and it is significantly faster in both train and
test phases than the state of the art, for comparable accuracy. We also show how to find the optimal linear combination for
any two rankers, and we use this method to solve the line search problem exactly during boosting. In addition, we show that
starting with a previously trained model, and boosting using its residuals, furnishes an effective technique for model adaptation,
and we give significantly improved results for a particularly pressing problem in web search—training rankers for markets
for which only small amounts of labeled data are available, given a ranker trained on much more data from a larger market.
13.
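An exploration of "relevance" in information retrieval (total citations: 3, self-citations: 0, citations by others: 3)
Starting from the dynamic, multidimensional meaning of "relevance", this paper discusses the factors that influence it, namely information sources, the retrieval system, users, time, and environment, and concludes with some implications of "relevance" for building information retrieval systems.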
14.
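Network information resources and their retrieval (total citations: 6, self-citations: 0, citations by others: 6)
Xu Yunwen (许云文), Foshan Library. The Internet is the world's largest internetworking system, with the most users and the greatest influence. It already covers 154 countries and regions worldwide and connects more than 48,000 computer networks and nearly 4 million hosts, with more than 40 million end users. It is expected that by 2000 the number of network users will exceed...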
15.
16.
There have been a number of linear, feature-based models proposed by the information retrieval community recently. Although
each model is presented differently, they all share a common underlying framework. In this paper, we explore and discuss the
theoretical issues of this framework, including a novel look at the parameter space. We then detail supervised training algorithms
that directly maximize the evaluation metric under consideration, such as mean average precision. We present results that
show training models in this way can lead to significantly better test set performance compared to other training methods
that do not directly maximize the metric. Finally, we show that linear feature-based models can consistently and significantly
outperform current state of the art retrieval models with the correct choice of features.
17.
Teaching and learning in information retrieval (total citations: 1, self-citations: 1, citations by others: 0)
Juan M. Fernández-Luna, Juan F. Huete, Andrew MacFarlane, Efthimis N. Efthimiadis. Information Retrieval, 2009, 12(2): 201-226
A literature review of pedagogical methods for teaching and learning information retrieval is presented. From the analysis
of the literature a taxonomy was built and it is used to structure the paper. Information Retrieval (IR) is presented from
different points of view: technical levels, educational goals, teaching and learning methods, assessment and curricula. The
review is organized around two levels of abstraction which form a taxonomy that deals with the different aspects of pedagogy
as applied to information retrieval. The first level looks at the technical level of delivering information retrieval concepts,
and at the educational goals as articulated by the two main subject domains where IR is delivered: computer science (CS) and
library and information science (LIS). The second level focuses on pedagogical issues, such as teaching and learning methods,
delivery modes (classroom, online or e-learning), use of IR systems for teaching, assessment and feedback, and curricula design.
The survey, and its bibliography, provides an overview of the pedagogical research carried out in the field of IR. It also
provides a guide for educators on approaches that can be applied to improving the student learning experiences.
18.
Library & Information Science Research, 2006, 28(2): 192-207
The application of visualization techniques to information retrieval (IR) has resulted in the development of innovative systems and interfaces that are now available for public use. Visualization tools have emerged in research environments and more recently on the Web to retrieve information. Questions arise in regard to the utility of Web-based IR visualization tools for assisting users not only in manipulating search output, but also in managing the information retrieval process. To understand how Web-based visualization tools enable visual information retrieval, this article reviews some of the human perceptual theory behind the graphical interface of information visualization systems, analyzes iconic representations and information density on visualization displays, and examines information retrieval tasks that have been used in visualization system user research. This article is timely since it addresses new technologies for Web information retrieval and discusses future information visualization user research directions.
19.
20.
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the
theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary
mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering
the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all
the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most-likely universal
model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental
evidence showing vector space or geometric models, and BM25, as being ‘friendly’ to the normal-exponential, and that the non-convexity
problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate
on graded relevance, and consider methods such as logistic regression for score calibration.
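A binary mixture model of retrieval scores can be sketched as below. The abstract does not say which component models which class; the sketch assumes the common convention that scores of relevant documents follow a normal distribution and scores of non-relevant documents follow an exponential distribution. All parameter values and function names are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(x: float, lam: float) -> float:
    """Density of an exponential distribution with rate lam (zero for x < 0)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def mixture_pdf(x: float, prior_rel: float, mu: float, sigma: float, lam: float) -> float:
    """Score density: prior_rel * normal (relevant) + (1 - prior_rel) * exponential."""
    return prior_rel * normal_pdf(x, mu, sigma) + (1.0 - prior_rel) * exponential_pdf(x, lam)


def posterior_relevance(x: float, prior_rel: float, mu: float, sigma: float,
                        lam: float) -> float:
    """P(relevant | score = x) under the assumed mixture, via Bayes' rule."""
    rel = prior_rel * normal_pdf(x, mu, sigma)
    total = mixture_pdf(x, prior_rel, mu, sigma, lam)
    return rel / total if total > 0 else 0.0


params = dict(prior_rel=0.1, mu=5.0, sigma=1.0, lam=1.0)  # made-up values
for score in (1.0, 5.0, 20.0):
    print(score, posterior_relevance(score, **params))
```

With these made-up parameters the posterior rises with the score up to a point and then falls again at very high scores, because the exponential tail eventually outweighs the normal tail. This appears to be the kind of behaviour behind the non-convexity problem mentioned in the abstract.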