20 similar documents found (search time: 15 ms)
1.
2.
3.
Shiva Imani Moghadasi Sri Devi Ravana Sudharshan N. Raman 《Journal of Informetrics》2013,7(2):301-312
For system-based information retrieval evaluation, building a test collection remains a costly task. Producing relevance judgments is expensive and time-consuming, and has to be performed by human assessors. For a large collection, it is not viable to assess the relevance of every single document in the corpus against each topic. In an experimental environment, partial judgments created by a pooling method substitute for a complete relevance assessment of the documents. Due to the increasing number of documents, topics, and retrieval systems, it is essential to perform low-cost evaluations while still obtaining reliable results. Researchers are seeking techniques to reduce the cost of the experimental IR evaluation process by reducing the number of relevance judgments to be performed, or even eliminating them, while still obtaining reliable results. In this paper, various state-of-the-art approaches to low-cost retrieval evaluation are discussed under the following categories: selecting the best sets of documents to be judged; calculating evaluation measures that are robust to incomplete judgments; statistical inference of evaluation metrics; inference of relevance judgments; query selection; techniques to test the reliability of the evaluation and the reusability of the constructed collections; and other alternative methods to pooling. This paper is intended to link the reader to the corpus of ‘must read’ papers in the area of low-cost evaluation of IR systems.
4.
On information retrieval metrics designed for evaluation with incomplete relevance assessments (total citations: 1, self-citations: 0, citations by others: 1)
Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments
has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention.
This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance
test collections with submitted runs—the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data
from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation
environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of
Type I Error, and on Kendall’s rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance
data sets. According to these experiments, Q′, nDCG′ and AP′ proposed by Sakai are superior to bpref proposed by Buckley and
Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased
Precision by examining their formal definitions.
Noriko Kando
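The abstract above compares system rankings using Kendall's rank correlation. As a minimal sketch of that comparison (not the authors' code), the following computes Kendall's tau between the rankings induced by two sets of per-system scores, for example scores obtained with full and with reduced relevance data. The function name `kendall_tau`, the system names, and the score values are all hypothetical, and the sketch uses the simple tau-a variant with no correction for ties.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between the rankings induced by two score dicts.

    Both dicts must map the same system names to scores. Tied pairs count
    as neither concordant nor discordant (no tie correction is applied).
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both score sets order the pair the same way.
        agreement = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean scores for four systems (illustrative values only).
full_judgments = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18}
reduced_judgments = {"sysA": 0.25, "sysB": 0.26, "sysC": 0.15, "sysD": 0.12}

tau = kendall_tau(full_judgments, reduced_judgments)
print(f"Kendall's tau between the two system rankings: {tau:.3f}")
```

A tau of 1 means the two rankings agree on every system pair, and a tau of -1 means they disagree on every pair. Measuring discriminative power would additionally require a significance test per system pair, which this sketch does not attempt.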
5.
Direct optimization of evaluation measures has become an important branch of learning to rank for information retrieval (IR).
Since IR evaluation measures are difficult to optimize due to their non-continuity and non-differentiability, most direct
optimization methods optimize some surrogate functions instead, which we call surrogate measures. A critical issue regarding
these methods is whether the optimization of the surrogate measures can really lead to the optimization of the original IR
evaluation measures. In this work, we perform formal analysis on this issue. We propose a concept named “tendency correlation”
to describe the relationship between a surrogate measure and its corresponding IR evaluation measure. We show that when a
surrogate measure has arbitrarily strong tendency correlation with an IR evaluation measure, the optimization of it will lead
to the effective optimization of the original IR evaluation measure. Then, we analyze the tendency correlations of the surrogate
measures optimized in a number of direct optimization methods. We prove that the surrogate measures in SoftRank and ApproxRank
can have arbitrarily strong tendency correlation with the original IR evaluation measures, regardless of the data distribution,
when some parameters are appropriately set. However, the surrogate measures in SVM^MAP, DORM^NDCG, PermuRank^MAP, and SVM^NDCG
cannot have arbitrarily strong tendency correlation with the original IR evaluation measures on certain distributions of
data. Therefore SoftRank and ApproxRank are theoretically sounder than SVM^MAP, DORM^NDCG, PermuRank^MAP, and SVM^NDCG,
and are expected to result in better ranking performances. Our theoretical findings can explain the experimental results
observed on public benchmark datasets.
6.
Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems (total citations: 1, self-citations: 0, citations by others: 1)
Guido Zuccon Teerapong Leelanupab Stewart Whiting Emine Yilmaz Joemon M. Jose Leif Azzopardi 《Information Retrieval》2013,16(2):267-305
In the field of information retrieval (IR), researchers and practitioners are often faced with a demand for valid approaches to evaluate the performance of retrieval systems. The Cranfield experiment paradigm has been dominant for the in-vitro evaluation of IR systems. As an alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems, and at the same time to investigate users’ information searching behaviours. Major drawbacks of laboratory-based user studies for evaluating IIR systems include the high monetary and temporal costs involved in setting up and running those experiments, the lack of heterogeneity amongst the user population, and the limited scale of the experiments, which usually involve a relatively restricted set of users. In this paper, we propose an alternative experimental methodology to laboratory-based user studies. Our novel experimental methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, our experimental methodology can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and therefore can be used to assess the performance of IIR systems. In this article, we show how our approach differs from traditional IIR experimental and evaluation procedures. We also perform a use case study comparing crowdsourcing-based evaluation with laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.
7.
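The information retrieval course plays an important role in information literacy education at colleges and universities, and raising the level of information literacy education is also one of the course's teaching goals. This paper discusses the reform of information retrieval teaching oriented toward information literacy education from six aspects: establishing new teaching objectives, adjusting the teaching content appropriately, integrating a variety of modern teaching methods, strengthening links with discipline-specific courses, building an effective evaluation system, and improving the overall quality of the teaching staff.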
8.
9.
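Instructional design for information retrieval teaching in a networked environment (total citations: 4, self-citations: 0, citations by others: 4)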
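Based on an analysis of the current state of network information retrieval teaching, this paper discusses how to further reform that teaching under new circumstances. It tries new approaches mainly in teaching content, teaching methods, and faculty development, in order to improve the quality of information retrieval teaching and meet the needs of the information age.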
10.
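This paper introduces the online examination system of Capital Medical University, compares and analyzes students' examination results, and points out the system's advantages and the problems that still need improvement.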
11.
Hersh WR Crabtree MK Hickam DH Sacherek L Rose L Friedman CP 《Bulletin of the Medical Library Association》2000,88(4):323-331
OBJECTIVES: Despite the growing use of online databases by clinicians, there has been very little research documenting how effectively they are used. This study assessed the ability of medical and nurse-practitioner students to answer clinical questions using an information retrieval system. It also attempted to identify the demographic, experience, cognitive, personality, search mechanics, and user-satisfaction factors associated with successful use of a retrieval system. METHODS: Twenty-nine students completed questionnaires of clinical and computer experience as well as tests of cognitive abilities and personality type. They were then administered three clinical questions to answer in a medical library setting using the MEDLINE database and electronic and print full-text resources. RESULTS: Medical students were able to answer more questions correctly than nurse-practitioner students before and after searching, but both had comparable improvements in the number of correct questions before and after searching. Successful ability to answer questions was also associated with having experience in literature searching and higher standardized test-score percentiles. CONCLUSIONS: Medical and nurse-practitioner students obtained comparable benefits in the ability to answer clinical questions from use of the information retrieval system. Future research must examine strategies that improve successful search and retrieval of clinical questions posed by clinicians in practice.
12.
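This paper summarizes the current state of bilingual textbook development for medical information retrieval, analyzes the necessity and feasibility of such textbooks, and describes in detail the joint compilation of a bilingual medical information retrieval textbook by several Chinese universities, including North China Coal Medical College and Capital Medical University.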
13.
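An exploration of individualized teaching in the information retrieval course (total citations: 5, self-citations: 1, citations by others: 5)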
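This paper advocates individualized teaching in the information retrieval course to promote the harmonious development of students' individuality and to achieve the goal of cultivating students' self-learning and innovation abilities. It also proposes concrete strategies for implementing individualized teaching in each stage of instruction: classroom teaching, retrieval practice, and extracurricular application.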
14.
Lorraine Goeuriot Gareth J. F. Jones Liadh Kelly Johannes Leveling Mihai Lupu Joao Palotti Guido Zuccon 《Information Retrieval》2018,21(6):507-540
Since its inception in 2013, one of the key contributions of the CLEF eHealth evaluation campaign has been the organization of an ad-hoc information retrieval (IR) benchmarking task. This IR task evaluates systems intended to support laypeople searching for and understanding health information. Each year the task provides registered participants with standard IR test collections consisting of a document collection and topic set. Participants then return retrieval results obtained by their IR systems for each query, which are assessed using a pooling procedure. In this article we focus on the CLEF eHealth 2013 and 2014 retrieval tasks, which saw topics created based on patients’ information needs associated with their medical discharge summaries. We overview the task and datasets created, and the results obtained by participating teams over these two years. We then provide a detailed comparative analysis of the results, and conduct an evaluation of the datasets in the light of these results. This twofold study of the evaluation campaign teaches us about technical aspects of medical IR, such as the effectiveness of query expansion; the quality and characteristics of CLEF eHealth IR datasets, such as their reliability; and how to run an IR evaluation campaign in the medical domain.
15.
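In line with the requirements of military on-the-job education, the library of the Medical College of the Chinese People's Armed Police Force sets the teaching objectives of its medical information retrieval course according to the characteristics of health officers in the Armed Police system, selects the teaching content according to their needs, and combines centralized training with individual tutoring, improving officers' ability to write research papers on the basis of better information literacy. Teaching practice shows that this approach both strengthens health officers' capacity for further development and helps in-service health officers become versatile military medical personnel competent for "all posts" and "all functions".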
16.
17.
Most ranking algorithms are based on the optimization of some loss functions, such as the pairwise loss. However, these loss
functions are often different from the criteria that are adopted to measure the quality of the web page ranking results. To
overcome this problem, we propose an algorithm which aims at directly optimizing popular measures such as the Normalized Discounted
Cumulative Gain and the Average Precision. The basic idea is to minimize a smooth approximation of these measures with gradient
descent. Crucial to this kind of approach is the choice of the smoothing factor. We provide various theoretical analyses of
that choice and propose an annealing algorithm to iteratively minimize a less and less smoothed approximation of the measure of interest. Results on the LETOR
benchmark datasets show that the proposed algorithm achieves state-of-the-art performance.
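To illustrate the idea of a smoothed ranking measure with a smoothing factor, the sketch below replaces each document's integer rank with a sigmoid-based soft rank and plugs it into the NDCG formula. This is one common smoothing construction chosen here for illustration; it is an assumption and not necessarily the approximation used in the paper, and the gradient descent step is omitted. The names `smooth_ndcg` and `sigma` and the example scores and labels are hypothetical.

```python
import math


def dcg(gains_in_rank_order):
    """Discounted cumulative gain for relevance labels listed in rank order."""
    return sum((2 ** g - 1) / math.log2(r + 1)
               for r, g in enumerate(gains_in_rank_order, start=1))


def exact_ndcg(scores, labels):
    """Standard (non-smooth) NDCG of the ranking induced by the scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return dcg([labels[i] for i in order]) / dcg(sorted(labels, reverse=True))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smooth_ndcg(scores, labels, sigma):
    """NDCG with each rank replaced by a differentiable soft rank.

    soft_rank(i) = 1 + sum over j != i of sigmoid((s_j - s_i) / sigma).
    A smaller sigma (the smoothing factor) gives a sharper approximation.
    """
    ideal = dcg(sorted(labels, reverse=True))
    total = 0.0
    for i, s_i in enumerate(scores):
        soft_rank = 1.0 + sum(sigmoid((s_j - s_i) / sigma)
                              for j, s_j in enumerate(scores) if j != i)
        total += (2 ** labels[i] - 1) / math.log2(soft_rank + 1)
    return total / ideal


# Hypothetical model scores and graded relevance labels for four documents.
scores = [2.0, 1.0, 0.5, 0.1]
labels = [0, 2, 1, 0]

# Annealing-style schedule: the approximation sharpens as sigma decreases.
for sigma in (1.0, 0.3, 0.1, 0.01):
    print(f"sigma={sigma}: smooth NDCG = {smooth_ndcg(scores, labels, sigma):.4f}")
print(f"exact NDCG = {exact_ndcg(scores, labels):.4f}")
```

In an annealing scheme of the kind the abstract describes, an optimizer would take gradient steps on the smooth objective at each value of `sigma` before moving to the next, smaller value.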
18.
王红霞 《中华医学图书馆杂志》2011,(9):56-57
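This paper compares the coverage, search results, and level of attention of three databases, PubMed, BIOSIS Previews, and EMBASE.com, to provide a basis for choosing English-language medical retrieval tools appropriately when searching to select a research topic or prepare a project application, and thereby to improve the recall of foreign-language literature.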
19.
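Discussion and practice of a teaching model for the information retrieval and utilization course in a networked environment (total citations: 3, self-citations: 0, citations by others: 3)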
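This article discusses a teaching model for the information retrieval and utilization course in a networked environment. Guided by constructivist learning and teaching theory, the model centers on cultivating students' interest in learning and strengthening their hands-on practice. It consists mainly of two parts, interest-based teaching and practice-based teaching, which run through the whole teaching process: pre-class preparation, classroom teaching, practical sessions, and course assessment.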
20.
This research investigated self-efficacy perceptions of Israeli library and information science (LIS) professionals regarding their information retrieval skills, examining the judgments that participants make about their own searching abilities. The study was based on Bandura's four sources of self-efficacy information: (a) past performance or mastery experiences; (b) vicarious observation of others' experiences; (c) verbal or social feedback; and, (d) affective states. An online survey presenting the Information Retrieval Self-Efficacy Scale was distributed among three existing Israeli LIS discussion groups. The questionnaire was completed by 201 LIS professionals. Findings show that participants reported a high level of self-efficacy regarding information retrieval and all four sources of self-efficacy information influenced the construction of self-efficacy beliefs. Correlations between self-efficacy perceptions and several socio-demographic variables were investigated. The data analysis revealed that men and women are impacted differently by self-efficacy information; women reported a higher score for affective states and men are more prone to frustration. Also, a significant relation was found between age and years of experience, as well as the sources that exerted more influence on participants. Older and more experienced participants reported being more impacted by their mastery experiences and their affective states. Participants in the middle of their careers reported a greater influence of social feedback on their self-perception of self-efficacy.