For system-based information retrieval (IR) evaluation, building a test collection remains a costly task. Producing relevance judgments is expensive and time-consuming, and has to be performed by human assessors. For a large collection, it is not viable to assess the relevance of every document in the corpus against each topic. In experimental settings, partial judgments obtained through a pooling method substitute for a complete relevance assessment of the documents. As the number of documents, topics, and retrieval systems grows, it becomes essential to perform low-cost evaluations that still yield reliable results. Researchers are seeking techniques that reduce the cost of experimental IR evaluation by reducing the number of relevance judgments to be performed, or even eliminating them, while still obtaining reliable results. In this paper, various state-of-the-art approaches to low-cost retrieval evaluation are discussed under the following categories: selecting the best sets of documents to be judged; calculating evaluation measures that are robust to incomplete judgments; statistical inference of evaluation metrics; inference of relevance judgments; query selection; techniques to test the reliability of the evaluation and the reusability of the constructed collections; and other alternatives to pooling. This paper is intended to link the reader to the corpus of ‘must read’ papers in the area of low-cost evaluation of IR systems.
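
The pooling idea mentioned in this abstract can be illustrated with a minimal sketch of depth-k pooling, in which only the union of the top-k documents returned by each system is sent to the assessors. The function name `depth_k_pool` and the nested-dictionary layout of `runs` are assumptions made for illustration; they are not taken from the paper.

```python
def depth_k_pool(runs, k):
    """Build a judgment pool per topic from the top-k documents of every run.

    runs: {system_name: {topic_id: [doc_id, ...] in ranked order}}  (assumed layout)
    Returns {topic_id: set of doc_ids to be judged}.
    """
    pool = {}
    for ranked_by_topic in runs.values():
        for topic, ranking in ranked_by_topic.items():
            # Only the first k documents of each ranking enter the pool.
            pool.setdefault(topic, set()).update(ranking[:k])
    return pool


# Hypothetical runs from two systems on one topic.
example_runs = {
    "sysA": {"t1": ["d1", "d2", "d3"]},
    "sysB": {"t1": ["d2", "d4", "d1"]},
}
example_pool = depth_k_pool(example_runs, k=2)
```

Documents outside the pool stay unjudged, which is why the measures and inference techniques surveyed in the abstract need to cope with incomplete judgments.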

Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs—the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I Error, and on Kendall’s rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q′, nDCG′ and AP′ proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.
Noriko Kando
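
Kendall's rank correlation, used in the abstract above to compare system rankings, can be sketched as follows. This is a minimal illustration of the tau-a variant (tied pairs count as neither concordant nor discordant), assuming each ranking is given as a dictionary from system name to mean score; the function name and the example scores are hypothetical and are not data from the article.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system rankings given as {system: score}."""
    systems = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # A pair is concordant when both score sets order s and t the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical mean scores under full and reduced relevance data.
full_judgments = {"A": 0.40, "B": 0.35, "C": 0.30, "D": 0.20}
reduced_judgments = {"A": 0.38, "B": 0.39, "C": 0.25, "D": 0.10}
tau = kendall_tau(full_judgments, reduced_judgments)
```

In this example only the pair (A, B) swaps order, so five of the six system pairs are concordant. A value close to 1 means that reducing the relevance data barely changes the system ranking.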

Direct optimization of evaluation measures has become an important branch of learning to rank for information retrieval (IR). Since IR evaluation measures are difficult to optimize due to their non-continuity and non-differentiability, most direct optimization methods optimize some surrogate functions instead, which we call surrogate measures. A critical issue regarding these methods is whether the optimization of the surrogate measures can really lead to the optimization of the original IR evaluation measures. In this work, we perform formal analysis on this issue. We propose a concept named “tendency correlation” to describe the relationship between a surrogate measure and its corresponding IR evaluation measure. We show that when a surrogate measure has arbitrarily strong tendency correlation with an IR evaluation measure, the optimization of it will lead to the effective optimization of the original IR evaluation measure. Then, we analyze the tendency correlations of the surrogate measures optimized in a number of direct optimization methods. We prove that the surrogate measures in SoftRank and ApproxRank can have arbitrarily strong tendency correlation with the original IR evaluation measures, regardless of the data distribution, when some parameters are appropriately set. However, the surrogate measures in SVM-MAP, DORM-NDCG, PermuRank-MAP, and SVM-NDCG cannot have arbitrarily strong tendency correlation with the original IR evaluation measures on certain distributions of data. Therefore SoftRank and ApproxRank are theoretically sounder than SVM-MAP, DORM-NDCG, PermuRank-MAP, and SVM-NDCG, and are expected to result in better ranking performances. Our theoretical findings can explain the experimental results observed on public benchmark datasets.

In the field of information retrieval (IR), researchers and practitioners are often faced with a demand for valid approaches to evaluate the performance of retrieval systems. The Cranfield experiment paradigm has been dominant for the in-vitro evaluation of IR systems. As an alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems and, at the same time, to investigate users’ information searching behaviours. Major drawbacks of laboratory-based user studies for evaluating IIR systems include the high monetary and temporal costs involved in setting up and running those experiments, the lack of heterogeneity amongst the user population, and the limited scale of the experiments, which usually involve a relatively restricted set of users. In this paper, we propose an alternative experimental methodology to laboratory-based user studies. Our novel experimental methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, our experimental methodology can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and can therefore be used to assess the performance of IIR systems. In this article, we show the characteristic differences of our approach with respect to traditional IIR experimental and evaluation procedures. We also perform a use case study comparing crowdsourcing-based evaluation with laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.

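The information retrieval course plays an important role in information literacy education in higher education institutions, and raising the level of information literacy education is also a teaching goal of the course. This paper discusses the reform of information retrieval teaching oriented toward information literacy education from six aspects: establishing new teaching objectives, adjusting the course content appropriately, integrating multiple modern teaching methods, strengthening the connection with subject-specific courses, building an effective evaluation system, and improving the overall quality of the teaching staff.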

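The Future of Web Information Retrieval   Total citations: 8, self-citations: 0, citations by others: 8
The future development of web information retrieval will be reflected in the following aspects: search tools becoming both more comprehensive and more specialized; search tools becoming more intelligent; the polarization of retrieval languages; improved ability to retrieve non-text information; human participation in the information organization of search tools; and the rise of fee-based web information retrieval tools.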

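Instructional Design for Information Retrieval in the Network Environment   Total citations: 4, self-citations: 0, citations by others: 4
Based on an analysis of the current state of web information retrieval teaching, this paper discusses how to further reform such teaching under the new circumstances. It makes new attempts mainly in teaching content, teaching methods, and faculty development, in order to improve the quality of information retrieval teaching and meet the needs of the information age.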

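This paper introduces the online examination system of Capital Medical University, compares and analyzes students' examination scores, and points out the system's advantages and the problems that still need improvement.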

OBJECTIVES: Despite the growing use of online databases by clinicians, there has been very little research documenting how effectively they are used. This study assessed the ability of medical and nurse-practitioner students to answer clinical questions using an information retrieval system. It also attempted to identify the demographic, experience, cognitive, personality, search mechanics, and user-satisfaction factors associated with successful use of a retrieval system. METHODS: Twenty-nine students completed questionnaires of clinical and computer experience as well as tests of cognitive abilities and personality type. They were then administered three clinical questions to answer in a medical library setting using the MEDLINE database and electronic and print full-text resources. RESULTS: Medical students were able to answer more questions correctly than nurse-practitioner students before and after searching, but both groups showed comparable improvements in the number of questions answered correctly after searching. The ability to answer questions successfully was also associated with having experience in literature searching and higher standardized test-score percentiles. CONCLUSIONS: Medical and nurse-practitioner students obtained comparable benefits in the ability to answer clinical questions from use of the information retrieval system. Future research must examine strategies that improve successful search and retrieval for clinical questions posed by clinicians in practice.

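This paper summarizes the current state of bilingual textbook development for medical information retrieval, analyzes the necessity and feasibility of developing such textbooks, and describes in detail the joint compilation of a bilingual medical information retrieval textbook by several Chinese universities, including North China Coal Medical College and Capital Medical University.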

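An Exploration of Individualized Teaching in the Information Retrieval Course   Total citations: 5, self-citations: 1, citations by others: 5
This paper advocates individualized teaching in the information retrieval course to promote the harmonious development of students' individuality and to achieve the goal of cultivating their capacity for self-study and innovation. It also proposes concrete strategies for implementing individualized teaching in each stage of instruction: classroom teaching, retrieval practice sessions, and extracurricular practical application.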

Since its inception in 2013, one of the key contributions of the CLEF eHealth evaluation campaign has been the organization of an ad-hoc information retrieval (IR) benchmarking task. This IR task evaluates systems intended to support laypeople searching for and understanding health information. Each year the task provides registered participants with standard IR test collections consisting of a document collection and topic set. Participants then return retrieval results obtained by their IR systems for each query, which are assessed using a pooling procedure. In this article we focus on the CLEF eHealth 2013 and 2014 retrieval tasks, which saw topics created based on patients’ information needs associated with their medical discharge summaries. We give an overview of the task and the datasets created, and of the results obtained by participating teams over these two years. We then provide a detailed comparative analysis of the results, and conduct an evaluation of the datasets in the light of these results. This twofold study of the evaluation campaign teaches us about technical aspects of medical IR, such as the effectiveness of query expansion; the quality and characteristics of CLEF eHealth IR datasets, such as their reliability; and how to run an IR evaluation campaign in the medical domain.

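In line with the requirements of military in-service education, the library of the Medical College of the Chinese People's Armed Police Force, when teaching medical information retrieval, sets its teaching objectives according to the characteristics of health officers in the Armed Police system and selects teaching content according to their needs. It combines group training with individual tutoring and, on the basis of improving the officers' information literacy, improves their ability to write research papers. Teaching practice has shown that this both strengthens the health officers' potential for further development and helps in-service health officers become versatile military medical personnel competent in “all posts” and “all functions”.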

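A Discussion of Bilingual Teaching in the Information Retrieval Course   Total citations: 1, self-citations: 0, citations by others: 1
After analyzing the feasibility of bilingual teaching in the information retrieval course and the problems involved, the article proposes some measures that should be taken in the initial stage of teaching.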

Most ranking algorithms are based on the optimization of some loss functions, such as the pairwise loss. However, these loss functions are often different from the criteria that are adopted to measure the quality of the web page ranking results. To overcome this problem, we propose an algorithm which aims at directly optimizing popular measures such as the Normalized Discounted Cumulative Gain and the Average Precision. The basic idea is to minimize a smooth approximation of these measures with gradient descent. Crucial to this kind of approach is the choice of the smoothing factor. We provide various theoretical analyses of that choice and propose an annealing algorithm to iteratively minimize a less and less smoothed approximation of the measure of interest. Results on the LETOR benchmark datasets show that the proposed algorithm achieves state-of-the-art performance.
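
To make the idea of a smoothed measure and a shrinking smoothing factor concrete, the sketch below computes a smooth approximation of NDCG. It is not the authors' algorithm: the sigmoid-based soft rank, the function names, and the parameter `sigma` are assumptions chosen for illustration, and the gradient step itself is omitted. The sketch only shows how the approximation sharpens toward the true NDCG as `sigma` decreases.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_ranks(scores, sigma):
    """Differentiable rank estimate: 1 + soft count of documents scored higher."""
    return [
        1.0 + sum(_sigmoid((s_j - s_i) / sigma)
                  for j, s_j in enumerate(scores) if j != i)
        for i, s_i in enumerate(scores)
    ]


def dcg_from_ranks(gains, ranks):
    # Standard DCG with gain 2^g - 1 and discount log2(1 + rank).
    return sum((2 ** g - 1) / math.log2(1.0 + r) for g, r in zip(gains, ranks))


def smooth_ndcg(scores, gains, sigma):
    """NDCG computed on soft ranks; larger sigma means heavier smoothing."""
    ideal = dcg_from_ranks(sorted(gains, reverse=True), range(1, len(gains) + 1))
    if ideal == 0:
        return 0.0
    return dcg_from_ranks(gains, soft_ranks(scores, sigma)) / ideal


# Annealing-style schedule: evaluate with progressively less smoothing.
example_scores = [2.0, 1.0, 0.0]   # hypothetical model scores
example_gains = [2, 1, 0]          # hypothetical graded relevance labels
schedule = [smooth_ndcg(example_scores, example_gains, s)
            for s in (10.0, 1.0, 0.1, 0.01)]
```

In an optimizer, each value of `sigma` in the schedule would be used for several gradient steps on the model parameters before moving to the next, smaller value.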

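This paper compares and analyzes the coverage, search results, and level of attention of three databases, PubMed, BIOSIS Previews, and EMBASE.com, to provide a basis for the sound selection of English-language medical search tools when searching to define a topic or establish a project in medical research, and to improve the recall of foreign-language literature.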

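Exploration and Practice of a Teaching Model for the Information Retrieval and Utilization Course in the Network Environment   Total citations: 3, self-citations: 0, citations by others: 3
The article explores a teaching model for the information retrieval and utilization course in the network environment. Guided by constructivist learning and teaching theory, the model centers on cultivating students' interest in learning and strengthening their hands-on practice. It consists mainly of two parts, interest-based teaching and practice-based teaching, which run through the whole teaching process: pre-class preparation, classroom teaching, practice sessions, and course assessment.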

This research investigated self-efficacy perceptions of Israeli library and information science (LIS) professionals regarding their information retrieval skills, examining the judgments that participants make about their own searching abilities. The study was based on Bandura's four sources of self-efficacy information: (a) past performance or mastery experiences; (b) vicarious observation of others' experiences; (c) verbal or social feedback; and (d) affective states. An online survey presenting the Information Retrieval Self-Efficacy Scale was distributed among three existing Israeli LIS discussion groups. The questionnaire was completed by 201 LIS professionals. Findings show that participants reported a high level of self-efficacy regarding information retrieval and that all four sources of self-efficacy information influenced the construction of self-efficacy beliefs. Correlations between self-efficacy perceptions and several socio-demographic variables were investigated. The data analysis revealed that men and women are impacted differently by self-efficacy information; women reported a higher score for affective states and men are more prone to frustration. Also, significant relations were found between age, years of experience, and the sources that exerted more influence on participants. Older and more experienced participants reported being more impacted by their mastery experiences and their affective states. Participants in the middle of their careers reported a greater influence of social feedback on their self-perception of self-efficacy.
