11.

An Improved Vector Space Model Based on HTML Document Structure (total citations: 8; self-citations: 1; citations by others: 8)

胡健 陆一鸣 马范援 《情报学报》2005,24(4):433-437

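Because the tag fields of an HTML document differ in their distribution characteristics and in how well they represent the document's content, we propose an improved vector model (PFTF). Through query experiments on trec12, we compare the retrieval performance of the traditional vector model and the PFTF model, both on single tag fields and on the combination of multiple document representation results. The experimental results show that the PFTF model brings improvements in both respects.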

12.

On enhancing the robustness of timeline summarization test collections

Richard McCreadie Shahzad Rajput Ian Soboroff Craig Macdonald Iadh Ounis 《Information processing & management》2019,56(5):1815-1836

Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real-time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. topic or event). These systems have a range of uses, such as producing concise overviews of events for end-users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed - focusing on information nugget or cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a ‘good’ summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which the timeline summarization test collections fail to generalize to new summarization systems, then we propose, evaluate and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of systems held-out from the pool increases.
To reduce the risk of mis-ranking systems, we also propose a range of different automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective at increasing the robustness of the TREC-TS test collections, as they are able to generate large numbers of missing matches with high accuracy, reducing the number of mis-rankings by up to 50%.

13.

Information Filtering in TREC-9 and TDT-3: A Comparative Analysis (total citations: 2; self-citations: 0; citations by others: 2)

Thomas Galen Ault Yiming Yang 《Information Retrieval》2002,5(2-3):159-187

Much work on automated information filtering has been done in the TREC and TDT domains, but differences in corpora, the nature of TREC topics vs. TDT events, the constraints imposed on training and testing, and the choices of performance measures confound any meaningful comparison between these domains. We attempt to bridge the gap between them by evaluating the performance of the k-nearest-neighbor (kNN) classification system on the corpus and categories from one domain using the constraints of the other. To maximize comparability and understand the effect of the evaluation metrics specific to each domain, we optimize the performance of kNN separately for the F₁, T9P (preferred metric for TREC-9) and C_trk (official metric for TDT-3) metrics. A thorough comparison of our within-domain and cross-domain results demonstrates that the corpus used for TREC-9 is more challenging for an information filtering system than the TDT-3 corpus and strongly suggests that the TDT-3 event tracking task itself is more difficult than the TREC batch filtering task. We also show that optimizing performance in TREC-9 and TDT-3 tends to result in systems with different performance characteristics, confounding any meaningful comparison between the two domains, and that T9P and C_trk both have properties that make them undesirable as general information filtering metrics.
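For readers unfamiliar with the first of the three metrics, F₁ is the harmonic mean of precision and recall. The following is a minimal Python sketch for a single filtering profile; the function name `f1_score` and its parameter names are illustrative assumptions, not taken from the paper. T9P and C_trk are not reproduced because the abstract does not define them.

```python
def f1_score(relevant_delivered: int, delivered: int, relevant_total: int) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R).

    relevant_delivered: relevant documents the filter delivered
    delivered:          all documents the filter delivered
    relevant_total:     all relevant documents in the test stream
    Returns 0.0 when precision or recall is undefined or both are zero
    (a convention assumed here for illustration).
    """
    if delivered == 0 or relevant_total == 0:
        return 0.0
    precision = relevant_delivered / delivered
    recall = relevant_delivered / relevant_total
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a profile that delivers 50 documents, 30 of them relevant, out of 60 relevant documents in total has precision 0.6 and recall 0.5.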

14.

An Overview of TREC and Its Latest Developments (total citations: 4; self-citations: 0; citations by others: 4)

张秀坤 赵丹群 《情报理论与实践》2004,27(5):537-540

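This paper introduces the basics of the Text REtrieval Conference (TREC), including its research goals, main tracks, and research content. It then describes the Genomics track newly added in TREC-2003, and finally covers the latest changes to the main TREC tracks and their implications for information retrieval evaluation in China.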

15.

A Study of the TREC Interactive Retrieval Evaluation Track (total citations: 1; self-citations: 0; citations by others: 1)

张秀坤 《图书情报工作》2006,50(1):72-75

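This paper introduces the research goals, experimental design, evaluation results, and eventual outcome of the TREC Interactive track. It divides the track's development into four stages and describes how the evaluation measures, experimental topics, and other aspects changed at each stage; the evaluation measures include aspectual recall, aspectual precision, search time, and user satisfaction. The review shows that the field of information retrieval evaluation is placing increasing emphasis on being "user-oriented".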

16.

Neural word and entity embeddings for ad hoc retrieval

Ebrahim Bagheri Faezeh Ensan Feras Al-Obeidat 《Information processing & management》2018,54(4):657-673

Learning low-dimensional dense representations of the vocabulary of a corpus, known as neural embeddings, has gained much attention in the information retrieval community. While there have been several successful attempts at integrating embeddings within the ad hoc document retrieval task, no systematic study has been reported that explores the various aspects of neural embeddings and how they impact retrieval performance. In this paper, we perform a methodical study on how neural embeddings influence the ad hoc document retrieval task. More specifically, we systematically explore the following research questions: (i) do methods solely based on neural embeddings perform competitively with state-of-the-art retrieval methods with and without interpolation? (ii) is there any statistically significant difference between the performance of retrieval models when based on word embeddings compared to when knowledge graph entity embeddings are used? and (iii) is there a significant difference between using locally trained neural embeddings compared to when globally trained neural embeddings are used? We examine these three research questions across both hard and all queries. Our study finds that word embeddings do not show competitive performance to any of the baselines. In contrast, entity embeddings show competitive performance to the baselines and when interpolated, outperform the best baselines for both hard and soft queries.

17.

Introduction to the Special Issue: Overview of the TREC Routing and Filtering Tasks

Stephen Robertson 《Information Retrieval》2002,5(2-3):127-137

This paper introduces the special issue, and reviews the routing and filtering tasks as defined and evaluated at TREC. The tasks attempt to simulate a specific service situation: the system is assumed to process an incoming stream of documents against profiles of user interest, strictly in the time order in which they arrive, and immediately refer any matching document to the user. In the adaptive filtering version of the task, the user is assumed to provide a relevance judgement instantly. The rationale for the task definitions and the evaluation measures used is discussed.
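The service situation described in this abstract can be sketched as a simple loop. The Python below is only an illustration under stated assumptions: the term-overlap score, the threshold update rule, and all names (`adaptive_filter`, `tokens`, `is_relevant`, `threshold`, `step`) are hypothetical and are not the TREC task definition or any participant's system. The stream is assumed to be supplied already in arrival order.

```python
from typing import Callable, Iterable, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def adaptive_filter(
    stream: Iterable[Tuple[int, str]],
    profile_terms: Set[str],
    is_relevant: Callable[[str], bool],
    threshold: float = 0.5,
    step: float = 0.1,
) -> List[int]:
    """Return the arrival times of documents referred to the user.

    Each document is scored once, in arrival order, and the decision is
    final. Only delivered documents receive a relevance judgement, which
    is used immediately to adjust the delivery threshold.
    """
    delivered: List[int] = []
    for arrival_time, text in stream:
        # Fraction of profile terms that occur in the document.
        overlap = len(profile_terms & tokens(text)) / max(len(profile_terms), 1)
        if overlap >= threshold:
            delivered.append(arrival_time)
            # Instant feedback: relax after a hit, tighten after a miss.
            if is_relevant(text):
                threshold = max(0.0, threshold - step)
            else:
                threshold = min(1.0, threshold + step)
    return delivered
```

In a non-adaptive (batch or routing) variant, the feedback branch would simply be dropped and the threshold would stay fixed.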

18.

Using crowdsourcing for TREC relevance assessment

Omar Alonso Stefano Mizzaro 《Information processing & management》2012

Crowdsourcing has recently gained a lot of attention as a tool for conducting different kinds of relevance evaluations. At a very high level, crowdsourcing describes outsourcing of tasks to a large group of people instead of assigning such tasks to an in-house employee. This crowdsourcing approach makes it possible to conduct information retrieval experiments extremely fast, with good results at a low cost.

19.

The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text (total citations: 1; self-citations: 1; citations by others: 0)

Paul B. Kantor Ellen M. Voorhees 《Information Retrieval》2000,2(2-3):165-176

A known-item search is a particular information retrieval task in which the system is asked to find a single target document in a large document set. The TREC-5 confusion track used a set of 49 known-item tasks to study the impact of data corruption on retrieval system performance. Two corrupted versions of a 55,600-document corpus whose true content was known were created by applying OCR techniques to page images. The first version of the corpus used the page images as scanned, resulting in an estimated character error rate of approximately 5%. The second version used page images that had been down-sampled, resulting in an estimated character error rate of approximately 20%. The true text and each of the corrupted versions were then searched using the same set of 49 questions. In general, retrieval methods that attempted a probabilistic reconstruction of the original clean text fared better than methods that simply accepted corrupted versions of the query text.
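The abstract does not say how the track estimated its character error rates. One common definition, assumed here purely for illustration, is the Levenshtein edit distance between the true text and the OCR output divided by the length of the true text. A minimal Python sketch follows; the function names are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string
    into the other (row-by-row dynamic programming)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the true text."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, an OCR output that reads "retrieva1" for the true text "retrieval" has one substitution in nine characters.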