11.
An Improved Vector Space Model Based on HTML Document Structure   Total citations: 8, self-citations: 1, other citations: 8  
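Hu Jian (胡健), Lu Yiming (陆一鸣), Ma Fanyuan (马范援)  Journal of the China Society for Scientific and Technical Information (情报学报), 2005, 24(4): 433-437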
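Based on how the different tag fields of an HTML document are distributed and how well each represents the document's content, we propose an improved vector model (PFTF). Using query experiments on TREC-12 (trec12), we compare the retrieval performance of the traditional vector model and the PFTF model, both on individual tag fields and on combinations of the results from multiple document representations. Experimental results show that the PFTF model improves performance in both respects.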
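The abstract does not give the PFTF weighting formula, so the following is only a generic sketch of the underlying idea: terms are counted with different weights depending on which HTML tag field they occur in, and documents are then matched to queries by cosine similarity in a vector space model. The field names and the values in `FIELD_WEIGHTS` are hypothetical illustrations, not the paper's weights, and IDF is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical per-field weights; NOT the PFTF weights from the paper.
FIELD_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5, "body": 1.0}


def field_weighted_tf(fields):
    """Build a term vector from {field_name: [tokens]}, weighting each
    occurrence by the weight of the tag field it appears in."""
    tf = Counter()
    for field, tokens in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count as body text
        for token in tokens:
            tf[token] += weight
    return tf


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


query = Counter(["trec"])
doc_a = field_weighted_tf({"title": ["trec", "track"], "body": ["retrieval"]})
doc_b = field_weighted_tf({"title": ["news"], "body": ["trec", "retrieval"]})
print(cosine(query, doc_a), cosine(query, doc_b))
```

Here the query term appears in the title of `doc_a` but only in the body of `doc_b`, so `doc_a` scores higher even though both contain the term once.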
12.
Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. a topic or event). These systems have a range of uses, such as producing concise overviews of events for end users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget and cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a ‘good’ summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which timeline summarization test collections fail to generalize to new summarization systems, and then we propose, evaluate, and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of the systems held out from the pool increases. 
To reduce the risk of mis-ranking systems, we also propose a range of automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective at increasing the robustness of the TREC-TS test collections, as they are able to generate large numbers of missing matches with high accuracy, markedly reducing the number of mis-rankings, by up to 50%.
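The depooling procedure can be sketched as leave-one-out pooling: rebuild the judged set without one system's contributions, rescore every system against the reduced judgments, and compare the resulting ranking with the full-judgment ranking using Kendall's tau. This toy version assumes a single topic, binary relevance, and plain precision in place of the timeline-summarization metrics the paper uses; all function names are hypothetical.

```python
from itertools import combinations


def precision(run, relevant):
    """Fraction of a system's output items that are judged relevant."""
    return sum(1 for item in run if item in relevant) / len(run) if run else 0.0


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts over the same systems.
    Pairs tied in either ranking count as neither concordant nor discordant."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(systems)
    return (concordant - discordant) / (n * (n - 1) / 2)


def depooled_tau(runs, relevant, depth):
    """For each held-out system, keep only relevant items found in the top
    `depth` of the OTHER systems, rescore all systems, and compare rankings."""
    full_scores = {s: precision(r, relevant) for s, r in runs.items()}
    taus = {}
    for held_out in runs:
        pool = set()
        for system, run in runs.items():
            if system != held_out:
                pool.update(run[:depth])
        reduced = relevant & pool  # items only the held-out system found are lost
        scores = {s: precision(r, reduced) for s, r in runs.items()}
        taus[held_out] = kendall_tau(full_scores, scores)
    return taus


runs = {"A": ["d1", "d2", "d3"], "B": ["d1", "d4", "d5"], "C": ["d6", "d7", "d8"]}
relevant = {"d1", "d2", "d3", "d6"}
taus = depooled_tau(runs, relevant, depth=3)
print(taus)
```

In this toy run, holding out the strongest system `A` gives tau = 0.0, while holding out `B` or `C` gives 2/3: the relevant items that only `A` retrieved vanish from the reduced judgments, a miniature version of the trend the abstract reports for high-effectiveness held-out systems.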
13.
Information Filtering in TREC-9 and TDT-3: A Comparative Analysis   Total citations: 2, self-citations: 0, other citations: 2  
Much work on automated information filtering has been done in the TREC and TDT domains, but differences in corpora, the nature of TREC topics vs. TDT events, the constraints imposed on training and testing, and the choice of performance measures confound any meaningful comparison between these domains. We attempt to bridge the gap between them by evaluating the performance of a k-nearest-neighbor (kNN) classification system on the corpus and categories from one domain using the constraints of the other. To maximize comparability and understand the effect of the evaluation metrics specific to each domain, we optimize the performance of kNN separately for the F1, T9P (the preferred metric for TREC-9) and C_trk (the official metric for TDT-3) metrics. A thorough comparison of our within-domain and cross-domain results demonstrates that the corpus used for TREC-9 is more challenging for an information filtering system than the TDT-3 corpus, and strongly suggests that the TDT-3 event tracking task itself is more difficult than the TREC batch filtering task. We also show that optimizing performance in TREC-9 and TDT-3 tends to result in systems with different performance characteristics, confounding any meaningful comparison between the two domains, and that T9P and C_trk both have properties that make them undesirable as general information filtering metrics.
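To make the three metrics concrete, here is a sketch that computes F1, T9P, and a normalized TDT-style tracking cost from per-topic counts. The T9P target of 50 documents and the cost parameters (C_miss = 1, C_FA = 0.1, P_target = 0.02) are recalled from the TREC-9 filtering and TDT evaluation plans rather than taken from this abstract; treat them as assumptions and check the official specifications before relying on them.

```python
def f1(rel_ret, ret, rel):
    """F1 from counts: rel_ret = relevant retrieved, ret = retrieved, rel = relevant."""
    if rel_ret == 0:
        return 0.0
    p, r = rel_ret / ret, rel_ret / rel
    return 2 * p * r / (p + r)


def t9p(rel_ret, ret, target=50):
    """Precision with a minimum-retrieval target (assumed target of 50 documents):
    retrieving fewer than `target` documents is penalized."""
    return rel_ret / max(target, ret)


def ctrk_norm(rel_ret, ret, rel, n_docs, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized TDT-style detection cost (lower is better); parameters assumed."""
    p_miss = (rel - rel_ret) / rel              # share of relevant docs missed
    p_fa = (ret - rel_ret) / (n_docs - rel)     # share of non-relevant docs delivered
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    # Normalize by the cost of the better trivial system (deliver all or nothing).
    return cost / min(c_miss * p_target, c_fa * (1 - p_target))


print(f1(20, 40, 50), t9p(20, 40), ctrk_norm(20, 40, 50, 1050))
print(f1(0, 0, 50), t9p(0, 0), ctrk_norm(0, 0, 50, 1050))
```

A system that delivers nothing gets F1 = 0 and T9P = 0 but a normalized cost of exactly 1.0, one illustration of why tuning for different metrics can produce systems with different performance characteristics.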
14.
An Overview of TREC and a Study of Its Latest Developments   Total citations: 4, self-citations: 0, other citations: 4  
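Introduces the Text REtrieval Conference (TREC), including its research goals, main tracks, and research content; then describes the Genomics track newly added in TREC-2003; and finally presents the latest changes to the main TREC tracks and their implications for information retrieval evaluation in China.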
15.
A Study of the TREC Interactive Retrieval Evaluation Track   Total citations: 1, self-citations: 0, other citations: 1  
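Describes the research goals, experimental design, evaluation results, and eventual outcome of the TREC Interactive track. The track's development is divided into four stages, and the changes in each stage are described in terms of evaluation measures, experimental topics, and other aspects; the evaluation measures include aspectual recall, aspectual precision, search time, and user satisfaction. This shows that the field of information retrieval evaluation places increasing emphasis on being "user-oriented".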
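The two aspectual measures can be computed as follows, assuming their usual definitions: aspectual recall is the fraction of a topic's known aspects covered by the documents a searcher saved, and aspectual precision is the fraction of saved documents that contain at least one aspect. The data layout (`doc_aspects` mapping a document ID to its set of aspect labels) is a hypothetical choice.

```python
def aspectual_recall(saved, doc_aspects, all_aspects):
    """Fraction of the topic's known aspects covered by the saved documents."""
    covered = set()
    for doc in saved:
        covered |= doc_aspects.get(doc, set())
    return len(covered & all_aspects) / len(all_aspects)


def aspectual_precision(saved, doc_aspects):
    """Fraction of saved documents that contain at least one aspect."""
    if not saved:
        return 0.0
    return sum(1 for doc in saved if doc_aspects.get(doc)) / len(saved)


all_aspects = {"a1", "a2", "a3", "a4"}
doc_aspects = {"d1": {"a1", "a2"}, "d2": {"a2"}, "d3": set()}
saved = ["d1", "d2", "d3"]
print(aspectual_recall(saved, doc_aspects, all_aspects))  # two of four aspects covered
print(aspectual_precision(saved, doc_aspects))            # two of three saved docs useful
```

Note that `d2` adds nothing to recall (its aspect was already covered by `d1`) but still counts toward precision, which is why the two measures can move independently.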
16.
Learning low-dimensional dense representations of the vocabulary of a corpus, known as neural embeddings, has gained much attention in the information retrieval community. While there have been several successful attempts at integrating embeddings within the ad hoc document retrieval task, no systematic study has been reported that explores the various aspects of neural embeddings and how they impact retrieval performance. In this paper, we perform a methodical study of how neural embeddings influence the ad hoc document retrieval task. More specifically, we systematically explore the following research questions: (i) do methods based solely on neural embeddings perform competitively with state-of-the-art retrieval methods, with and without interpolation? (ii) is there any statistically significant difference between the performance of retrieval models based on word embeddings and those based on knowledge graph entity embeddings? and (iii) is there a significant difference between using locally trained and globally trained neural embeddings? We examine these three research questions across both hard and all queries. Our study finds that word embeddings do not perform competitively with any of the baselines. In contrast, entity embeddings perform competitively with the baselines and, when interpolated, outperform the best baselines for both hard and soft queries.
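Research question (i) concerns scoring with embeddings alone and with interpolation against a baseline. A minimal sketch, assuming the query and document are each represented by the average of their term vectors and compared by cosine similarity, with a linear interpolation weight `lam`; the toy 2-D vectors, the averaging scheme, and `lam = 0.7` are assumptions, not the paper's configuration. The same code applies to word or entity embeddings, since only the lookup table `emb` changes.

```python
import math


def mean_vector(tokens, emb):
    """Average the embeddings of the tokens found in `emb`; None if none are."""
    vectors = [emb[t] for t in tokens if t in emb]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_score(query, doc, emb):
    """Embedding-only relevance score: cosine of averaged query/doc vectors."""
    q, d = mean_vector(query, emb), mean_vector(doc, emb)
    return cosine(q, d) if q is not None and d is not None else 0.0


def interpolate(baseline_score, emb_score, lam=0.7):
    """Linear interpolation; assumes both scores are on comparable scales."""
    return lam * baseline_score + (1 - lam) * emb_score


emb = {"car": [1.0, 0.0], "auto": [0.9, 0.1], "fruit": [0.0, 1.0]}
print(embedding_score(["car"], ["auto"], emb))   # high: near-synonyms
print(embedding_score(["car"], ["fruit"], emb))  # 0.0: orthogonal toy vectors
```

The embedding score can match `car` to `auto` without any term overlap, which a lexical baseline would miss; interpolation lets the baseline's exact-match evidence and the embedding's semantic evidence both contribute.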
17.
This paper introduces the special issue, and reviews the routing and filtering tasks as defined and evaluated at TREC. The tasks attempt to simulate a specific service situation: the system is assumed to process an incoming stream of documents against profiles of user interest, strictly in the time order in which they arrive, and immediately refer any matching document to the user. In the adaptive filtering version of the task, the user is assumed to provide a relevance judgement instantly. The rationale for the task definitions and the evaluation measures used is discussed.
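The service situation described above (a time-ordered stream, immediate referral, instant feedback) can be sketched as a simple loop. The linear term-weight profile, the fixed threshold, and the additive Rocchio-style update with `alpha`/`beta` are hypothetical choices for illustration; TREC defines the task and the measures, not this particular algorithm. Feedback is requested only for delivered documents, since those are the only ones the user sees.

```python
from collections import Counter


def score(profile, tokens):
    """Linear match score: sum of profile weights times term counts."""
    tf = Counter(tokens)
    return sum(profile.get(t, 0.0) * c for t, c in tf.items())


def adaptive_filter(stream, profile, judge, threshold=1.0, alpha=0.5, beta=0.25):
    """Process (doc_id, tokens) pairs strictly in arrival order.

    `judge` is a callable doc_id -> bool standing in for the user's instant
    relevance judgement. Returns the delivered IDs and the updated profile.
    """
    profile = dict(profile)  # do not mutate the caller's profile
    delivered = []
    for doc_id, tokens in stream:
        if score(profile, tokens) >= threshold:
            delivered.append(doc_id)       # referred to the user immediately
            step = alpha if judge(doc_id) else -beta
            for t in set(tokens):          # reinforce or penalize the doc's terms
                profile[t] = profile.get(t, 0.0) + step
    return delivered, profile


stream = [
    ("d1", ["oil", "price"]),
    ("d2", ["price", "price"]),
    ("d3", ["spill", "oil"]),
    ("d4", ["spill", "spill", "spill", "weather"]),
]
relevant = {"d1", "d2"}
delivered, profile = adaptive_filter(stream, {"oil": 1.0}, lambda d: d in relevant)
print(delivered, profile)
```

Here `d2` is delivered only because the feedback on `d1` raised the weight of `price`, and `d4` is suppressed because the negative feedback on `d3` penalized `spill`.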
18.
Crowdsourcing has recently gained a lot of attention as a tool for conducting different kinds of relevance evaluations. At a very high level, crowdsourcing describes the outsourcing of tasks to a large group of people instead of assigning them to an in-house employee. This approach makes it possible to conduct information retrieval experiments extremely fast, with good results and at low cost.
19.
The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text   Total citations: 1, self-citations: 1, other citations: 0  
A known-item search is a particular information retrieval task in which the system is asked to find a single target document in a large document set. The TREC-5 confusion track used a set of 49 known-item tasks to study the impact of data corruption on retrieval system performance. Two corrupted versions of a 55,600-document corpus whose true content was known were created by applying OCR techniques to page images. The first version of the corpus used the page images as scanned, resulting in an estimated character error rate of approximately 5%. The second version used page images that had been down-sampled, resulting in an estimated character error rate of approximately 20%. The true text and each of the corrupted versions were then searched using the same set of 49 questions. In general, retrieval methods that attempted a probabilistic reconstruction of the original clean text fared better than methods that simply accepted corrupted versions of the query text.
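A character error rate such as the 5% and 20% figures above is commonly defined as the edit distance between the OCR output and the true text, divided by the length of the true text. The abstract does not say exactly how the track estimated its rates, so the definition below is an assumption; the sketch uses the standard dynamic-programming Levenshtein distance.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ocr_text, true_text):
    """Edit distance normalized by the length of the true text."""
    return levenshtein(ocr_text, true_text) / len(true_text)


# Typical OCR confusions: "l" read as "I", "rm" read as "nn".
print(char_error_rate("retrievaI", "retrieval"))      # 1 error in 9 characters
print(char_error_rate("infonnation", "information"))  # 2 errors in 11 characters
```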
Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP Registration No. 09084417 (京ICP备09084417号)