The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or, occasionally, participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories and traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible but also a cheap and fast means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.

A huge volume of news stories is reported by various news channels on a daily basis. Subscribing to all the stories and keeping track of the important ones day after day is very time-consuming. This paper proposes several approaches to identify important news stories. To this end, we take advantage of the blogosphere as an information source to evaluate the importance of news stories. Blogs reflect the diverse opinions of bloggers about news stories, and the attention that these stories receive can help estimate the importance of the stories. In this paper, we define the popularity of a news story in the blogosphere as the attention it attracts from users. We measure the popularity of the stories in the blogosphere from two viewpoints: content and timeline. In terms of content, we suggest several approaches to estimate language models for a news story and blog posts, and we evaluate the importance of the story using these language models. Furthermore, we generate a temporal profile of a news story by exploring the timeline of blog posts related to the story, and evaluate its importance based on the temporal profile. We experimentally verify the effectiveness of the proposed approaches for identifying top news stories.

This study uses bibliographic coupling to identify missing relevant patent links, in order to construct a comprehensive citation network. Missing citation links can be added by taking the missing relevant patent links into account. The Pareto principle is used to determine the threshold of bibliographic coupling strength, in order to identify the missing relevant patent links. Comparisons between the original patent citation network and the comprehensive patent citation network with the missing relevant patent links are illustrated at both the patent and assignee levels. Light emitting diode (LED) illuminating technology is chosen as the case study. The relationships between the patents and the assignees are clearly strengthened after adding the missing relevant patent links. The results show that the growth rates of both the total number and the average number of links clearly improve at the patent level. At the assignee level, the number of linked assignees and the average number of links between two assignees increase. The differences between the two citation networks are further examined by means of the Freeman vertex betweenness centrality and Johnson's hierarchical clustering. The patents with more new links to other patents have distinct results in terms of the Freeman vertex betweenness centrality. The enhancement of links among patents also results in different clustering.
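The idea in the abstract above can be illustrated with a small sketch. It assumes that coupling strength is the number of references two patents share, and it reads the Pareto principle as "keep the top 20% of coupled pairs by strength". The abstract does not say how the threshold is actually derived, so the `top_fraction` parameter, the function names, and the toy data are all hypothetical.

```python
from itertools import combinations


def coupling_strengths(references):
    """Map each unordered patent pair to the number of references they share."""
    strengths = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared > 0:
            strengths[(a, b)] = shared
    return strengths


def pareto_threshold(strengths, top_fraction=0.2):
    """Smallest strength among the strongest `top_fraction` of coupled pairs.

    Assumption: this is one possible reading of the Pareto principle,
    not the rule used in the study.
    """
    ranked = sorted(strengths.values(), reverse=True)
    if not ranked:
        return None
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[keep - 1]


def missing_links(references, citations, top_fraction=0.2):
    """Strongly coupled pairs that have no direct citation in either direction."""
    strengths = coupling_strengths(references)
    threshold = pareto_threshold(strengths, top_fraction)
    if threshold is None:
        return []
    return sorted(
        pair
        for pair, strength in strengths.items()
        if strength >= threshold
        and pair not in citations
        and pair[::-1] not in citations
    )


# Toy data (hypothetical): patent id -> set of cited reference ids.
toy_references = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R1", "R2", "R3"},
    "P3": {"R3", "R4"},
    "P4": {"R4", "R5"},
    "P5": {"R9"},
}
candidate_links = missing_links(toy_references, citations=set())
```

Pairs returned by `missing_links` are candidates to add to the citation network. A pair that already cites in either direction is skipped.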

Distributed top-k query processing is increasingly becoming an essential functionality in a large number of emerging application classes. This paper addresses the efficient algebraic optimization of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We use a dynamic programming approach to find the optimal execution plan using compact data synopses for selectivity estimation that is the basis for our cost model. The optimized query is executed in a hierarchical way involving a small and fixed number of communication phases. We have performed experiments on real web data that show the benefits of distributed top-k query optimization both in network resource consumption and query response time.

A new link-based document ranking framework is devised with, at its heart, a content- and time-sensitive random literature explorer designed to model the behaviour of readers of scientific documents more accurately. In particular, our ranking framework dynamically adjusts its random walk parameters according to both the content and the age of encountered documents, thus incorporating the diversity of topics and how they evolve over time into the score of a scientific publication. Our random walk framework results in a ranking of scientific documents which is shown to be more effective in facilitating literature exploration than PageRank, measured against a proxy gold standard based on papers' potential usefulness in facilitating later research. One of its many strengths lies in its practical value in reliably retrieving potentially useful papers and placing them at the top of its ranking.

Objective: The National Library of Medicine (NLM) inaugurated a "publication type" concept to facilitate searches for systematic reviews (SRs). On the other hand, clinical queries (CQs) are validated search strategies designed to retrieve scientifically sound, clinically relevant original and review articles from biomedical literature databases. We compared the retrieval performance of the SR publication type (SR[pt]) against the most sensitive CQ for systematic review articles (CQrs) in PubMed. Methods: We ran date-limited searches of SR[pt] and CQrs to compare the relative yield of articles and SRs, focusing on the differences in retrieval of SRs by SR[pt] but not CQrs (SR[pt] NOT CQrs) and CQrs NOT SR[pt]. Random samples of articles retrieved in each of these comparisons were examined for SRs until a consistent pattern became evident. Results: For SR[pt] NOT CQrs, the yield was relatively low in quantity but rich in quality, with 79% of the articles being SRs. For CQrs NOT SR[pt], the yield was high in quantity but low in quality, with only 8% being SRs. For CQrs AND SR[pt], the quality was highest, with 92% being SRs. Conclusions: We found that SR[pt] had high precision and specificity for SRs but low recall (sensitivity), whereas CQrs had much higher recall. SR[pt] OR CQrs added valid SRs to the CQrs yield at low cost (i.e., added few non-SRs). For searches that are intended to be exhaustive for SRs, SR[pt] can be added to existing sensitive search filters.
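The precision and recall trade-off described in this abstract can be made concrete with a small sketch. The article identifiers and set sizes below are invented for illustration and do not reproduce the study's data. The names `sr_pt`, `cqrs`, and `gold_srs` are hypothetical stand-ins for the two search result sets and a set of confirmed systematic reviews.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy identifiers (hypothetical): 1-8 are true systematic reviews.
gold_srs = {1, 2, 3, 4, 5, 6, 7, 8}
sr_pt = {1, 2, 3, 20}                                      # narrow filter
cqrs = {2, 3, 4, 5, 6, 7, 8, 21, 22, 23, 24, 25, 26, 27}   # sensitive filter

narrow = precision_recall(sr_pt, gold_srs)          # high precision, low recall
broad = precision_recall(cqrs, gold_srs)            # lower precision, high recall
combined = precision_recall(sr_pt | cqrs, gold_srs) # the OR of both filters
```

In this toy setup the union recovers the one review that the sensitive filter missed while adding a single non-review. That mirrors the abstract's conclusion about combining the two filters, though only as an illustration.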

The aim of this study was to develop a model to evaluate the retrieval quality of search queries performed by Dutch general practitioners using the printed Index Medicus, MEDLINE on CD-ROM, and MEDLINE through GRATEFUL MED. Four search queries related to general practice were formulated for a continuing medical education course in literature searching. The potentially relevant citations selected by the course instructor and the 103 course participants together served as the basic set for the three judges to evaluate for (a) relevance and (b) quality, with the latter based on journal ranking, research design and publication type. Relevant individual citations received a citation quality score from 1 (low) to 4 (high). The overall search quality was expressed in a formula, which included the individual citation quality score of the selected and missed relevant citations, and the number of selected non-relevant citations. The outcome measures were the number and quality of relevant citations and agreement between the judges. Out of 864 citations, 139 were assessed as relevant, of which 44 citations received an individual citation quality score of 1, 76 of 2, 19 of 3 and none of 4. The level of agreement between the judges was 68% for the relevant citations, and 88% for the non-relevant citations. We describe a model for the evaluation of search queries based not only on the relevance, but also on the quality of the citations retrieved. With adaptation, this model could be generalized to other professional users, and to other bibliographic sources.

The retrieval of sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem arises severely because of the fine granularity of the task. Short queries, which are usually the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we study thoroughly the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. In our experiments different term selection strategies are evaluated and we provide empirical evidence to show that expansion before sentence retrieval yields competitive performance. This is particularly novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences) and there are no comparative results available between both types of expansion. Furthermore, this comparison is particularly valuable because there are important implications in time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate clearly that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.

This paper suggests a new scientometric index that estimates knowledge diffusion and has two constituents: the first one is equivalent to a usual citation index, i.e., it describes the visible diffusion of scientific knowledge; the second one reflects the implicit diffusion of scientific knowledge and is expressed through the number of implicit citations. The practical value of the suggested index is that it permits implicit initiators of the scientific mainstream to be easily identified. The distinctive feature of such scientists is the large value of the suggested citation index and the low value of the usual citation index.


Fourteen tesserae from the mosaics in the Cathedral of Salerno were examined by X-ray diffraction and by energy-dispersive X-ray analyses. Their constituents were identified and they can be divided into crystalline and amorphous materials. The main crystalline phases are quartz, calcite, magnesium carbonate, magnesium hydroxide and microcline. The amorphous materials are composed of silica containing several elements: Ca, Na, K, Al, Fe, Cr, Mn, Ni, Ti, Cu. For comparison, one tessera from Pompei was characterized. A subdivision of tesserae into three groups, crystalline, amorphous and composite, is suggested.

In the United States we cannot avoid the impact of television on domestic politics. On all sides, every election year, we are buffeted by such developments as the “great debates,” “pre‐election predictions” by computers, arguments over Sec. 315, and various partisan uses of TV. Overseas, many nations have experienced the political effects of television, some similar and some more subtle than our own.

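Fostering an Awareness of High-Quality Journals and Building a Brand-Journal Culture   Total citations: 9, self-citations: 4, citations by others: 9
Following China's accession to the WTO, and in the face of the internationalization of scientific and technical journals and fierce market competition, editors should carry an awareness of producing high-quality journals through the entire publishing process. They should position the journal's direction of development scientifically, emphasize its academic character, strengthen its publishing service activities, and cultivate it into a brand journal, ultimately elevating this into a brand-journal culture and achieving a qualitative leap.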

Statistical reference questions are among the most problematic a government information specialist faces. Answering such queries is a subjective process and librarians must be sensitive to the context of both the query and the data. The first issue is to recognize the potential lack of consistency in definition or methods among agencies or governments. This paper provides a few illustrative examples that reinforce this point and demonstrates how a general knowledge of statistical measures can lead to better informed reference service.

A major challenge facing academic libraries is the need for reference librarians to become knowledge experts in their assigned subject areas. The subject-specialist approach increases the effectiveness of collection development, classroom instruction, and faculty liaison interactions. Simultaneously, this approach creates the need for continuous learning opportunities. Conferences organized around academic disciplines provide a direct connection to subject-specific information as well as opportunities for meeting people who share common interests. With the increase in interdisciplinary and multidisciplinary approaches to teaching and research, the authors argue that attending subject-specific conferences is the best way to keep up with information needs in various fields. This article reviews the benefits of attending academic conferences and discusses five strategies for selecting an appropriate subject conference in any discipline. First-person accounts of conference experiences illustrate these benefits.

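Research on Techniques for the Automatic Acquisition, Identification, and Archiving of Network Resources   Total citations: 3, self-citations: 0, citations by others: 3
This paper discusses why research on techniques for automatically acquiring, identifying, and archiving network resources is needed, outlines an approach to doing so and its implementation process, and describes the problems that may arise during implementation together with their solutions. Finally, it discusses the problems that university libraries in China face in collecting and organizing network resources to build network resource navigation databases.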

This study identifies the essential chat reference competencies to enhance the professional preparation of reference personnel. A survey was conducted to examine practitioners' perceptions of chat reference competencies reported in the literature. A prioritized competency list was produced based on the survey results. The investigated competencies could be divided into four categories: media-independent core reference competencies, reference competencies highlighted in the context of chat reference, reference competencies specific to chat reference, and reference competencies not as important in chat reference. Competencies in the first three categories received ratings higher than 5.5 (out of 7) and can be defined as the essential competencies requisite for chat reference practice. Findings from this study can be used as the basis to design and implement training and education programs to enhance the professional preparation of chat reference librarians.

On December 7, 2011, the Book Industry Study Group released BISG Policy Statement POL-1101, Best Practices for Identifying Digital Products, capping an 18-month project by a working group of the BISG Identification Committee. This article chronicles how a diverse group of industry professionals, from every sector of the supply chain, came together to deal with an issue that had polarized the industry since 2005. Based on a presentation made by the author at the 2011 Annual Meeting of the Book Industry Study Group, this article delves further into the history and the methodology behind the drafting of the Policy Statement.
