Similar literature
 20 similar documents found
1.
Re-ranking search results to promote novel ones has traditionally been regarded as an intuitive diversification strategy. In this paper, we challenge this common intuition and thoroughly investigate the actual role of novelty in search result diversification, based upon the framework provided by the diversity task of the TREC 2009 and 2010 Web tracks. Our results show that existing diversification approaches based solely on novelty cannot consistently improve over a standard, non-diversified baseline ranking. Moreover, when novelty is deployed as an additional component by the current state-of-the-art diversification approaches, it does not bring significant improvements, while adding considerable efficiency overheads. Finally, through a comprehensive analysis with simulated rankings of various quality, we demonstrate that, although inherently limited by the performance of the initial ranking, novelty plays a role in breaking ties between similarly diverse results.

2.
Search result diversification aims to diversify search results to cover different query subtopics, i.e., pieces of relevant information. State-of-the-art diversification methods often explicitly model diversity based on query subtopics, and their performance is closely related to the quality of those subtopics. Most existing studies extract query subtopics only from unstructured data such as document collections. However, a huge amount of information is available in structured data, which complements the information in unstructured data. Structured data can provide valuable domain knowledge, but is currently under-utilized. In this article, we study how to leverage the integrated information from both structured and unstructured data to extract high-quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods to integrate structured and unstructured data. Specifically, the first method uses the structured data to guide subtopic extraction from unstructured data, the second uses the unstructured data to guide the extraction, and the last first extracts subtopics separately from the two data sources and then combines them. Experimental results in both Enterprise and Web search domains show that the proposed methods are effective in extracting high-quality subtopics from the integrated information, which can lead to better diversification performance.

3.
4.
We study the problem of web search result diversification in the case where intent-based relevance scores are available. A diversified search result will hopefully satisfy the information needs of users who may have different intents. In this context, we first analyze the properties of an intent-based metric, ERR-IA, which measures relevance and diversity together. We argue that this is a better metric than some previously proposed intent-aware metrics and show that it has a better correlation with abandonment rate. We then propose an algorithm to rerank web search results based on optimizing an objective function corresponding to this metric and evaluate it on shopping-related queries.
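The abstract names ERR-IA but does not define it. As a minimal sketch, assuming the commonly cited definition (expected reciprocal rank computed per intent, then averaged by intent probability), the metric could be computed as below. The gain mapping `(2**g - 1) / 2**max_grade`, the default `max_grade=4`, and the function names `err` and `err_ia` are assumptions for illustration and are not taken from the paper.

```python
def err(grades, max_grade=4):
    """Expected reciprocal rank of one ranked list of relevance grades.

    Assumed gain mapping: a grade g stops the user with probability
    (2**g - 1) / 2**max_grade.
    """
    p_continue = 1.0  # probability the user reaches the current rank
    score = 0.0
    for rank, grade in enumerate(grades, start=1):
        p_stop = (2 ** grade - 1) / 2 ** max_grade
        score += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return score


def err_ia(intent_probs, grades_by_intent, max_grade=4):
    """Intent-aware ERR: per-intent ERR weighted by intent probability."""
    return sum(
        prob * err(grades_by_intent[intent], max_grade)
        for intent, prob in intent_probs.items()
    )


# Hypothetical query with two equally likely intents and a two-result ranking.
example = err_ia({"a": 0.5, "b": 0.5}, {"a": [4, 0], "b": [0, 4]})
```

Under these assumptions a ranking scores well only if each likely intent finds a relevant result near the top, which is how a single number can reflect both relevance and diversity.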

5.
Communication Monographs, 2012, 79(4): 308-312

Argument diagrams, especially Stephen Toulmin's “layout,” have enjoyed wide popularity, but the theoretical assumptions behind the use of such diagrams are very suspect. Diagrams are linguistically biased; they abstract arguments out of social contexts; and it is impossible to clearly define and delimit the phenomena they represent. The disadvantages of their use outweigh any conceptual advantages they might provide, and the rhetorical critic and argumentation theorist are enjoined to eschew their use.

6.
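Categories of museum architecture and their design differences   Total citations: 1, self-citations: 1, citations by others: 0
Museums come in many types, and the resulting differences among museum buildings are often large. For the design of a museum building to meet the specific needs of the museum being built, it is necessary to classify museum architecture, to analyze the characteristics of each category and the differences among them, and thereby to make the architectural design properly targeted.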

7.
Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users can see what questions a given review addresses and read further only if they have similar questions about the product. Specifically, we design a two-stage approach that consists of question selection and question diversification. In the question selection phase, we first employ probabilistic retrieval models to locate candidate questions that are relevant to a given review. A Recurrent Neural Network Encoder–Decoder is then used to measure the “answerability” of questions to a review. In the diversification phase, we design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which allows an efficient greedy algorithm for submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.
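The greedy step mentioned in the abstract can be sketched as follows. The paper's actual set function is not given in the abstract, so this sketch substitutes a simple topic-coverage function, which is monotone and submodular; `coverage`, `greedy_select`, `topics_of`, and the example questions are all hypothetical.

```python
def coverage(selected, topics_of):
    """Stand-in set function: number of distinct topics covered.

    Coverage is monotone and submodular; it is NOT the paper's function.
    """
    covered = set()
    for question in selected:
        covered |= topics_of[question]
    return len(covered)


def greedy_select(candidates, topics_of, k):
    """Repeatedly add the candidate with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = coverage(selected, topics_of)
        best = max(
            remaining,
            key=lambda q: coverage(selected + [q], topics_of) - base,
        )
        if coverage(selected + [best], topics_of) - base <= 0:
            break  # no remaining candidate adds new coverage
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidate questions mapped to the review topics they touch.
topics_of = {
    "q1": {"battery", "screen"},
    "q2": {"battery"},
    "q3": {"price"},
}
picked = greedy_select(["q1", "q2", "q3"], topics_of, k=2)
```

For a monotone submodular function under a size limit, this greedy procedure is known to reach at least a (1 - 1/e) fraction of the optimal value, which is why such functions are a common choice for diversification.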

8.
9.
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents and an alphabetical range W of words, compute the set of all word-in-document pairs (w, d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with which such autocompletion queries can be processed, on average, in time linear in the input plus output size, independent of the size of the underlying document collection. At the same time, our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate almost perfectly with our theoretical bound.
Ingmar Weber
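To make the core problem concrete, here is a naive baseline that computes the word-in-document pairs with a plain inverted index and a sorted vocabulary. It is not the data structure proposed in the paper: its running time depends on the posting lists of every word in the range, which is the dependence the paper aims to avoid. The names `build_index` and `pairs_in_range`, the sample documents, and the `"\uffff"` upper bound used to express a prefix range are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def build_index(docs):
    """Build an inverted index (word -> set of doc ids) and a sorted vocabulary."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return sorted(index), index


def pairs_in_range(vocab, index, low, high, doc_set):
    """Return all (w, d) with low <= w <= high alphabetically and d in doc_set."""
    start = bisect_left(vocab, low)
    end = bisect_right(vocab, high)
    return [
        (word, doc_id)
        for word in vocab[start:end]
        for doc_id in sorted(index[word] & doc_set)
    ]


# Hypothetical collection; the range covers every word with the prefix "sea".
docs = {1: "search engine", 2: "search seal", 3: "sea"}
vocab, index = build_index(docs)
pairs = pairs_in_range(vocab, index, "sea", "sea\uffff", {1, 2})
```

Here D = {1, 2} plays the role of the documents matching the earlier query words, and the range W holds the candidate completions of the word being typed.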

10.
11.
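Strategies and methods for retrieving network information resources   Total citations: 6, self-citations: 1, citations by others: 6
Many websites in China and abroad have set up columns devoted to search strategies and methods. This article collects some representative websites so that network users can systematically learn the methods and techniques of network information retrieval.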

12.
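A functional comparison of the Google Scholar search engine and a cross-database retrieval system   Total citations: 1, self-citations: 0, citations by others: 1
Xu Fang (徐芳). 图书馆学研究, 2008, (2): 72-73, 95
The article introduces two methods for the integrated use of digital resources, the Google Chinese scholar search engine and the Cross-Search cross-database retrieval system, and compares their respective functions.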

13.
The internet is an important source of medical knowledge for everyone, from laypeople to medical professionals. We investigate how these two extremes, in terms of user groups, have distinct needs and exhibit significantly different search behaviour. We make use of query logs in order to study various aspects of these two kinds of users. The logs from America Online, Health on the Net, Turning Research Into Practice and American Roentgen Ray Society (ARRS) GoldMiner were divided into three sets: (1) laypeople, (2) medical professionals (such as physicians or nurses) searching for health content and (3) users not seeking health advice. Several analyses focus on discovering how users search and what they are most interested in. One outcome of our analysis is a classifier that we built to infer user expertise. We show the results and analyse the feature set used to infer expertise. We conclude that medical experts are more persistent, interacting more with the search engine. Also, our study reveals that, contrary to what is stated in much of the literature, the main focus of users, both laypeople and professionals, is on disease rather than symptoms. The results of this article, especially the classifier, could be used to detect specific user groups and then adapt search results to the user group.

14.
This paper describes an ongoing improvement effort directed at increasing the quality of mediated searches at the Sladen Library and Center for Health Information Resources. The project is the result of an analysis of literature statistics for mediated searching for 1997. The improvement project uses Deming's Plan-Do-Check-Act (PDCA) cycle, a methodology that Henry Ford Health System encourages for improvement projects. A key component of this improvement effort was the introduction of a productivity standard that each searcher is required to meet. The library has global productivity goals, but this is the first time that individual searchers have been held to a quantitative performance standard. The outcome of the Literature Search Improvement Project has been favorable.

15.
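Performance evaluation of search engines   Total citations: 7, self-citations: 0, citations by others: 7
In a network environment where information grows rapidly, finding useful information has become increasingly difficult; search engines emerged in response and have steadily grown. This paper builds a user-oriented analytic hierarchy model and uses it to briefly discuss the evaluation of search engines.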

16.
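A survey of semantic retrieval research   Total citations: 10, self-citations: 0, citations by others: 10
With the development of Semantic Web technology, semantic retrieval has become a hot research area. This paper first analyzes the limitations of traditional web retrieval technology. Then, based on a survey of a number of studies and applications, it reviews current research on semantic retrieval, analyzing in detail its two categories: semantics-supported retrieval and Semantic Web retrieval. For Semantic Web retrieval, it further analyzes three research directions: Semantic Web document retrieval, instance retrieval, and relation retrieval.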

17.
18.
Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search architectures and can handle higher query rates. Furthermore, we demonstrate that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.
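The two-step idea described above (rank the topical shards, then search only the top few) can be illustrated with a toy sketch. The term-count shard score used here is a deliberately simple stand-in for a term-based resource selection algorithm, not one of the algorithms evaluated in the paper; `rank_shards`, `selective_search`, and the sample shards are hypothetical.

```python
from collections import Counter


def rank_shards(query_terms, shard_term_counts):
    """Order shard ids by total occurrences of the query terms (toy score)."""
    scores = {
        shard_id: sum(counts[term] for term in query_terms)
        for shard_id, counts in shard_term_counts.items()
    }
    return sorted(scores, key=lambda shard_id: (-scores[shard_id], shard_id))


def selective_search(query_terms, shards, top_n):
    """Search only the top_n ranked shards instead of the whole collection."""
    # Per-shard term statistics; a real system would precompute these.
    stats = {
        shard_id: Counter(w for text in docs.values() for w in text.split())
        for shard_id, docs in shards.items()
    }
    chosen = rank_shards(query_terms, stats)[:top_n]
    hits = []
    for shard_id in chosen:
        for doc_id, text in shards[shard_id].items():
            words = text.split()
            score = sum(words.count(term) for term in query_terms)
            if score > 0:
                hits.append((score, doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return chosen, [doc_id for _, doc_id in hits]


# Hypothetical topical shards.
shards = {
    "sports": {"d1": "football match score", "d2": "tennis match"},
    "cooking": {"d3": "pasta recipe"},
    "travel": {"d4": "match your travel budget"},
}
chosen, results = selective_search(["match", "score"], shards, top_n=1)
```

With `top_n=1` only the "sports" shard is scanned, so "d4" in the "travel" shard is never evaluated: less work per query, at the risk of missing relevant documents in unselected shards.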

19.
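Rounding of statistical results   Total citations: 2, self-citations: 0, citations by others: 2
Hao Ladi (郝拉娣), Yu Huadong (于化东). 编辑学报, 2005, 17(4): 255-255
Expressing experimental results that carry random error as statistical results ("mean ± standard deviation" or "mean ± standard error") is a major advance in scientific writing. Statistical results must be expressed accurately, and the rounding of their numerical values must not be neglected either.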

20.
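An analysis of diversified manuscript review in scientific and technical academic journals   Total citations: 2, self-citations: 0, citations by others: 2
Lu Zhengsheng (卢正升). 编辑学报, 2007, 19(5): 321-323
The traditional mechanism and customary operation of the "three-review system" can no longer meet the demands of the times. Diversified review methods need to be introduced into the formerly rigorous but somewhat rigid review mechanism; while upholding the rigor of the "three-review system", editors and reviewers should pay more attention to flexibility in how it operates.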
