Found 20 similar documents (search time: 10 ms)
1.
Re-ranking search results to promote novel ones has traditionally been regarded as an intuitive diversification strategy. In this paper, we challenge this common intuition and thoroughly investigate the actual role of novelty in search result diversification, based upon the framework provided by the diversity task of the TREC 2009 and 2010 Web tracks. Our results show that existing diversification approaches based solely on novelty cannot consistently improve over a standard, non-diversified baseline ranking. Moreover, when novelty is deployed as an additional component in the current state-of-the-art diversification approaches, our results show that it does not bring significant improvements, while adding considerable efficiency overheads. Finally, through a comprehensive analysis with simulated rankings of various quality, we demonstrate that, although inherently limited by the performance of the initial ranking, novelty plays a role in breaking ties between similarly diverse results.
2.
Search result diversification aims to diversify search results to cover different query subtopics, i.e., pieces of relevant information. State-of-the-art diversification methods often explicitly model diversity based on query subtopics, and their performance is closely related to the quality of the subtopics. Most existing studies extract query subtopics only from unstructured data such as document collections. However, a huge amount of information exists in structured data, which complements the information in unstructured data. Structured data can provide valuable domain knowledge, but is currently under-utilized. In this article, we study how to leverage the integrated information from both structured and unstructured data to extract high-quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods to integrate structured and unstructured data. Specifically, the first method uses the structured data to guide the subtopic extraction from unstructured data, the second uses the unstructured data to guide the extraction, and the last first extracts subtopics separately from the two data sources and then combines them. Experimental results in both Enterprise and Web search domains show that the proposed methods are effective in extracting high-quality subtopics from the integrated information, which can lead to better diversification performance.
3.
4.
Olivier Chapelle Shihao Ji Ciya Liao Emre Velipasaoglu Larry Lai Su-Lin Wu 《Information Retrieval》2011,14(6):572-592
We study the problem of web search result diversification in the case where intent based relevance scores are available. A
diversified search result will hopefully satisfy the information need of users who may have different intents. In this
context, we first analyze the properties of an intent-based metric, ERR-IA, to measure relevance and diversity altogether.
We argue that this is a better metric than some previously proposed intent aware metrics and show that it has a better correlation
with abandonment rate. We then propose an algorithm to rerank web search results based on optimizing an objective function
corresponding to this metric and evaluate it on shopping related queries.
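The abstract above names ERR-IA but does not spell it out. As a minimal sketch, assuming the commonly used definition (Expected Reciprocal Rank computed per intent, then averaged with intent probabilities as weights), the metric can be computed as follows. The grade-to-probability mapping `(2**g - 1) / 2**max_grade`, the function names, and the input layout are assumptions made for illustration, not details taken from the abstract.

```python
def err(grades, max_grade=4):
    """Expected Reciprocal Rank for one ranked list of relevance grades."""
    p_continue = 1.0  # probability the user is still unsatisfied at this rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # assumed grade -> satisfaction probability
        score += p_continue * r / rank
        p_continue *= 1.0 - r
    return score


def err_ia(intent_probs, grades_by_intent, max_grade=4):
    """Intent-aware ERR: per-intent ERR weighted by P(intent | query)."""
    return sum(
        p * err(grades_by_intent[intent], max_grade)
        for intent, p in intent_probs.items()
    )


if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.5}
    grades = {"a": [1, 0], "b": [0, 1]}  # same ranking, judged per intent
    print(err_ia(probs, grades, max_grade=1))
```

A ranking that places a relevant result for each likely intent near the top scores higher than one that serves a single intent, which is the property a reranking objective built on this metric would exploit.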
5.
《Communication monographs》2012,79(4):308-312
Argument diagrams, especially Stephen Toulmin's “layout,” have enjoyed wide popularity, but the theoretical assumptions behind the use of such diagrams are very suspect. Diagrams are linguistically biased; they abstract arguments out of social contexts; and it is impossible to clearly define and delimit the phenomena they represent. The disadvantages of their use outweigh any conceptual advantages they might provide, and the rhetorical critic and argumentation theorist are enjoined to eschew their use.
6.
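Categories of Museum Architecture and Their Design Differences  Total citations: 1, self-citations: 1, citations by others: 0
Museums come in many types, and the buildings that house them often differ greatly as a result. For a museum building's design to meet the specific needs of the museum being built, it is necessary to classify museum architecture, to analyze the characteristics of each category and the differences between them, and thereby to make the architectural design properly targeted.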
7.
Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address, and they may read the review further only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question selection and question diversification. In the question selection phase, we first employ probabilistic retrieval models to locate candidate questions that are relevant to a given review. A Recurrent Neural Network Encoder–Decoder is utilized to measure the “answerability” of questions to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which allows an efficient greedy algorithm for submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.
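The greedy step mentioned above can be illustrated with a small sketch. The abstract does not give the authors' set function, so this example substitutes a plain coverage function (the number of distinct review aspects a question set touches), which is monotone and submodular. The names `greedy_select` and `covers` and the toy data are hypothetical.

```python
def greedy_select(candidates, covers, k):
    """Greedily pick up to k items, maximizing marginal coverage gain.

    covers maps each candidate to the set of aspects it addresses; the
    objective f(S) = |union of covers[c] for c in S| is a stand-in for
    the paper's (unspecified) set function.
    """
    selected = []
    covered = set()
    for _ in range(k):
        best, best_gain = None, 0
        for c in candidates:
            if c in selected:
                continue
            gain = len(covers[c] - covered)  # marginal gain of adding c
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining candidate adds anything new
            break
        selected.append(best)
        covered |= covers[best]
    return selected


if __name__ == "__main__":
    covers = {
        "q1": {"battery", "screen"},
        "q2": {"battery"},
        "q3": {"price"},
    }
    print(greedy_select(["q1", "q2", "q3"], covers, k=2))
```

For monotone submodular objectives under a cardinality constraint, this kind of greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is why the two properties matter for efficiency.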
8.
9.
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every
keystroke display those completions of the last query word that would lead to the best hits, and also display the best such
hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w, d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with the help of which such autocompletion queries can be processed, on the average, in
time linear in the input plus output size, independent of the size of the underlying document collection. At the same time,
our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate
almost perfectly with our theoretical bound.
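To make the problem statement above concrete, here is a minimal baseline sketch that computes the word-in-document pairs (w, d) with w ∈ W and d ∈ D from an ordinary inverted index. It is not the data structure proposed in the abstract: its cost grows with the posting lists of every word in the range, rather than being linear in input plus output size. The function name, the index layout, and the toy data are assumptions.

```python
from bisect import bisect_left, bisect_right


def autocomplete_pairs(index, vocab, lo, hi, docs):
    """Return all (w, d) with lo <= w <= hi (alphabetically) and d in docs.

    index: dict mapping word -> list of document ids containing it
    vocab: sorted list of the words in index
    docs:  set of document ids matching the earlier part of the query
    """
    start = bisect_left(vocab, lo)   # first word >= lo
    end = bisect_right(vocab, hi)    # one past the last word <= hi
    return [(w, d) for w in vocab[start:end] for d in index[w] if d in docs]


if __name__ == "__main__":
    index = {"search": [1, 2], "seat": [3], "set": [2, 4], "tree": [1]}
    vocab = sorted(index)
    # All completions of the prefix "sea", restricted to documents {2, 3}.
    print(autocomplete_pairs(index, vocab, "sea", "sea\uffff", {2, 3}))
```

The prefix typed so far defines the range W (here, every word starting with "sea"), and D is the set of hits for the preceding query words.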
Ingmar Weber
10.
11.
12.
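A Functional Comparison of the Google Scholar Search Engine and a Cross-Database Retrieval System  Total citations: 1, self-citations: 0, citations by others: 1
The article introduces two methods for integrating and using digital resources, the Google Chinese academic search engine and the Cross-Search cross-database retrieval system, and compares their respective functions.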
13.
João Palotti Allan Hanbury Henning Müller Charles E. Kahn Jr. 《Information Retrieval》2016,19(1-2):189-224
The internet is an important source of medical knowledge for everyone, from laypeople to medical professionals. We investigate how these two extremes, in terms of user groups, have distinct needs and exhibit significantly different search behaviour. We make use of query logs in order to study various aspects of these two kinds of users. The logs from America Online, Health on the Net, Turning Research Into Practice and American Roentgen Ray Society (ARRS) GoldMiner were divided into three sets: (1) laypeople, (2) medical professionals (such as physicians or nurses) searching for health content and (3) users not seeking health advice. Several analyses are made, focusing on discovering how users search and what they are most interested in. One outcome of our analysis is a classifier, which we built, to infer user expertise. We show the results and analyse the feature set used to infer expertise. We conclude that medical experts are more persistent, interacting more with the search engine. Also, our study reveals that, contrary to what is stated in much of the literature, the main focus of users, both laypeople and professionals, is on disease rather than symptoms. The results of this article, especially through the classifier built, could be used to detect specific user groups and then adapt search results to the user group.
14.
Hug GP 《Medical reference services quarterly》2001,20(4):39-46
This paper describes an ongoing improvement effort directed at increasing the quality of mediated searches at the Sladen Library and Center for Health Information Resources. The project is the result of an analysis of literature statistics for mediated searching for 1997. The improvement project utilizes Deming's Plan-Do-Check-Act or PDCA cycle. Henry Ford Health System encourages use of the PDCA methodology for improvement projects. A key component of this improvement effort was the introduction of a productivity standard that each searcher is required to meet. The library has global productivity goals, but this is the first time that individual searchers have been held to a quantitative performance standard. The outcome of the Literature Search Improvement Project has been favorable.
15.
16.
17.
18.
Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search architectures and can handle higher query rates. Furthermore, we demonstrate that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.
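The shard-selection step described above can be sketched in a few lines. This is only an illustration of the general idea of term-based resource selection: each shard is scored by how often the query terms occur in it, and only the top-ranked shards are searched. The scoring rule, the function name `select_shards`, and the toy statistics are assumptions, not the specific algorithms evaluated in the paper.

```python
def select_shards(query_terms, shard_term_counts, k):
    """Rank shards by summed query-term frequency and keep the top k.

    shard_term_counts maps shard name -> {term: frequency in that shard}.
    Ties are broken by shard name so the result is deterministic.
    """
    scores = {
        shard: sum(counts.get(term, 0) for term in query_terms)
        for shard, counts in shard_term_counts.items()
    }
    ranked = sorted(scores, key=lambda shard: (-scores[shard], shard))
    return ranked[:k]


if __name__ == "__main__":
    shards = {
        "sports": {"goal": 5, "match": 3},
        "health": {"diet": 4, "match": 1},
        "tech": {"chip": 7},
    }
    print(select_shards(["match", "goal"], shards, k=2))
```

Searching only the selected shards is what reduces the number of postings evaluated per query; the choice of k trades efficiency against the risk of missing relevant documents held in unselected shards.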
19.
20.
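An Analysis of Diversified Manuscript Review in Scientific and Technical Academic Journals  Total citations: 2, self-citations: 0, citations by others: 2
The traditional mechanism and customary operation of the "three-review system" can no longer meet the demands of the times. Diversified review methods need to be introduced into the original review mechanism, which is rigorous but somewhat rigid. While upholding the rigor of the "three-review system", editors and reviewers should pay more attention to flexibility in how it is operated.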