Similar documents
 Found 20 similar documents; search took 15 ms
1.
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to using single documents to this end.
Oren Kurland

2.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better-balanced result set, increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by Latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community-generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.

3.
While past research has shown that learning outcomes can be influenced by the amount of effort students invest during the learning process, there has been little research into this question for scenarios where people use search engines to learn. In fact, learning-related tasks represent a significant fraction of the time users spend using Web search, so methods for evaluating and optimizing search engines to maximize learning are likely to have broad impact. Thus, we introduce and evaluate a retrieval algorithm designed to maximize educational utility for a vocabulary learning task, in which users learn a set of important keywords for a given topic by reading representative documents on diverse aspects of the topic. Using a crowdsourced pilot study, we compare the learning outcomes of users across four conditions corresponding to rankings that optimize for different levels of keyword density. We find that adding keyword density to the retrieval objective gave significant learning gains on some topics, with higher levels of keyword density generally corresponding to more time spent reading per word, and stronger learning gains per word read. We conclude that our approach to optimizing search ranking for educational utility leads to retrieved document sets that ultimately may result in more efficient learning of important concepts.

4.
User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper aims to mine the subtopics of a query, either indirectly from the returned results of retrieval systems or directly from the query itself, to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on TREC10. The results indicate that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and the proposed subtopic-based diversification algorithm can select the documents covering various subtopics.

5.
Online communities are valuable information sources where knowledge is accumulated by interactions between people. Search services provided by online community sites such as forums are often, however, quite poor. To address this, we investigate retrieval techniques that exploit the hierarchical thread structures in community sites. Since these structures are sometimes not explicit or accurately annotated, we introduce structure discovery techniques that use a variety of features to model relations between posts. We then make use of thread structures in retrieval experiments with two online forums and one email archive. Our results show that using thread structures that have been accurately annotated can lead to significant improvements in retrieval performance compared to strong baselines.

6.
The Nursing and Allied Health Resources Section (NAHRS) of the Medical Library Association created the 2012 NAHRS Selected List of Nursing Journals to assist librarians with collection development and to provide nurses and librarians with data on nursing and interdisciplinary journals to assist their decisions about where to submit articles for publication. This list is a continuation and expansion of a list initially known as the Key Nursing Journals list. It compares database coverage and full-text options for each title and includes an analysis of the number of evidence-based, research, and continuing education articles.

7.
Bing and Google customize their results to target people with different geographic locations and languages but, despite the importance of search engines for web users and webometric research, the extent and nature of these differences are unknown. This study compares the results of seventeen random queries submitted automatically to Bing for thirteen different English geographic search markets at monthly intervals. Search market choice alters a small majority of the top 10 results but less than a third of the complete sets of results. Variation in the top 10 results over a month was about the same as variation between search markets but variation over time was greater for the complete results sets. Most worryingly for users, there were almost no ubiquitous authoritative results: only one URL was always returned in the top 10 for all search markets and points in time, and Wikipedia was almost completely absent from the most common top 10 results. Most importantly for webometrics, results from at least three different search markets should be combined to give more reliable and comprehensive results, even for queries that return fewer than the maximum number of URLs.

8.
We study the problem of web search result diversification in the case where intent-based relevance scores are available. A diversified search result will hopefully satisfy the information need of users who may have different intents. In this context, we first analyze the properties of an intent-based metric, ERR-IA, to measure relevance and diversity altogether. We argue that this is a better metric than some previously proposed intent-aware metrics and show that it has a better correlation with abandonment rate. We then propose an algorithm to rerank web search results based on optimizing an objective function corresponding to this metric and evaluate it on shopping-related queries.
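The abstract above names ERR-IA without defining it. As a rough illustration only, the sketch below follows the commonly cited formulation: expected reciprocal rank (ERR) is computed separately for each intent and the per-intent values are averaged with the intent probabilities as weights. The function names, the gain mapping (2^g - 1) / 2^g_max, and the input layout are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def err(grades: List[int], max_grade: int) -> float:
    """Expected reciprocal rank of one ranked list of graded relevance labels."""
    score = 0.0
    p_continue = 1.0  # probability that the user has not been satisfied so far
    for rank, grade in enumerate(grades, start=1):
        # Assumed gain mapping: probability that this document satisfies the user.
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        score += p_continue * p_satisfied / rank
        p_continue *= 1.0 - p_satisfied
    return score


def err_ia(intent_probs: Dict[str, float],
           grades_by_intent: Dict[str, List[int]],
           max_grade: int) -> float:
    """Intent-aware ERR: per-intent ERR weighted by the probability of each intent."""
    return sum(prob * err(grades_by_intent[intent], max_grade)
               for intent, prob in intent_probs.items())


# Hypothetical query with two equally likely intents and binary grades.
example = err_ia({"a": 0.5, "b": 0.5}, {"a": [1, 0], "b": [0, 1]}, max_grade=1)
```

A ranking that places a document for each intent near the top scores higher under this measure than one that serves a single intent, which is the sense in which the metric captures relevance and diversity together.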

9.
Background: Systematic reviews are comprehensive, robust, inclusive, transparent, and reproducible when bringing together the evidence to answer a research question. Various guidelines provide recommendations on the expertise required to conduct a systematic review, where and how to search for literature, and what should be reported in the published review. However, the finer details of the search results are not typically reported to allow the search methods or search efficiency to be evaluated. Case Presentation: This case study presents a search summary table, containing the details of which databases were searched, which supplementary search methods were used, and where the included articles were found. It was developed and published alongside a recent systematic review. This simple format can be used in future systematic reviews to improve search results reporting. Conclusions: Publishing a search summary table in all systematic reviews would add to the growing evidence base about information retrieval, which would help in determining which databases to search for which type of review (in terms of either topic or scope), what supplementary search methods are most effective, what type of literature is being included, and where it is found. It would also provide evidence for future searching and search methods research.

10.
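Selecting and Using Search Engines: Tips and Techniques   Total citations: 3, self-citations: 0, citations by others: 3
This paper discusses the categories and characteristics of search engines and, drawing on retrieval examples, summarizes techniques for selecting and using them.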

11.
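Starting from the current state of classification schemes used in search engines, this paper offers some useful suggestions regarding the proposals put forward by the library and information science community for optimizing the classification systems in search engines.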

12.
Social tagging systems have gained increasing popularity as a method of annotating and categorizing a wide range of different web resources. Web search that utilizes social tagging data suffers from an extreme example of the vocabulary mismatch problem encountered in traditional information retrieval (IR). This is due to the personalized, unrestricted vocabulary that users choose to describe and tag each resource. Previous research has proposed the utilization of query expansion to deal with search in this rather complicated space. However, non-personalized approaches based on relevance feedback and personalized approaches based on co-occurrence statistics only showed limited improvements. This paper proposes a novel query expansion framework based on individual user profiles mined from the annotations and resources the user has marked. The underlying theory is to regularize the smoothness of word associations over a connected graph using a regularizer function on terms extracted from top-ranked documents. The intuition behind the model is the prior assumption of term consistency: the most appropriate expansion terms for a query are likely to be associated with, and influenced by terms extracted from the documents ranked highly for the initial query. The framework also simultaneously incorporates annotations and web documents through a Tag-Topic model in a latent graph. The experimental results suggest that the proposed personalized query expansion method can produce better results than both the classical non-personalized search approach and other personalized query expansion methods. Hence, the proposed approach significantly benefits personalized web search by leveraging users’ social media data.

13.
14.

Objective:

This paper presents the methods and results of a study designed to produce the third edition of the “Basic List of Veterinary Medical Serials,” which was established by the Veterinary Medical Libraries Section in 1976 and last updated in 1986.

Methods:

A set of 238 titles were evaluated using a decision matrix in order to systematically assign points for both objective and subjective criteria and determine an overall score for each journal. Criteria included: coverage in four major indexes, scholarly impact rank as tracked in two sources, identification as a recommended journal in preparing for specialty board examinations, and a veterinary librarian survey rating.

Results:

Of the 238 titles considered, a minimum scoring threshold determined the 123 (52%) journals that constituted the final list. The 36 subject categories represented on the list include general and specialty disciplines in veterinary medicine. A ranked list of journals and a list by subject category were produced.

Conclusion:

Serials appearing on the third edition of the “Basic List of Veterinary Medical Serials” met expanded objective measures of quality and impact as well as subjective perceptions of value by both librarians and veterinary practitioners.

Highlights

  • The 123 journals on the “Basic List of Veterinary Medical Serials” include 117 journals with a decision matrix score of 15 points or higher, with an additional 6 journals included for more complete subject representation.
  • Subject categories with the greatest number of journals are internal medicine, food animal medicine, and research.
  • Updates for the third edition of the “Basic List” include 59 new titles and 13 new subject categories.

Implications

  • The third edition of the “Basic List” provides a useful collection development and assessment tool for veterinary libraries, as well as general libraries with a need to develop a core collection of veterinary resources.
  • The decision matrix approach, using standard quantitative and focused qualitative measures, provides a useful methodology for creating core lists in other disciplines.

15.
Most search engines display some document metadata, such as title, snippet and URL, in conjunction with the returned hits to aid users in determining documents. However, metadata usually consists of fragmented pieces of information that, even when combined, do not provide an overview of a returned document. In this paper, we propose a mechanism for enriching the metadata of the returned results by incorporating automatically extracted document keyphrases with each returned hit. We hypothesize that keyphrases of a document can better represent the major theme in that document. Therefore, by examining the keyphrases in each returned hit, users can better predict the content of documents, and the time spent on downloading and examining irrelevant documents will be reduced substantially.

16.
Understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic, adjacency) or implicitly with Best Match natural language search. In this paper we outline some pedagogical challenges in teaching mathematics for information retrieval (IR) to postgraduate information science students. The aim is to take these challenges, either found by experience or in the literature, to identify both theoretical and practical ideas in order to improve the delivery of the material and positively affect the learning of the target audience by using a tutorial style of teaching. Results show that there is evidence to support the notion that a more pro-active style of teaching using tutorials yields benefits both in terms of assessment results and student satisfaction.
Andrew MacFarlane

17.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query to each language of the information sources and run a monolingual search in each information source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method was proposed to download and translate all retrieved documents into the source language and generate the final ranked list by running a monolingual search in the search client. The latter method is more effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. In particular, it downloads only a small number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can be sorted into a final ranked list. This merging approach is efficient as only a subset of the retrieved documents are downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
Hao Yuan
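The merging idea in the abstract above can be sketched as follows: for each source, fit a transformation from source-specific scores to comparable scores using only the few downloaded documents, then apply it to every document in that source's ranked list and sort the union. This is a minimal sketch under stated assumptions. The abstract does not specify the form of the transformation models, so a least-squares line is assumed here, and the function names and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_linear(xs: Tuple[float, ...], ys: Tuple[float, ...]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("need at least two sampled documents with distinct scores")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def merge_ranked_lists(
    ranked_lists: Dict[str, List[Tuple[str, float]]],
    samples: Dict[str, List[Tuple[float, float]]],
) -> List[Tuple[str, float]]:
    """Merge per-source ranked lists for a single query.

    ranked_lists: source -> [(doc_id, source_specific_score), ...]
    samples:      source -> [(source_specific_score, comparable_score), ...]
                  for the small set of documents that were downloaded.
    """
    merged = []
    for source, docs in ranked_lists.items():
        # One model per source, trained on this query's samples only (assumed linear).
        slope, intercept = fit_linear(*zip(*samples[source]))
        merged.extend((doc_id, slope * score + intercept) for doc_id, score in docs)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for two sources that use very different score scales.
merged = merge_ranked_lists(
    {"en": [("e1", 10.0), ("e2", 5.0)], "fr": [("f1", 0.9), ("f2", 0.2)]},
    {"en": [(10.0, 1.0), (5.0, 0.5)], "fr": [(0.9, 0.8), (0.2, 0.1)]},
)
```

Because each model is fit per query and per source, only the sampled documents have to be downloaded and translated; every other document receives an estimated comparable score.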

18.
Main path analysis is a popular method for extracting the backbone of scientific evolution from a (paper) citation network. The first and core step of main path analysis, called search path counting, is to weight citation arcs by the number of scientific influence paths from old to new papers. Search path counting shows high potential in scientific impact evaluation due to its semantic similarity to the meaning of scientific impact indicator, i.e. how many papers are influenced to what extent. In addition, the algorithmic idea of search path counting also resembles many known indirect citation impact indicators. Inspired by the above observations, this paper presents the FSPC (Forward Search Path Count) framework as an alternative scientific impact indicator based on indirect citations. Two critical assumptions are made to ensure the effectiveness of FSPC. First, knowledge decay is introduced to weight scientific influence paths in decreasing order of length. Second, path capping is introduced to mimic human literature search and citing behavior. By experiments on two well-studied datasets against two carefully created gold standard sets of papers, we have demonstrated that FSPC is able to achieve surprisingly good performance in not only recognizing high-impact papers but also identifying undercited papers.
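The search path counting step described in the abstract above can be illustrated with a short sketch. It computes the classic search path count (SPC) of each arc in an acyclic citation network, that is, the number of source-to-sink paths that traverse the arc. It is not the paper's FSPC variant: knowledge decay and path capping are omitted. Arcs are assumed to point from the older (cited) paper to the newer (citing) paper, with no duplicate arcs, and the function name is hypothetical. The sketch relies on `graphlib`, which requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable, Tuple

Arc = Tuple[Hashable, Hashable]


def search_path_count(arcs: Iterable[Arc]) -> Dict[Arc, int]:
    """Weight each arc (old, new) by the number of source-to-sink paths through it."""
    arcs = list(arcs)
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for old, new in arcs:
        succ[old].append(new)
        pred[new].append(old)
        nodes.update((old, new))

    # Topological order with predecessors first; raises CycleError on cyclic input.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())

    # Number of paths reaching each node from any source (node without predecessors).
    from_sources = {}
    for n in order:
        from_sources[n] = sum(from_sources[p] for p in pred[n]) if pred[n] else 1

    # Number of paths leaving each node towards any sink (node without successors).
    to_sinks = {}
    for n in reversed(order):
        to_sinks[n] = sum(to_sinks[s] for s in succ[n]) if succ[n] else 1

    return {(old, new): from_sources[old] * to_sinks[new] for old, new in arcs}


# Hypothetical network: A influences B and C, both influence D, and D influences E.
weights = search_path_count(
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
)
```

In this toy network the arc from D to E lies on both source-to-sink paths, so it receives weight 2, while every other arc lies on exactly one path.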

19.
This paper describes the experience of an academic health sciences library which made BRS/After Dark, an end user search service, available to its clientele as a complement to its mediated search service. The library environment, initial publicity efforts and administrative procedures are discussed. An attempt is made to evaluate the usefulness of the service in a library environment. This evaluation, covering a three-month period, is based on the observations of the librarians administering the service and an exit interview conducted with searchers over a two-month period. User satisfaction, both observed and recorded, was quite high; however, the librarians found that users had more difficulty in constructing appropriate search strategies than had been anticipated. Overall, the service is assessed as highly useful.

20.
Cluster-based and passage-based document retrieval paradigms were shown to be effective. While the former are based on utilizing query-related corpus context manifested in clusters of similar documents, the latter address the fact that a document can be relevant even if only a very small part of it contains query-pertaining information. Hence, cluster-based approaches could be viewed as based on “expanding” the document representation, while passage-based approaches can be thought of as utilizing a “contracted” document representation. We present a study of the relative benefits of using each of these two approaches, and of the potential merits of their integration. To that end, we devise two methods that integrate whole-document-based, cluster-based and passage-based information. The methods are applied for the re-ranking task, that is, re-ordering documents in an initially retrieved list so as to improve precision at the very top ranks. Extensive empirical evaluation attests to the potential merits of integrating these information types. Specifically, the resultant performance substantially transcends that of the initial ranking and is often better than that of a state-of-the-art pseudo-feedback-based query expansion approach.
