Similar Documents
20 similar documents found (search time: 31 ms)
1.
This study examines the use of an ontology as a search tool. Sixteen subjects created queries using the Concept-based Information Retrieval Interface (CIRI) and a regular baseline IR interface. The simulated work task method was used to make the searching situations realistic. Subjects' search experiences, queries and search results were examined. The numbers of search concepts and keys, as well as their overlap in the queries, were investigated. The effectiveness of the CIRI and baseline queries was compared. An Ontology Index (OI) was calculated for all search tasks, and the correlation between the OI and the overlap of search concepts and keys in queries was investigated. The number of search keys and concepts was higher in CIRI queries than in baseline interface queries. The overlap of search keys was also higher among CIRI users than among baseline users. Both findings are due to CIRI's expansion feature. There was no clear correlation between the OI and the overlap of search concepts and keys. The search results were evaluated with generalised precision and recall, and with relevance scores based on individual relevance assessments. The baseline interface queries performed better in all comparisons, but the difference was statistically significant only in the relevance scores based on individual relevance assessments.
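For illustration, a minimal Python sketch of the kind of search-key overlap measure discussed above; the Jaccard form and the example key sets are assumptions made here, not taken from the study.

def key_overlap(query_a, query_b):
    # Jaccard overlap of the search keys appearing in two queries.
    a, b = set(query_a), set(query_b)
    return len(a & b) / len(a | b) if a | b else 0.0

ciri_keys = {"ontology", "concept", "retrieval", "interface"}
baseline_keys = {"ontology", "search", "interface"}
print(key_overlap(ciri_keys, baseline_keys))  # 0.4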

2.
The most common approach to measuring the effectiveness of Information Retrieval systems is to use test collections. The Contextual Suggestion (CS) TREC track provides an evaluation framework for systems that recommend items to users given their geographical context. The specific nature of this track allows the participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. In the judging pool, the documents from the Open Web and the ClueWeb12 collection are distinguished. Hence, each system submission should be based on only one resource, either the Open Web (identified by URLs) or ClueWeb12 (identified by ids). To achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but it has been found that systems that build their suggestion algorithms on top of input taken from the Open Web consistently achieve higher effectiveness. Because most of the systems take a rather similar approach to making CSs, this raises the question of whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems by analyzing the relevance assessments of documents identified from both the Open Web and ClueWeb12. Then, we identify documents that overlap between the relevance assessments of the Open Web and ClueWeb12, observing a dependency between the relevance assessments and whether the document was taken from the Open Web or from ClueWeb12. After that, we identify documents from the relevance assessments of the Open Web which exist in the ClueWeb12 collection but do not exist in the ClueWeb12 relevance assessments. We use these documents to expand the ClueWeb12 relevance assessments. Our main findings are twofold. First, our empirical analysis of the relevance assessments of two years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially if we look at the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web is a step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time we achieve reproducible results on a well-known, representative sample of the web.
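A hedged Python sketch of the qrel-expansion step described above: judged Open Web URLs that also exist in ClueWeb12 are added to the ClueWeb12 assessments. The data structures and the url_to_clueweb_id lookup are illustrative assumptions, not the authors' actual tooling.

def expand_clueweb_qrels(openweb_qrels, clueweb_qrels, url_to_clueweb_id):
    # openweb_qrels / clueweb_qrels: {(topic, doc_key): relevance grade}
    # url_to_clueweb_id: maps an Open Web URL to a ClueWeb12 document id, if any.
    expanded = dict(clueweb_qrels)
    for (topic, url), grade in openweb_qrels.items():
        doc_id = url_to_clueweb_id.get(url)
        if doc_id is not None and (topic, doc_id) not in expanded:
            expanded[(topic, doc_id)] = grade
    return expanded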

3.
Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best-match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were automatically formed from the topics. Second, source-language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), using structured target queries. The effectiveness of the translated queries was compared to that of the monolingual queries. Third, pseudo-relevance feedback was used to expand the original target queries. CLIR performance was evaluated using three relevance thresholds: stringent, regular, and liberal. When the regular or liberal threshold was used, reasonable performance was achieved; with the stringent threshold, equally high performance could not be reached. At all relevance thresholds, the performance of the translated queries was successfully raised by pseudo-relevance-feedback-based query expansion. However, this method could not raise the performance at the stringent threshold relative to the other thresholds.
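As a rough illustration of evaluating under the three relevance thresholds, here is a Python sketch; the 0-3 grading scale, the threshold values and the toy data are assumptions, not the study's actual settings.

THRESHOLDS = {"liberal": 1, "regular": 2, "stringent": 3}

def precision_at_k(ranking, qrels, threshold, k=10):
    # A document counts as relevant only if its graded judgment reaches the threshold.
    hits = sum(1 for doc in ranking[:k] if qrels.get(doc, 0) >= threshold)
    return hits / k

qrels = {"d1": 3, "d2": 1, "d3": 2, "d4": 0}
ranking = ["d1", "d2", "d3", "d4"]
for name, t in THRESHOLDS.items():
    print(name, precision_at_k(ranking, qrels, t, k=4))  # 0.75, 0.5, 0.25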

4.
Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice, however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users is the possible disagreement on relevance, assuming that a single gold-truth label does not exist. This paper presents and analyzes the predicted relevance model (PRM), which allows predicting a particular result's relevance for a random user based on an observed assessment and knowledge of the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance can be transformed into more robust, effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain, which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems in different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections.
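A hedged Python sketch of the core PRM idea: an observed assessor label is mapped to an expected relevance for a random user via an assessor-disagreement distribution, and that expectation is used as the gain in nDCG. The disagreement probabilities and the binary label set below are illustrative assumptions, not values from the paper.

import math

# Rows: observed assessor label (0 = non-relevant, 1 = relevant);
# values: probability that a random user would assign label 0 or 1 (illustrative numbers only).
DISAGREEMENT = {0: [0.85, 0.15], 1: [0.30, 0.70]}

def expected_gain(observed_label):
    # Expected relevance of the result for a random user, given the observed label.
    probs = DISAGREEMENT[observed_label]
    return sum(user_label * p for user_label, p in enumerate(probs))

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

observed = [1, 0, 1, 1, 0]                     # assessor labels for a ranked list
gains = [expected_gain(label) for label in observed]
print(dcg(gains) / dcg(sorted(gains, reverse=True)))   # a PRM-style nDCG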

5.
BACKGROUND: Cochrane-style systematic reviews increasingly require the participation of librarians. Guidelines on the appropriate search strategy to use for systematic reviews have been proposed. However, research evidence supporting these recommendations is limited. OBJECTIVE: This study investigates the effectiveness of various systematic search methods used to uncover randomized controlled trials (RCTs) for systematic reviews. Effectiveness is defined as the proportion of relevant material uncovered for the systematic review using extended systematic review search methods. The following extended systematic search methods are evaluated: searching subject-specific or specialized databases (including trial registries), hand searching, scanning reference lists, and communicating personally. METHODS: Two systematic review projects were prospectively monitored regarding the method used to identify items as well as the type of items retrieved. The proportion of RCTs identified by each systematic search method was calculated. RESULTS: The extended systematic search methods uncovered 29.2% of all items retrieved for the systematic reviews. The search of specialized databases was the most effective method, followed by scanning of reference lists, communicating personally, and hand searching. Although the number of items identified through hand searching was small, these unique items would otherwise have been missed. CONCLUSIONS: Extended systematic search methods are effective tools for uncovering material for the systematic review. The quality of the items uncovered has yet to be assessed and will be key in evaluating the value of the systematic search methods.

6.
Search engines are increasingly going beyond the pure relevance of search results to entertain users with information items that are interesting and even surprising, albeit sometimes not fully related to their search intent. In this paper, we study this serendipitous search space in the context of entity search, which has recently emerged as a powerful paradigm for building semantically rich answers. Specifically, our work proposes to enhance an explorative search system that represents a large sample of Yahoo Answers as an entity network, with a result structuring that goes beyond ranked lists, using composite entity retrieval, which requires bundling the results. We propose and compare six bundling methods, which exploit topical categories, entity specializations, and sentiment, and go beyond simple entity clustering. Two large-scale crowd-sourced studies show that users find a bundled organization—especially one based on the topical categories of the query entity—to be better at revealing the most useful results, as well as at organizing the results, helping to discover novel and interesting information, and promoting exploration. Finally, a third study of 30 simulated search tasks reveals the bundled search experience to be less frustrating and more rewarding, with more users willing to recommend it to others.

7.
We have conducted a study to: (1) verify the exhaustiveness of pooling for the purpose of constructing a large-scale test collection, and (2) examine whether a difference in the number of pooled documents can affect the relative evaluation of IR systems. We carried out the experiments using search topics, their relevance assessments, and the search results that were submitted for both the pre-test and test of the first NTCIR Workshop. Our results verified the efficiency and the effectiveness of the pooling method, the exhaustiveness of the relevance assessments, and the reliability of the evaluation using the test collection based on the pooling method.
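For reference, a minimal Python sketch of pool construction as commonly practised (the union of the top-k documents from every submitted run for a topic); the cutoff and the toy runs are assumptions for illustration.

def build_pool(runs, k=100):
    # runs: list of ranked document-id lists submitted for one topic.
    pool = set()
    for ranked in runs:
        pool.update(ranked[:k])
    return pool

run_a = ["d1", "d2", "d3"]
run_b = ["d2", "d4", "d5"]
print(sorted(build_pool([run_a, run_b], k=2)))  # ['d1', 'd2', 'd4']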

8.
This scoping review highlights the characteristics, assessments and technologies used to describe, improve and promote one-to-one research consultations as a mode of research support and instruction in academic libraries. A search for relevant studies was conducted using LISTA, LISA, ERIC, Scopus and Web of Science, limited to empirical evidence or studies outlining the use of technology within library practice, published from 2013 to the present, in the English language. Supplemental search methods included a grey literature search, handsearching, and cited reference searching. From 2268 records, 43 studies were identified for inclusion. Of these, 17 studies described using consultations in the delivery of information literacy instruction, 33 evaluation studies reported on student, faculty, and librarian outcomes, and 15 articles discussed the use of technology. Users reported an overwhelmingly positive experience, while mixed learning outcomes were seen in performance assessment studies. The assessment methods and uses of technology outlined in this review can be used by librarians to inform service delivery and provide evidence of the value and impact of research consultations on the academic mission of the library and its institution.

9.
Relevance feedback is an effective technique for improving search accuracy in interactive information retrieval. In this paper, we study an interesting optimization problem in interactive feedback that aims at optimizing the tradeoff between presenting search results with the highest immediate utility to a user (but not necessarily most useful for collecting feedback information) and presenting search results with the best potential for collecting useful feedback information (but not necessarily the most useful documents from a user's perspective). Optimizing such an exploration–exploitation tradeoff is key to optimizing the overall utility of relevance feedback to a user over an entire feedback session. We formally frame this tradeoff as a problem of optimizing the diversification of search results, since relevance judgments on more diversified results have been shown to be more useful for relevance feedback. We propose a machine learning approach that adaptively optimizes the diversification of search results for each query so as to optimize the overall utility of the entire session. Experimental results on three representative retrieval test collections show that the proposed learning approach can effectively optimize the exploration–exploitation tradeoff and outperforms the traditional relevance feedback approach, which only does exploitation without exploration.
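The paper learns how much to diversify per query; as a simpler, hedged illustration of trading off immediate relevance (exploitation) against diversity (exploration), here is a fixed MMR-style re-ranking sketch in Python. The lambda value, interfaces and toy data are assumptions, not the proposed method.

def mmr_rerank(candidates, relevance, similarity, lam=0.5, k=10):
    # candidates: doc ids; relevance: dict doc -> score; similarity: f(a, b) -> [0, 1].
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < k:
        def score(d):
            max_sim = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * max_sim
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

rel = {"d1": 0.9, "d2": 0.85, "d3": 0.4}
sim = lambda a, b: 1.0 if {a, b} == {"d1", "d2"} else 0.0  # d1 and d2 are near-duplicates
print(mmr_rerank(["d1", "d2", "d3"], rel, sim, lam=0.5, k=3))  # ['d1', 'd3', 'd2']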

10.
Ten students in a freshman Elementary Composition course were observed as they searched bibliographic databases on a CD-ROM LAN. All were preparing term papers, and were asked to think aloud as they conducted their searches. A total of 329 relevance judgments were made as the students searched an average of 2.7 databases per session. Basic familiarity with computers and a tendency to get out of unproductive searches helped in avoiding problems with the variety of databases and search interfaces. All students found records they chose to print, with relevance judgments often made from information in the controlled vocabulary, title, or abstract. The browse interface was used most often, and its similarity to InfoTrac was helpful. Some students were able to use keyword access effectively, though Wilsondisc's multiterm search required adjustments and adaptation of strategies. SilverPlatter's record display and print functions caused confusion for searchers unfamiliar with this interface.

11.
The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors, or occasionally participating groups, to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories and traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible, but also a cheap and fast, means of generating relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.

12.
This study analyzes the relationship between online public access catalog (OPAC) searches entered in a small academic library's catalog and the circulation of items during the same time period. Rather than counting every search that returned a reasonable number of retrievals as successful, this study judged a search most useful if items on its results list were subsequently borrowed from the library. This comparison of search results with subsequent material checkouts indicates which metadata elements seem most useful to searchers, and suggests ways libraries might use this knowledge to enhance their users' search experiences.
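A minimal Python sketch of the comparison described above, under assumed data shapes (the field names and the toy log are illustrative, not the study's schema): a search is flagged as useful if any item in its result list was later checked out.

def useful_searches(search_log, checkouts):
    # search_log: list of (query, [item_ids]); checkouts: set of item_ids borrowed later.
    return [(query, results) for query, results in search_log
            if any(item in checkouts for item in results)]

log = [("gardening", ["b1", "b2"]), ("astrophysics", ["b3"])]
print(useful_searches(log, checkouts={"b2"}))  # only the gardening search qualifies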

13.
Web search queries are often ambiguous or faceted, and the task of identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose to use the surrounding text of query terms in top retrieved documents to mine subtopics and rank them. We first extract text fragments containing query terms from different parts of documents. Then we group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the clusters and a language model trained from a query log, we calculate three features and combine them into a relevance score for each subtopic. Subtopics are finally ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query-log-based method proposed by Radlinski et al. (2010) and a search-result-clustering-based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics used at the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.
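To make the scoring step concrete, a hedged Python sketch of combining per-subtopic features into a single relevance score by weighted sum; the feature names and weights are assumptions made for illustration, not the three features or weights used in the paper.

FEATURE_WEIGHTS = {"fragment_cluster_size": 0.4,
                   "query_log_lm_prob": 0.4,
                   "document_rank_boost": 0.2}

def subtopic_relevance(features):
    # Weighted sum of normalized feature values for one candidate subtopic.
    return sum(FEATURE_WEIGHTS[name] * value for name, value in features.items())

print(subtopic_relevance({"fragment_cluster_size": 0.6,
                          "query_log_lm_prob": 0.3,
                          "document_rank_boost": 0.8}))  # 0.52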

14.
15.
The PRISMA 2020 and PRISMA-S guidelines help systematic review teams report their reviews clearly, transparently, and with sufficient detail to enable reproducibility. PRISMA 2020, an updated version of the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) statement, is complemented by PRISMA-S, an extension to PRISMA focusing on reporting the search components of systematic reviews. Several significant changes were implemented in PRISMA 2020 and PRISMA-S when compared with the original version of PRISMA in 2009, including the recommendation to report search strategies for all databases, registries, and websites that were searched. PRISMA-S also recommends reporting the number of records identified from each information source. One of the most challenging aspects of the new guidance from both documents has been changes to the flow diagram. In this article, we review some of the common questions about using the PRISMA 2020 flow diagram and tracking records through the systematic review process.

16.
Background: Systematic reviews are comprehensive, robust, inclusive, transparent, and reproducible when bringing together the evidence to answer a research question. Various guidelines provide recommendations on the expertise required to conduct a systematic review, where and how to search for literature, and what should be reported in the published review. However, the finer details of the search results are not typically reported to allow the search methods or search efficiency to be evaluated. Case Presentation: This case study presents a search summary table, containing the details of which databases were searched, which supplementary search methods were used, and where the included articles were found. It was developed and published alongside a recent systematic review. This simple format can be used in future systematic reviews to improve search results reporting. Conclusions: Publishing a search summary table in all systematic reviews would add to the growing evidence base about information retrieval, which would help in determining which databases to search for which type of review (in terms of either topic or scope), what supplementary search methods are most effective, what type of literature is being included, and where it is found. It would also provide evidence for future searching and search methods research.

17.
The article summarizes empirical results on the relations between students' problem stages in the course of writing research proposals for their master's theses and the information sought, the choice of search terms and tactics, and the relevance assessments of the information found for that task. The study is based on Kuhlthau's model of the information search process. The results of the study show that there is a close connection between the students' problem stages (mental model) during task performance and the information sought, the search tactics used, and the assessment of the relevance and utility of the information found. The corroborated hypotheses extend and specify ideas in Kuhlthau's model in the domain of IR. A theory of task-based information searching based on the empirical findings of the study is presented.

18.
Background: Measures of the effectiveness of databases have traditionally focused on recall and precision, with some debate on how relevance can be assessed, and by whom. New measures of database performance are required when users are familiar with search engines and expect full-text availability. Objectives: This research ascertained which of four bibliographic databases (BNI, CINAHL, MEDLINE and EMBASE) could be considered most useful to nursing and midwifery students searching for information for an undergraduate dissertation. Methods: Title searches were performed for dissertation topics supplied by nursing students (n = 9), who made the relevance judgements. Measures of recall and precision were combined with additional factors to provide measures of effectiveness, while efficiency combined measures of novelty and originality, and accessibility combined measures of availability and retrievability, based on obtainability. Results: There were significant differences among the databases in precision, originality and availability, but other differences were not significant (Friedman test). Odds ratio tests indicated that BNI, followed by CINAHL, was the most effective, CINAHL the most efficient, and BNI the most accessible. Conclusions: The methodology could help library services in purchase decisions, as the accessibility measure and odds ratio testing helped to differentiate database performance.
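For readers unfamiliar with the two tests named above, a hedged Python/SciPy sketch using made-up toy numbers (clearly not the study's data) of a Friedman test across the four databases and an odds ratio from a 2x2 table.

from scipy.stats import friedmanchisquare, fisher_exact

# Per-topic precision for four databases over the same nine topics (illustrative values only).
bni     = [0.50, 0.42, 0.61, 0.38, 0.55, 0.47, 0.59, 0.44, 0.52]
cinahl  = [0.46, 0.40, 0.58, 0.35, 0.51, 0.43, 0.55, 0.41, 0.49]
medline = [0.39, 0.33, 0.50, 0.30, 0.44, 0.38, 0.47, 0.36, 0.42]
embase  = [0.37, 0.31, 0.48, 0.29, 0.43, 0.36, 0.45, 0.34, 0.40]
stat, p = friedmanchisquare(bni, cinahl, medline, embase)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")

# Odds of a retrieved record being relevant in one database versus another
# (again, the 2x2 counts are illustrative only).
odds_ratio, p = fisher_exact([[120, 80], [90, 110]])
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")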

19.
OBJECTIVES: Our objectives were to identify literature on: (i) theory, evidence and gaps in knowledge relating to the help-seeking behaviour of people with learning disabilities and their carers; (ii) barriers experienced by people with learning disabilities in securing access to the full range of health services; (iii) interventions which improve access to health services by people with learning disabilities. DATA SOURCES: Twenty-eight bibliographic databases, research registers, organizational websites or library catalogues; reference lists from identified studies; contact with experts; current awareness and contents alerting services in the area of learning disabilities. REVIEW METHODS: Inclusion criteria were English-language literature from 1980 onwards, relating to people with learning disabilities of any age, and all study designs. The main criterion for assessment was relevance to the Gulliford et al. model of access to health care (Gulliford et al. Access to health care. Report of a Scoping Exercise for the National Co-ordinating Centre for NHS Service Delivery and Organisation R & D (NCCSDO). London: NCCSDO, 2001), which was modified to the special needs of people with learning disabilities. Inclusion criteria focused on relevance to the model, with the initial criteria revised in light of the literature identified and comments from a consultation exercise with people with learning disabilities, family and paid carers, and experts in the field. Data abstraction was completed independently, selected studies were evaluated for scientific rigour, and the results were synthesized. RESULTS: In total, 2221 items were identified as potentially relevant and 82 studies were fully evaluated. CONCLUSIONS: The process of identifying relevant literature was characterized by clarifying the concept under investigation and by sensitive search techniques, which led to an initial over-identification of non-relevant records from database searches. Thesaurus terms were of limited value, forcing a reliance on free-text terms and alternative methods of identifying literature to supplement and improve the recall of the database searches. A key enabler in identifying relevant literature was the depth and breadth of knowledge built up by the reviewers whilst engaged in this process.

20.
A better understanding of users' search interactions in library search systems is key to improving the result ranking. By focusing on known-item searches (searches for an item already known) and search tactics, substantial improvements can be made. To better understand user behaviour, we conducted four transaction-log studies, comprising more than 4.2 million search sessions from two German library search systems. Results show that most sessions are rather short; users tend to issue short queries and usually do not go beyond the first search engine result page (SERP). The most frequently used search tactic was the extension of a query ('Exhaust'). Looking at the known-item searches, it becomes clear that this query type is of great importance. Between 38% and 57% of all queries are known-item queries. Titles or title parts were the most frequent elements of these queries, either alone or in combination with the author's name. Unsuccessful known-item searches were often caused by items not being available in the system. Results can be applied by libraries and library system vendors to improve existing systems, as well as when designing new ones. Future research should complement log data with background information on usage, for example through user surveys.
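A small Python sketch of the kind of transaction-log statistics reported above; the log format and the known-item heuristic are assumptions made for illustration, not the study's classification rules.

from collections import Counter

log = [  # (session_id, query) pairs
    (1, "the name of the rose eco"),
    (1, "name of the rose"),
    (2, "introduction to statistics"),
    (3, "machine learning murphy"),
]

queries_per_session = Counter(session for session, _ in log)
avg_session_length = sum(queries_per_session.values()) / len(queries_per_session)
print(f"average queries per session: {avg_session_length:.2f}")

def looks_like_known_item(query, known_titles):
    # Toy heuristic: the query contains (part of) a catalogue title.
    return any(title in query for title in known_titles)

titles = {"name of the rose", "introduction to statistics"}
share = sum(looks_like_known_item(q, titles) for _, q in log) / len(log)
print(f"known-item share: {share:.0%}")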
