Similar literature
 Found 20 similar documents; search time: 15 ms
1.
2.
3.
Searchers seldom make use of the advanced searching features that could improve the quality of the search process because they do not know these features exist, do not understand how to use them, or do not believe they are effective or efficient. Information retrieval systems offering automated assistance could greatly improve search effectiveness by suggesting or implementing assistance automatically. A critical issue in designing such systems is determining when the system should intervene in the search process. In this paper, we report the results of an empirical study analyzing when during the search process users seek automated searching assistance from the system and when they implement the assistance. We designed a fully functional, automated assistance application and conducted a study with 30 subjects interacting with the system. The study used a 2G TREC document collection and TREC topics. Approximately 50% of the subjects sought assistance, and over 80% of those implemented that assistance. Results from the evaluation indicate that users are willing to accept automated assistance during the search process, especially after viewing results and locating relevant documents. We discuss implications for interactive information retrieval system design and directions for future research.

4.
Structured document retrieval makes use of document components as the basis of the retrieval process, rather than complete documents. The inherent relationships between these components make it vital to support users’ natural browsing behaviour in order to offer effective and efficient access to structured documents. This paper examines the concept of best entry points, which are document components from which the user can browse to obtain optimal access to relevant document components. It investigates the types of best entry points in structured document retrieval, and their usage and effectiveness in real information search tasks.

5.
Due to their ready availability, database management systems are being applied to bibliographic databases with increasing frequency. This is being done in spite of the fact that although DBMS query languages tend to be very powerful, they are far too complex for the casual user. It is proposed that PSI, an existing virtual-system intermediary for document retrieval systems, be extended to include access to DBMS containing bibliographic data in order to circumvent the complexity problem for the casual user. PSI currently provides a common command language for access to multiple document retrieval systems. It is shown that PSI could be extended to provide this same command language to access DBMS, whether the DBMS are relational or network.

6.
We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between original query terms, and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship information between query terms are used in document ranking: the short-distance collocation relationship between query terms, and the long-distance relationship, determined by the collocation of query terms with other words. The methods are evaluated on TREC corpora, and show improvements over baseline systems.
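The short-distance collocation idea in the abstract above can be illustrated with a rough sketch. The abstract does not give the actual scoring formula, so the function name `collocation_score`, the fixed token window, and the simple pair count are all assumptions made for illustration only.

```python
from itertools import combinations


def collocation_score(query_terms, doc_tokens, window=5):
    """Count pairs of distinct query terms that occur within `window` tokens.

    Hypothetical stand-in for a short-distance collocation measure;
    the paper's own weighting is not described in the abstract.
    """
    # Record every position at which each distinct query term occurs.
    positions = {
        term: [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in set(query_terms)
    }
    score = 0
    # Compare each unordered pair of distinct query terms.
    for a, b in combinations(sorted(positions), 2):
        score += sum(
            1
            for i in positions[a]
            for j in positions[b]
            if abs(i - j) <= window
        )
    return score
```

A score like this could be added to a baseline ranking score so that documents in which query terms appear close together are ranked higher; how the two are combined is left open here.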

7.
In information retrieval, cluster-based retrieval is a well-known attempt to resolve the problem of term mismatch. Clustering requires similarity information between the documents, which is difficult to calculate in a feasible time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework, and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. Evaluation results show that retrieval based on query-based similarities significantly improves on the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

8.
The retrieval effectiveness of the underlying document search component of an expert search engine can have an important impact on the effectiveness of the generated expert search results. In this large-scale study, we perform novel experiments in the context of the document search and expert search tasks of the TREC Enterprise track, to measure the influence that the performance of the document ranking has on the ranking of candidate experts. In particular, our experiments show that while the expert search system performance is related to the relevance of the retrieved documents, surprisingly, it is not always the case that increasing document search effectiveness causes an increase in expert search performance. Moreover, we simulate document rankings designed with expert search performance in mind and, through a failure analysis, show why even a perfect document ranking may not result in a perfect ranking of candidate experts.

9.
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations. Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, and briefly considers other environment conditions and tasks, model training, concluding with comparisons with other approaches and an overall assessment. Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2.

10.
Documents circulating in paper form are increasingly being replaced by their electronic equivalents in the modern office, so that any stored document can be retrieved whenever needed later on. The office worker is already burdened with information overload, so effective and efficient retrieval facilities become an important factor affecting worker productivity. This paper first reviews the features of current document management systems with varying facilities to manage, store and retrieve either references to documents or whole documents. Information retrieval databases, groupware products and workflow management systems are presented as developments to handle different needs, together with the underlying concepts of knowledge management. The two problems of worker finiteness and worker ignorance remain outstanding, as they are only partially addressed by the above-mentioned systems. The solution lies in a shift away from pull technology, where the user has to actively initiate the request for information, towards push technology, where available information is automatically delivered without user intervention. Intelligent information retrieval agents are presented as a solution together with a marketing scenario of how they can be introduced.

11.
This paper discusses research into chemical information and document retrieval systems at the Department of Information Studies, University of Sheffield. The research includes the use of cluster analysis methods for document retrieval and drug design, the representation and searching of files of generic chemical structures, substructure searching and maximal common substructure identification in files of three-dimensional chemical structures, and the use of novel parallel computer hardware in all of these application areas.

12.
We are interested in how ideas from document clustering can be used to improve the retrieval accuracy of ranked lists in interactive systems. In particular, we are interested in ways to evaluate the effectiveness of such systems to decide how they might best be constructed. In this study, we construct and evaluate systems that present the user with ranked lists and a visualization of inter-document similarities. We first carry out a user study to evaluate the clustering/ranked list combination on instance-oriented retrieval, the task of the TREC-6 Interactive Track. We find that although users generally prefer the combination, they are not able to use it to improve effectiveness. In the second half of this study, we develop and evaluate an approach that more directly combines the ranked list with information from inter-document similarities. Using the TREC collections and relevance judgments, we show that it is possible to realize substantial improvements in effectiveness by doing so, and that although users can use the combined information effectively, the system can provide hints that substantially improve on the user's solo effort. The resulting approach shares much in common with an interactive application of incremental relevance feedback. Throughout this study, we illustrate our work using two prototype systems constructed for these evaluations. The first, AspInQuery, is a classic information retrieval system augmented with a specialized tool for recording information about instances of relevance. The other system, Lighthouse, is a Web-based application that combines a ranked list with a portrayal of inter-document similarity. Lighthouse can work with collections such as TREC, as well as the results of Web search engines.

13.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system, called the Query, Cluster, Summarize (QCS) system, which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.

14.
Search patterns of documents and information requests are only better or worse representations of them, so it is important to examine the possibilities of designing self-learning information retrieval systems. Another important question is how to organize the set of document search patterns so as to obtain an acceptable response time of the information system to a given information request. The self-learning process of the proposed information system consists in determining, on a set of document and information request search patterns, the similarity relation according to L. A. Zadeh. The organization of the set of document search patterns proposed in the paper limits the search, when retrieving a response to a given information request, to one (or several) of the previously determined subsets. This makes the information system response time acceptable. The proposed information retrieval strategy is discussed in terms of fuzzy sets.

15.
Existing ranking schemes assume all term occurrences in a given document are of equal influence. Intuitively, terms occurring in some places should have a greater influence than those elsewhere. An occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure. Vector space, probability, and Okapi BM25 ranking are extended to include structure weighting. Weights are then selected for the TREC WSJ collection using a genetic algorithm. The learned weights are then tested on an evaluation set of queries. Structure weighted vector space inner product and structure weighted probabilistic retrieval show about a 5% improvement in mean average precision over their unstructured counterparts. Structure weighted BM25 shows nearly no improvement. Analysis suggests BM25 cannot be improved using structure weighting.
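One plausible way to read "structure weighting" is to scale each term occurrence by a weight for the structure it appears in before the term frequency enters the ranking function. The sketch below does this for a common BM25 variant. The weights in `STRUCTURE_WEIGHTS` are invented for illustration (the paper learns its weights with a genetic algorithm), and the function names and document layout are assumptions, not the authors' implementation.

```python
import math

# Hypothetical per-structure weights, chosen by hand for this sketch.
STRUCTURE_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_tf(term, doc):
    """Term frequency where each occurrence is scaled by its structure weight.

    `doc` maps a structure name to the list of tokens in that structure.
    """
    return sum(
        STRUCTURE_WEIGHTS.get(structure, 1.0) * tokens.count(term)
        for structure, tokens in doc.items()
    )


def doc_length(doc):
    """Total number of tokens across all structures of a document."""
    return sum(len(tokens) for tokens in doc.values())


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of `doc` for `query`, using the structure-weighted tf."""
    n = len(docs)
    avg_len = sum(doc_length(d) for d in docs) / n
    score = 0.0
    for term in query:
        # Document frequency: documents containing the term in any structure.
        df = sum(1 for d in docs if any(term in t for t in d.values()))
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = weighted_tf(term, doc)
        norm = k1 * (1 - b + b * doc_length(doc) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score
```

With these example weights, a query term found in a title contributes more than the same term found in the body of an equally long document. Because BM25 saturates as term frequency grows, the effect of the weights shrinks for frequent terms.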

16.
Previous studies have repeatedly demonstrated that the relevance of a citing document is related to the number of times the source document is cited. Despite the ease with which electronic documents would permit the incorporation of this information into citation-based document search and retrieval systems, the possibilities of repeated citations remain untapped. Part of this under-utilization may be due to the fact that very little is known regarding the pattern of repeated citations in scholarly literature or how this pattern may vary as a function of journal, academic discipline or self-citation. The current research addresses these unanswered questions in order to facilitate the future incorporation of repeated citation information into document search and retrieval systems. Using data mining of electronic texts, the citation characteristics of nine different journals, covering three different academic fields (economics, computing, and medicine & biology), were characterized. It was found that the frequency (f) with which a reference is cited N or more times within a document is consistent across the sampled journals and academic fields. Self-citation causes an increase in frequency, and this effect becomes more pronounced for large N. The objectivity, automatability, and insensitivity of repeated citations to journal and discipline present powerful opportunities for improving citation-based document search.

17.
This paper concerns the provision of a computerized intermediary system to facilitate online document retrieval from large-scale data bases directly by users of the retrieved information. The system does not require the user to be knowledgeable or undergo any training in the use of the underlying retrieval system. The scope for a novel intermediary system relating to recent developments in expert systems has been identified and a system entitled CANSEARCH designed to enable doctors to specify queries to retrieve cancer-therapy-related documents stored in the MEDLINE data base. The design of the intermediary system uses the principle of search space abstraction, employing menu selection from a touch terminal and encapsulating the necessary intermediary expertise using rule-based techniques programmed in PROLOG. CANSEARCH performed well enough to justify the approach taken, suggesting that further development of CANSEARCH and of intermediary systems for document retrieval in other subject areas should be undertaken.

18.
Structured document retrieval makes use of document components as the basis of the retrieval process, rather than complete documents. The inherent relationships between these components make it vital to support users’ natural browsing behaviour in order to offer effective and efficient access to structured documents. This paper examines the concept of best entry points, which are document components from which the user can browse to obtain optimal access to relevant document components. In particular, this paper investigates the basic characteristics of best entry points.

19.
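Liu Xiujuan (刘秀娟). 《现代情报》, 2010, 30(7): 138-139, 142
Modern information technology is profoundly changing and influencing the objectives, content, teaching methods, and assessment of the traditional literature retrieval course. Computer-assisted instruction can no longer fully cover the impact of information technology on information literacy education. The integration of information technology with the curriculum is opening up a new field of research and practice. This paper therefore discusses the meaning, levels, and integration points of integrating information technology with the literature retrieval course, aiming to explore new ideas for teaching reform, in terms of teaching and learning methods and instructional structure, under new technological conditions.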

20.
Mining linkage information from the citation graph has been shown to be effective in identifying important literature. However, the question of how to utilize linkage information from the citation graph to facilitate literature retrieval still remains largely unanswered. In this paper, given the context of biomedical literature retrieval, we first conduct a case study in order to find out whether applying PageRank and HITS algorithms directly to the citation graph is the best way of utilizing citation linkage information for improving biomedical literature retrieval. Second, we propose a probabilistic combination framework for integrating citation information into the content-based information retrieval weighting model. Based on the observations of the case study, we present two strategies for modeling the linkage information contained in the citation graph. The proposed framework provides a theoretical support for the combination of content and linkage information. Under this framework, exhaustive parameter tuning can be avoided. Extensive experiments on three TREC Genomics collections demonstrate the advantages and effectiveness of our proposed methods.
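To make the baseline in that case study concrete, the sketch below runs a plain power-iteration PageRank over a small citation graph and mixes the result with a content score by linear interpolation. The interpolation is a deliberately simple stand-in: the paper proposes a probabilistic combination framework whose details are not given in the abstract. The names `pagerank`, `combined_score`, and the mixing parameter `lam` are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `graph` maps each paper to the list of papers it cites; every cited
    paper must also appear as a key (with an empty list if it cites none).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            cited = graph[v]
            if cited:
                share = damping * rank[v] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A paper with no outgoing citations spreads its rank evenly.
                share = damping * rank[v] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def combined_score(content_scores, rank, lam=0.8):
    """Linear mix of content and link evidence (an assumed simplification)."""
    return {
        doc: lam * content_scores[doc] + (1 - lam) * rank[doc]
        for doc in content_scores
    }
```

In this sketch `lam` would have to be tuned by hand, which is exactly the kind of parameter search the abstract says its framework avoids.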
