A general method is presented to construct ordered similarity measures (OS-measures), i.e., similarity measures for ordered sets of documents (e.g., the result of an IR process), based on classical, well-known similarity measures for ordinary sets (measures such as Jaccard, Dice, Cosine or overlap measures). To this end, we first present a review of these measures and their relationships. The method given here to construct OS-measures extends the one given by Michel in a previous paper so that it becomes applicable to any pair of ordered sets. Concrete expressions of this method, applied to the classical similarity measures, are given. Some of these measures are then tested in the IR system Profil-Doc. The engine SPIRIT© extracts ranked document sets in three different contexts, each for 550 requests. The practical usability of the OS-measures is then discussed based on these experiments.
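
For orientation, the classical set measures named in this abstract can be sketched as follows. This is a minimal illustration of the standard unordered definitions only; it does not reproduce the OS-measures of the paper, and the function names and sample document identifiers are assumptions made for the example.

```python
import math


def jaccard(a, b):
    """Jaccard: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice: 2 |A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Set-based cosine: |A ∩ B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def overlap(a, b):
    """Overlap: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical result sets from two retrieval runs.
run_a = {"d1", "d2", "d3", "d4"}
run_b = {"d3", "d4", "d5"}
for measure in (jaccard, dice, cosine, overlap):
    print(measure.__name__, round(measure(run_a, run_b), 3))
```

All four measures ignore rank order, which is the limitation the OS-measures are meant to address.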

The present-day guidelines for thesaurus design recommend two different strategies, the committee approach and the empirical approach, for identifying candidate terms. An argument is made that the basis for the recommendation is the assumption that the knowledge based on the consensus of experts of a field is different from the knowledge expressed in the literature of that field. An experiment was conducted to test the validity of this assumption. The finding that the two strategies failed to generate two significantly different lists of terms challenges the validity of the assumption and raises several important questions for the theorists who write the guidelines for thesaurus design and for those who must put the guidelines into practice when designing a thesaurus.

In this article we discuss the need for distributed information retrieval systems. A number of possible configurations are presented. A general approach to the design of such systems is discussed. A prototype implementation is described together with the experiences gained from this implementation.

Traditional measures of retrieval effectiveness, of which the recall ratio is an outstanding example, are strongly influenced by the relevance properties of unexamined documents, that is, documents with which the system user has no direct contact. Such an influence is awkward to explain in traditional terms, but is readily justified within the broader framework of a utility-theoretic approach. The utility-theoretic analysis shows that unexamined documents can be important in theory, but usually are not when it is the statistics of large samples that are of interest. It is concluded that the traditional concern with the relevance or nonrelevance of unexamined documents is misplaced, and that traditional measures of effectiveness should be replaced by estimates of the direct utility of the examined documents.
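
The dependence of the recall ratio on unexamined documents can be seen in a small sketch. It uses only the standard definitions of precision and recall; the function name, the ranking, and the relevance judgments are hypothetical values chosen for illustration, not data from the paper.

```python
def precision_recall(ranking, relevant, examined_k):
    """Precision and recall over the first `examined_k` documents of a ranking.

    `relevant` is the set of all relevant documents in the collection,
    including ones the user never examines.
    """
    examined = ranking[:examined_k]
    hits = sum(1 for doc in examined if doc in relevant)
    precision = hits / len(examined) if examined else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]

# The user examines only the top three documents in both cases.
p1, r1 = precision_recall(ranking, {"d1", "d3"}, examined_k=3)
# Same examined documents, but one more relevant document (d6) is never seen.
p2, r2 = precision_recall(ranking, {"d1", "d3", "d6"}, examined_k=3)

print(p1, r1)
print(p2, r2)
```

Precision over the examined documents is identical in the two cases, while recall drops solely because of a document the user never saw.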

Due to their ready availability, database management systems (DBMS) are being applied to bibliographic databases with increasing frequency. This is being done even though DBMS query languages, while very powerful, are far too complex for the casual user. It is proposed that PSI, an existing virtual-system intermediary for document retrieval systems, be extended to include access to DBMS containing bibliographic data in order to circumvent the complexity problem for the casual user. PSI currently provides a common command language for access to multiple document retrieval systems. It is shown that PSI could be extended to provide this same command language to access DBMS, whether the DBMS are relational or network.

In this paper, we present a new method to improve the performance of query processing in a spatial database. The previous approach processes the retrieval of spatial objects by topological relations using R-tree structures based on minimum bounding rectangles. In our approach, we add an internal rectangle to the leaf nodes of the R-tree as additional information to aid object retrieval. As a result, the number of false hits can be reduced and part of the true hits can be identified at an early stage of searching. The experiments demonstrated that the performance of database systems can be improved because both the number of objects accessed and the number of objects requiring detailed inspection are much smaller than in the previous approach.
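
The filtering idea can be sketched for a simple window-intersection query. This is an illustration under stated assumptions, not the authors' algorithm: it assumes the internal rectangle lies entirely inside the object, handles only the "intersects" relation, and omits the R-tree itself. The names `Rect`, `intersects`, and `classify` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def classify(query: Rect, mbr: Rect, internal: Rect) -> str:
    """Classify a leaf entry against a query window without reading the geometry.

    - "reject": the query misses the minimum bounding rectangle, so it
      cannot touch the object.
    - "true_hit": the query touches the internal rectangle, which is assumed
      to lie inside the object, so the object is certainly hit.
    - "needs_inspection": only the exact geometry can decide.
    """
    if not intersects(query, mbr):
        return "reject"
    if intersects(query, internal):
        return "true_hit"
    return "needs_inspection"


mbr = Rect(0, 0, 10, 10)
internal = Rect(4, 4, 6, 6)
print(classify(Rect(20, 20, 30, 30), mbr, internal))
print(classify(Rect(5, 5, 7, 7), mbr, internal))
print(classify(Rect(0, 0, 1, 1), mbr, internal))
```

Only entries in the third category require the detailed inspection that the abstract reports as reduced.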

Evaluation research on information retrieval (IR) systems has thus far been narrowly focused and disjointed. This research attempts to narrow the gap by providing a comprehensive and integrated multiple criteria decision-theoretic approach for the evaluation of IR systems. The approach, which is based on the Analytic Hierarchy Process (AHP), is illustrated in the context of a domain-specific IR system. The novelty of this approach lies in the focus on the user aspect and the application of decision-making theories in the IR field.

The fundamental idea of the work reported here is to extract index phrases from texts with the help of a single-word concept dictionary and a thesaurus containing relations among concepts. The work is based on the fact that, within every phrase, the single words of which the phrase is composed are related in a certain well-defined manner, the type of relations holding between concepts depending only on the concepts themselves. Therefore relations can be stored in a semantic network. The algorithm described extracts single-word concepts from texts and combines them into phrases using the semantic relations between these concepts, which are stored in the network. The results obtained show that phrase extraction from texts by this semantic method is possible and offers many advantages over other (purely syntactic or statistical) methods concerning the precision and completeness of the meaning representation of the text. But the results show, too, that some syntactic and morphologic “filtering” should be included for reasons of effectiveness.

A model of a user's scan of the output of an information storage and retrieval system in response to a query is presented. Rules for determining the user's optimal stopping point are discussed and compared. A dynamic model for determining the proper stopping point using decision theory under risk with changing utilities is used as the basis for a Bayesian model of user scanning behavior. An algorithm to implement the Bayesian model is introduced and examples of the model are given. The implications for retrieval systems design and evaluation are discussed.

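[Purpose] To compile and analyze statistics on the Chinese scientific and technical journals indexed in the Ei Compendex database, providing data to support Chinese-language sci-tech journals seeking EI indexing. [Methods] The number of Chinese sci-tech journals indexed by EI over the years was tallied and, taking the 2016 indexing data as an example, the characteristics of Chinese-language sci-tech journals newly added to or dropped from EI were analyzed. [Results] EI gives priority to journals published in English, and in recent years the number of Chinese-language sci-tech journals it indexes has shrunk year by year; EI indexes only journals whose comprehensive evaluation ranking within their subject area is stable and within the top 3%, and journals ranked below the top 5% are very unlikely to be indexed unless the articles they publish fall in emerging fields of interest to EI; whether a journal is indexed by other major databases is also one of the key factors EI considers. [Conclusions] EI has tightened its indexing of sci-tech journals in recent years, and it will become increasingly difficult for Chinese-language journals to gain EI indexing; developing a distinctive disciplinary profile and improving core competitiveness are the foundation on which a journal stands.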

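Bibliometric analysis of China's SCI journals and suggestions for their development   Total citations: 1, self-citations: 1, citations by others: 0
[Purpose] To evaluate the development of China's SCI journals and, on that basis, to identify paths for raising the influence of China's sci-tech journals. [Methods] Using as the data source the 127 Chinese sci-tech journals indexed in the 2012 JCR with indicator values, a bibliometric analysis was carried out on four aspects: impact factor, impact factor rank, total citations, and total citation rank. [Results] Comparison of the indicators shows that, relative to China's SCI publication output, China's SCI journals need improvement in both number and quality. [Conclusions] China's sci-tech journals can raise their influence by improving awareness, strengthening oversight, improving peer review, opening up further, raising academic quality, and building a showcase platform for China's top academic papers.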

Measuring effectiveness of information retrieval (IR) systems is essential for research and development and for monitoring search quality in dynamic environments. In this study, we employ new methods for automatic ranking of retrieval systems. In these methods, we merge the retrieval results of multiple systems using various data fusion algorithms, use the top-ranked documents in the merged result as the “(pseudo) relevant documents,” and employ these documents to evaluate and rank the systems. Experiments using Text REtrieval Conference (TREC) data provide statistically significant strong correlations with human-based assessments of the same systems. We hypothesize that the selection of systems that would return documents different from the majority could eliminate the ordinary systems from data fusion and provide better discrimination among the documents and systems. This could improve the effectiveness of automatic ranking. Based on this intuition, we introduce a new method for the selection of systems to be used for data fusion. For this purpose, we use the bias concept that measures the deviation of a system from the norm or majority and employ the systems with higher bias in the data fusion process. This approach provides even higher correlations with the human-based results. We demonstrate that our approach outperforms the previously proposed automatic ranking methods.
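
The merge-then-evaluate procedure described in this abstract can be sketched as follows. This is a rough illustration under assumptions: Borda count is used here as one familiar fusion rule (the abstract does not say which algorithms were used), systems are scored by simple precision against the pseudo-relevant set, and the bias-based system selection is omitted. The function names and sample runs are hypothetical.

```python
from collections import defaultdict


def borda_fuse(runs):
    """Merge ranked lists with a Borda count; ties are broken by document id."""
    scores = defaultdict(float)
    for ranking in runs.values():
        n = len(ranking)
        for position, doc in enumerate(ranking):
            scores[doc] += n - position  # top document earns n points
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def rank_systems(runs, pool_size):
    """Rank systems by precision against the top `pool_size` fused documents."""
    pseudo_relevant = set(borda_fuse(runs)[:pool_size])
    quality = {
        name: (sum(doc in pseudo_relevant for doc in ranking) / len(ranking)
               if ranking else 0.0)
        for name, ranking in runs.items()
    }
    order = sorted(quality, key=lambda name: (-quality[name], name))
    return order, quality


# Hypothetical ranked outputs of three systems for one query.
runs = {
    "A": ["d1", "d2", "d3"],
    "B": ["d1", "d3", "d4"],
    "C": ["d5", "d6", "d1"],
}
order, quality = rank_systems(runs, pool_size=2)
print(borda_fuse(runs))
print(order, quality)
```

No human relevance judgments enter the computation; the fused top documents stand in for them, which is why the abstract validates the resulting system ranking against human-based assessments.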

This paper describes a formal standardized procedure for the decision-making process for the purchase or rejection of an information storage and retrieval system. The interaction of both the purchaser of the system and its potential users with the various models of the system (such as cost-time-volume models and performance evaluation models) ensures that the purchase decision for a given system is affected by all possible constraints and is universally acceptable. If either purchaser or users find a particular system unacceptable, the procedure either rejects it or institutes modifications (within given constraints) until a generally acceptable system is determined, if one exists.

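In Web information retrieval, many search engines and full-text databases provide related-term suggestions to help clarify the user's query needs. This paper briefly introduces the techniques used to obtain related-term suggestions in Web information retrieval and reports an empirical survey and analysis of their effect. A number of keywords were randomly drawn from a keyword base and used to search selected search engines and full-text databases, and the related terms suggested for the sampled keywords were collected. Through keyword searches, manual scoring, and statistical analysis, the study analyzes query expansion, query specificity, and precision, and gives an overall evaluation of how related-term suggestions improve retrieval effectiveness and user satisfaction.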

All library school students should be provided with an opportunity to obtain hands-on experience in using such bibliographic retrieval systems as ORBIT, DIALOG, OCLC, etc. Yet, such training is both costly and time consuming. Two key issues that must be resolved in order to make the training more efficient and more effective are: (1) the integration of training in the curriculum as a module in cataloging and reference courses or as a separate course; and (2) the method of training, which may include the use of videotapes, demonstrations, training manuals, etc. The teaching program at UCLA provides for discussion and demonstration of on-line retrieval techniques in the basic courses and advanced search training in a separate course using a specially prepared training manual.

Think tanks have proved helpful for decision-making in various communities. However, collecting information manually for think tank construction costs too much time and labor and introduces inevitable subjectivity. A possible solution is to retrieve webpages of renowned experts and institutes similar to a given example, denoted as query by webpage (QBW). Considering users’ searching behaviors, a novel QBW model based on webpages’ visual and textual features is proposed. Specifically, a visual feature extraction module based on pre-trained neural networks and a heuristic pooling scheme is proposed, which addresses the failure of existing extractors to capture snapshots’ high-level features and their sensitivity to the noise introduced by images. Moreover, a textual feature extraction module is proposed to represent textual content at both the term and topic granularities, while most existing extractors focus only on the term granularity. In addition, a series of similarity metrics is proposed, including a textual similarity metric based on feature bootstrapping to improve the model’s robustness and an adaptive weighting scheme to balance the effect of different types of features. The proposed QBW model is evaluated on expert and institute introduction retrieval tasks in academic and medical scenarios, where the average MAP improves by 10% compared to existing baselines. Practically, useful insights can be derived from this study for various applications involving webpage retrieval besides think tank construction.
