Similar documents
 20 similar documents found (search time: 31 ms)
1.
2.
Search patterns of documents and information requests represent them only more or less faithfully, so it is important to examine the possibilities of designing self-learning information retrieval systems. Another important question is how to organize the set of document search patterns so as to obtain an acceptable response time of the information system to a given information request. The self-learning process of the proposed information system consists in determining, on a set of document and information request search patterns, a similarity relation in the sense of L. A. Zadeh. The organization of the set of document search patterns proposed in the paper limits the search, when retrieving a response to a given information request, to one (or several) of the previously determined subsets. This makes the information system response time acceptable. The proposed information retrieval strategy is discussed in terms of fuzzy sets.
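
The abstract above does not say how its similarity relation is computed, so the following is only a generic sketch of a similarity relation in Zadeh's sense: a reflexive, symmetric, max-min transitive fuzzy relation whose alpha-cuts partition the set into subsets. The function names and the sample matrix are hypothetical, chosen for illustration.

```python
def max_min_closure(r: list[list[float]]) -> list[list[float]]:
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation."""
    n = len(r)
    closure = [row[:] for row in r]
    # Floyd-Warshall-style pass: keep the best "weakest link" via each k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(closure[i][k], closure[k][j])
                closure[i][j] = max(closure[i][j], via_k)
    return closure


def alpha_cut_classes(sim: list[list[float]], alpha: float) -> list[list[int]]:
    """Partition indices into classes whose closure similarity is >= alpha.

    Assumes `sim` is already max-min transitive, so comparing against one
    representative per class is sufficient.
    """
    classes: list[list[int]] = []
    for i in range(len(sim)):
        for cls in classes:
            if sim[i][cls[0]] >= alpha:
                cls.append(i)
                break
        else:
            classes.append([i])
    return classes


# Hypothetical pairwise similarities between three search patterns.
raw = [[1.0, 0.8, 0.2],
       [0.8, 1.0, 0.6],
       [0.2, 0.6, 1.0]]
closed = max_min_closure(raw)
subsets = alpha_cut_classes(closed, 0.7)
```

With such a partition, a request would only need to be compared against the subset (or subsets) it falls into, which is one way to read the response-time argument in the abstract.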

3.
4.
The object of this paper is to present a new approach to the problem of information system effectiveness evaluation, based on the theory of fuzzy sets. On the basis of this theory, the concepts of relevance and pertinence, which are the basic concepts used in determining the indices of information system effectiveness evaluation, have been defined. It is assumed that, in evaluating the effectiveness of information systems, one should consider two problems separately: evaluating the quality of the transformation of the contents of documents and information requests into their search patterns, and evaluating the quality of the process of profile control of a document set of the information system. Under this assumption, parameters are defined for evaluating the quality of the transformation with regard to a given information request, as well as parameters for evaluating the quality of the process with regard to the whole set of information requests under examination. In addition, parameters for evaluating the quality of the process of profile control of a document set of the information system have been defined. The parameters of effectiveness evaluation put forward in this paper take account of the fact that evaluations of both the relevance and the pertinence of documents are continuous in character.

5.
Direct end-user data entry and retrieval is a major factor in achieving an economical information retrieval system. To be effective, such a system would have to provide a thesaurus structure which leads novice end-users to browse subject areas before retrieval and yet provides control and coverage of terms in a domain. A faceted hierarchical thesaurus organization has been designed to accomplish this goal.

6.
Vocabulary mining in information retrieval refers to the utilization of the domain vocabulary towards improving the user’s query. Most often queries posed to information retrieval systems are not optimal for retrieval purposes. Vocabulary mining allows one to generalize, specialize or perform other kinds of vocabulary-based transformations on the query in order to improve retrieval performance. This paper investigates a new framework for vocabulary mining that derives from the combination of rough sets and fuzzy sets. The framework allows one to use rough set-based approximations even when the documents and queries are described using weighted, i.e., fuzzy representations. The paper also explores the application of generalized rough sets and the variable precision models. The problem of coordination between multiple vocabulary views is also examined. Finally, a preliminary analysis of issues that arise when applying the proposed vocabulary mining framework to the Unified Medical Language System (a state-of-the-art vocabulary system) is presented. The proposed framework supports the systematic study and application of different vocabulary views in information retrieval.
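
As a rough illustration of the rough set-based approximations mentioned in the abstract above, the sketch below computes the classical (crisp) lower and upper approximations of a query's term set with respect to vocabulary equivalence classes. The fuzzy, generalized, and variable precision extensions studied in the paper are not attempted here, and the function name and sample vocabulary are hypothetical.

```python
def approximations(target: set[str],
                   classes: list[set[str]]) -> tuple[set[str], set[str]]:
    """Return (lower, upper) rough approximations of `target`.

    `classes` is a partition of the vocabulary, e.g. groups of terms
    treated as equivalent (an assumption made for this sketch).
    """
    lower: set[str] = set()
    upper: set[str] = set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class overlaps the target
            upper |= cls
    return lower, upper


# Hypothetical vocabulary classes and query terms.
vocabulary = [{"car", "auto"}, {"bus"}, {"bike", "cycle"}]
query_terms = {"car", "auto", "bike"}
lower, upper = approximations(query_terms, vocabulary)
```

Read this way, the lower approximation acts as a more specific version of the query and the upper approximation as a more general one.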

7.
Information-systems are classified into two types, termed “Evidence-of-Existence” and “Presentation” of information. The objective of the evidence-type system lies in the domain of documentation and retrieval of information. The structure of this system-type is developed, with application of cybernetic concepts, as an isomorphic model in analogy to the system-structure of communication technology. The latter postulates three criteria of structuring: (1) Source-Channel-Sink, with input-output characteristics, (2) Filter-type communication-channel, (3) Reversible code. These criteria are applied to the structuring of information-systems of the evidence-of-existence type. For the purpose of two-way communication the information-systems have to be represented by closed-loop models. The selective-retrieval requirements necessitate the system-channel to be a filter of information. These information-filters are implemented by keyword-phrases, being identical with the codewords. They yield a uniquely decodable code which is totally reversible to adequately serve both the documentation and the retrieval of documents. It is proven that hierarchic information-systems, applying categorization or subject-heading objects of information, do not meet the mandatory code-requirements. The inherent coding-deficiencies of hierarchic systems generate intolerable retrieval ambiguities. The same critique applies to the thesaurus concept. The development of a novel species of thesaurus is suggested, realizing a kind of Linnéan encyclopedia of general human knowledge, presenting all relevant interrelations of objects of knowledge. Such a thesaurus would provide the much needed support for formulating efficient search queries. Other relevant features of communication technology, like the information-potential, should be isomorphically transformed into information-system models.

8.
9.
10.
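Application of descriptors in the network environment   Cited by: 1 (self-citations: 1, citations by others: 1)
戴剑波 (Dai Jianbo), 《情报科学》 (Information Science), 2004, 22(4): 502-505
This paper describes three modes of applying descriptors (thesaurus terms) in the network environment. First, direct indexing and retrieval with descriptors is very common on specialized websites and in gateway retrieval systems. Second, because descriptors have clearly defined concepts and display inter-term relationships well, they can support query expansion in keyword-based search engines. Third, different organizations generally use different thesauri or classification schemes for their holdings and for information sources such as libraries; retrieving this information through a single subject approach in the network environment is a hot research topic in the information science community, and descriptors play an important role here.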

11.
Following a general discussion on the philosophy and design of information systems, with particular attention to the definition, needs and psychology of the ultimate user of systems providing on-line access to biomedical information, the role of the documentalist, the differences between document retrieval and true information retrieval and the operational characteristics of on-line systems which affect their cost and hence their design and acceptability, the authors make some tentative predictions as to the future demand for such information retrieval services and their probable organizational form. A brief report is then presented on the principal findings and conclusions of a user's study of the Excerpta Medica system, the key features and history of which are briefly described. Based on the conclusions of this study, particularly as regards the complexity of the average search question, the role of the search formulators in determining the results of computer searching, the importance of secondary concepts for retrieval and the optimal level of specificity of a computer thesaurus, some of the changes in the Excerpta Medica system which are in the planning stage and will be incorporated into the system's Mark II version are outlined, as are the principal features of the two systems currently offering on-line access to the Excerpta Medica database in Western Germany and the U.S.A. Finally, attention is given to the planned partial hierarchic structuring of the Excerpta Medica thesaurus (Malimet), a project which is to be based largely on frequency counts of the existing database and the elimination of over-specific terms by posting under broader concepts. The results of some of the initial steps in this direction (i.e. frequency counts of portions of the database and the structuring of some of the terms used in the cancer field) are presented by way of illustration.

12.
Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.
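
The abstract above does not spell out how its extended Boolean system relaxes the connectives. One widely used relaxation is the p-norm model, sketched below on the assumption (not confirmed by the abstract) that something of this form is meant. The function names and sample weights are hypothetical; `weights` holds a document's term weights in [0, 1] for the query terms.

```python
def pnorm_or(weights: list[float], p: float) -> float:
    """Relaxed OR: equals the mean at p = 1, approaches max as p grows."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)


def pnorm_and(weights: list[float], p: float) -> float:
    """Relaxed AND: equals the mean at p = 1, approaches min as p grows."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)


# A document matching only one of two query terms.
doc_weights = [1.0, 0.0]
soft_or = pnorm_or(doc_weights, 2)
soft_and = pnorm_and(doc_weights, 2)
```

With p = 1 both connectives reduce to the same average, so the distinction between AND and OR disappears; larger values of p move the scores back toward strict Boolean behavior.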

13.
14.
15.
Term classifications and thesauri can be used for many purposes in automatic information retrieval. Normally a thesaurus is generated manually by subject experts; alternatively, the associations between the terms can be obtained automatically by using the occurrence characteristics of the terms across the documents of a collection. A third possibility consists in taking into account user relevance assessments of certain documents with respect to certain queries in order to build term classes designed to retrieve the relevant documents and simultaneously to reject the nonrelevant documents. This last strategy, known as pseudoclassification, produces a user-dependent term classification. A number of pseudoclassification studies are summarized in the present report, and conclusions are reached concerning the effectiveness and feasibility of constructing term classifications based on human relevance assessments.

16.
Decisions in thesaurus construction and use   Cited by: 1 (self-citations: 0, citations by others: 1)
A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in “breadcrumb navigation”.

17.
This paper reviews some aspects of the relationship between the large and growing fields of machine learning (ML) and information retrieval (IR). Learning programs are described along several dimensions. One dimension refers to the degree of dependence of an ML + IR program on users, thesauri, or documents. This paper emphasizes the role of the thesaurus in ML + IR work. ML + IR programs are also classified in a dimension that extends from knowledge-sparse learning at one end to knowledge-rich learning at the other. Knowledge-sparse learning depends largely on user yes-no feedback or on word frequencies across documents to guide adjustments in the IR system. Knowledge-rich learning depends on more complex sources of feedback, such as the structure within a document or thesaurus, to direct changes in the knowledge bases on which an intelligent IR system depends. New advances in computer hardware make the knowledge-sparse learning programs that depend on word occurrences in documents more practical. Advances in artificial intelligence bode well for knowledge-rich learning.

18.
The success of information retrieval depends on the ability to measure the effective relationship between a query and its response. If both are posed in natural language, one might expect that understanding the meaning of that language could not be avoided. The aim of this research is to demonstrate that it is perhaps unnecessary to be able to determine the meaning in the absolute sense; it may be sufficient to measure how far there is a conformity in meaning, and then only in the context of the set of documents in which the answer to a query is sought. Handling a particular language using a computer is made possible through replacing certain texts by special sets. A given text has a ‘syntactic trace’, the set of all the overlapping trigrams forming part of the text. When determining the effective relationship between a query and its answer, not only do their syntactic traces play a role, but so do the traces of all other documents in the set. This is known as the ‘information trace method’.
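
A minimal sketch of the ‘syntactic trace’ described in the abstract above, assuming character-level trigrams. The function names, the lower-casing and whitespace normalization, and the Jaccard overlap score are illustrative assumptions; the information trace method itself also uses the traces of all other documents in the set, which this sketch does not attempt.

```python
def syntactic_trace(text: str) -> set[str]:
    """Return the set of all overlapping character trigrams in `text`."""
    # Assumed normalization: lower-case and collapse whitespace runs.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trace_overlap(query: str, document: str) -> float:
    """Jaccard overlap of two traces (a simplified, assumed similarity)."""
    q, d = syntactic_trace(query), syntactic_trace(document)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


trace = syntactic_trace("abcd")
score = trace_overlap("abcd", "bcde")
```

Because the trace is a set, repeated trigrams count once, and texts shorter than three characters have an empty trace.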

19.
The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool structures the set of documents hierarchically (using a fuzzy hierarchical structure) and represents this structure in a graphical interface (a 3D sphere) over which the user can navigate. Gambal allows the documents to be analyzed and their similarity computed not only on the basis of the syntactic similarity between words but also using a dictionary (WordNet 1.7) and latent semantic analysis.

20.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system, QCS (Query, Cluster, Summarize), which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
