Similar Articles
1.
A comparative evaluation has been carried out on the Philips “DIRECT” and the British “INSPEC” retrieval systems. DIRECT is based on automatic indexing, whereas INSPEC uses manual subject indexing. Two queries were submitted to both systems, using the same database. The results are expressed in terms of recall and precision. Both recall and precision of INSPEC were found to be higher than those of DIRECT by 20%. It is concluded that this difference is mainly a result of query formulation. The effectiveness obtained with automatic indexing of documents is equivalent to that of the manual procedure.

2.
3.
4.
5.
A variety of abstract automatic indexing models have been developed in recent times in an effort to produce indexing methods that are both effective and usable in practice. Among these are the term discrimination model and the term precision system. These two indexing systems are briefly described and experimental evidence is cited showing that a combination of both theories produces better retrieval performance than either one alone. Appropriate conclusions are reached concerning viable automatic indexing procedures usable in practice.
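A minimal Python sketch of the term discrimination idea, offered as an illustration rather than the paper's procedure. It assumes sparse term-weight dictionaries, cosine similarity, and a "space density" measured as the average similarity of documents to the collection centroid. A term's discrimination value is the density with the term removed minus the density with it present, so a positive value means the term spreads documents apart. The term precision model is not sketched here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def density(docs):
    """Average cosine similarity of each document to the collection centroid."""
    centroid = {}
    for d in docs:
        for t, w in d.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(docs)
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_value(docs, term):
    """Density without `term` minus density with it.

    Positive: removing the term packs documents closer together, so the term
    is a good discriminator. Negative: the term makes documents look alike.
    """
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return density(without) - density(docs)


docs = [
    {"common": 1.0, "a": 1.0},
    {"common": 1.0, "b": 1.0},
    {"common": 1.0, "c": 1.0},
]
print(discrimination_value(docs, "common"))  # negative
print(discrimination_value(docs, "a"))       # positive
```

The centroid is recomputed from scratch for every term, which is naive but keeps the definition visible.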

6.
The fundamental idea of the work reported here is to extract index phrases from texts with the help of a single-word concept dictionary and a thesaurus containing relations among concepts. The work is based on the fact that, within every phrase, the single words composing the phrase are related in a well-defined manner, and the type of relation holding between concepts depends only on the concepts themselves. Relations can therefore be stored in a semantic network. The algorithm described extracts single-word concepts from texts and combines them into phrases using the semantic relations between these concepts stored in the network. The results show that phrase extraction from texts by this semantic method is possible and offers many advantages over other (purely syntactic or statistical) methods with respect to the precision and completeness of the text's meaning representation. However, the results also show that some syntactic and morphological “filtering” should be included to improve effectiveness.
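The extraction step can be sketched as follows. The concept dictionary, the relation names, and the rule of linking only neighbouring concepts are all hypothetical simplifications chosen for illustration; the abstract does not say how candidate concept pairs are formed, and the syntactic and morphological filtering it recommends is omitted.

```python
# Hypothetical single-word concept dictionary: surface word -> concept.
CONCEPTS = {
    "automatic": "AUTOMATION",
    "indexing": "INDEXING",
    "information": "INFORMATION",
    "retrieval": "RETRIEVAL",
}

# Hypothetical semantic network: (first concept, second concept) -> relation type.
RELATIONS = {
    ("AUTOMATION", "INDEXING"): "mode_of",
    ("INFORMATION", "RETRIEVAL"): "object_of",
}


def extract_phrases(text, concepts=CONCEPTS, relations=RELATIONS):
    """Map words to single-word concepts, then join each pair of neighbouring
    concepts whose relation is stored in the semantic network."""
    found = [(w, concepts[w]) for w in text.lower().split() if w in concepts]
    phrases = []
    for (w1, c1), (w2, c2) in zip(found, found[1:]):
        relation = relations.get((c1, c2))
        if relation is not None:
            phrases.append((f"{w1} {w2}", relation))
    return phrases


print(extract_phrases("Automatic indexing for information retrieval"))
# [('automatic indexing', 'mode_of'), ('information retrieval', 'object_of')]
```

Words that map to no concept ("for") are skipped, and a pair of concepts with no stored relation ("indexing", "information") produces no phrase.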

7.
The profusion of online resources calls for tools and methods to help Internet users find precisely what they are looking for. The quality-controlled gateway CISMeF provides such services for health resources. However, the human cost of maintaining and updating the catalogue is increasingly high. This paper presents the automatic indexing system currently being developed by the CISMeF team, to be used either as is for preliminary indexing or, after human review, for final indexing. The system architecture, which uses the INTEX platform for MeSH term extraction, is detailed. The results of a first evaluation tend to indicate that the automatic indexing strategy is relevant, as it achieves a precision comparable to that of other existing operational systems. Moreover, the system retrieves keyword/qualifier pairs rather than single terms, and therefore provides significantly more precise indexing. Further development and tests will be carried out to improve the coverage of the dictionaries and to validate the efficiency of the system in the indexers’ everyday work.

8.
The paper describes a technique developed as automatic support for subject heading indexing at BIOSIS. The technique is based on a formalized language for the semantic representation of biological texts and subject headings: the language of Concept Primitives. The structure of the language is discussed, as is the structure of the Semantic Vocabulary, in which natural-language words from biological texts are described by Concept Primitives. The Semantic Vocabulary is still being constructed: approximately 8,000 entries corresponding to high-frequency significant words have been compiled, amounting to at least three-quarters of the final number. Results of experiments testing the approach are given, and journal/subject heading and author/subject heading correlation data are analyzed for use as a supporting technique.

9.
LIPHIS (Linked Phrase Indexing System) is a system of computer-assisted permuted subject indexing designed, like its precursor NEPHIS, to be economical and to be as easy as possible for the indexer, for the programmer, and for the user of the index. Unlike NEPHIS, LIPHIS is designed to handle more complex networks of concept relations, and so produce better indexing of highly detailed subjects.

10.
The Defense Documentation Center (DDC), a field activity of the Defense Supply Agency, implemented an automated indexing procedure in October 1973. This Machine-Aided Indexing (MAI) System [1] had been under development since 1969. This report describes several comparisons designed to measure the retrieval effectiveness of MAI and manual indexing procedures under normal operational conditions.
Several definitions are required to clarify the MAI process as it pertains to these investigations. The MAI routines scan unedited text in the form of titles and abstracts. The output of these routines is called Candidate Index Terms. These word strings are matched by computer against an internal file of manually screened and cross-referenced terms called a Natural Language Data Base (NLDB). The NLDB differs from a standard thesaurus in that it has no related-term category. Word strings that match the NLDB are accepted as valid MAI output. The mismatches are manually screened for suitability, and those accepted are added to the NLDB. When the original set of Candidate Index Terms is then matched against the updated NLDB, the matched output is called unedited MAI. If both the unedited matches and the mismatches are further arranged in accession order and sent to technical analysts for review, the output of that process is called edited MAI.
The tests were designed to (a) compare unedited MAI with manual indexing, holding the indexing language and the retrieval technique constant; (b) compare edited MAI with unedited MAI, holding both the indexing language and the retrieval technique constant; and (c) compare two different retrieval techniques, called simple and complex, while holding the indexing constant.
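The match, screen, and re-match cycle that produces unedited MAI can be expressed as a short sketch. It assumes exact string matching against a set-based NLDB and takes the Candidate Index Terms as given, since how the MAI routines generate them is not described here; the `accept` callback is a hypothetical stand-in for the manual screening step.

```python
def split_candidates(candidates, nldb):
    """Partition candidate index terms into NLDB matches and mismatches."""
    matches = [c for c in candidates if c in nldb]
    mismatches = [c for c in candidates if c not in nldb]
    return matches, mismatches


def unedited_mai(candidates, nldb, accept):
    """One MAI cycle: screen mismatches with `accept` (a hypothetical stand-in
    for manual review), add accepted terms to a copy of the NLDB, and
    re-match the original candidates against the updated NLDB."""
    _, mismatches = split_candidates(candidates, nldb)
    updated = set(nldb) | {m for m in mismatches if accept(m)}
    matches, _ = split_candidates(candidates, updated)
    return matches, updated


nldb = {"radar", "signal processing"}
candidates = ["radar", "signal processing", "phased array", "the system"]
output, nldb = unedited_mai(candidates, nldb, accept=lambda t: t != "the system")
print(output)  # ['radar', 'signal processing', 'phased array']
```

The rejected string "the system" never enters the NLDB, so it stays out of the unedited MAI output on later cycles as well.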

11.
In a dynamic retrieval system, documents must be ingested as they arrive, and be immediately findable by queries. Our purpose in this paper is to describe an index structure and processing regime that accommodates that requirement for immediate access, seeking to make the ingestion process as streamlined as possible, while at the same time seeking to make the growing index as small as possible, and seeking to make term-based querying via the index as efficient as possible. We describe a new compression operation and a novel approach to extensible lists which together facilitate that triple goal. In particular, the structure we describe provides incremental document-level indexing using as little as two bytes per posting and only a small amount more for word-level indexing; provides fast document insertion; supports immediate and continuous queryability; provides support for fast conjunctive queries and similarity score-based ranked queries; and facilitates fast conversion of the dynamic index to a “normal” static compressed inverted index structure. Measurement of our new mechanism confirms that in-memory dynamic document-level indexes for collections into the gigabyte range can be constructed at a rate of two gigabytes/minute using a typical server architecture, that multi-term conjunctive Boolean queries can be resolved in just a few milliseconds each on average even while new documents are being concurrently ingested, and that the net memory space required for all of the required data structures amounts to an average of as little as two bytes per stored posting, less than half the space required by the best previous mechanism.
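For readers unfamiliar with compressed dynamic postings, here is a deliberately simple sketch of the general idea: each term's postings list is a growable byte buffer of variable-byte encoded gaps between document numbers, so a new document is searchable as soon as it is added. This is the classic vbyte scheme, not the paper's new compression operation or extensible-list design, and the class name and whitespace tokenizer are assumptions.

```python
from collections import defaultdict


class DynamicIndex:
    """Illustrative in-memory, document-level dynamic index.

    Each term's postings are a growable bytearray of variable-byte (vbyte)
    encoded gaps between document numbers, so a document can be queried as
    soon as add() returns.
    """

    def __init__(self):
        self.postings = defaultdict(bytearray)  # term -> encoded d-gaps
        self.last_doc = {}                      # term -> last doc id stored
        self.next_id = 0

    def add(self, text):
        """Ingest one document (whitespace tokenizer) and return its id."""
        doc_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            gap = doc_id - self.last_doc.get(term, -1)  # always >= 1
            self.last_doc[term] = doc_id
            buf = self.postings[term]
            while gap >= 128:            # emit 7 low-order bits at a time
                buf.append(gap & 0x7F)
                gap >>= 7
            buf.append(gap | 0x80)       # high bit marks the final byte
        return doc_id

    def docs(self, term):
        """Decode a term's postings back into ascending document ids."""
        ids, value, shift, prev = [], 0, 0, -1
        for byte in self.postings.get(term, b""):
            if byte & 0x80:
                value |= (byte & 0x7F) << shift
                prev += value
                ids.append(prev)
                value, shift = 0, 0
            else:
                value |= byte << shift
                shift += 7
        return ids

    def conjunctive(self, *terms):
        """Documents containing every query term (Boolean AND)."""
        result = None
        for term in terms:
            found = set(self.docs(term))
            result = found if result is None else result & found
        return sorted(result or [])


index = DynamicIndex()
index.add("fast dynamic index")
index.add("static index")
index.add("dynamic compressed index")
print(index.conjunctive("dynamic", "index"))  # [0, 2]
```

Small gaps cost one byte each; a gap of 128 or more spills into further bytes. Conjunctive queries here simply intersect decoded sets; a production index would instead merge or skip through the sorted lists without materializing them.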

12.
A theory of indexing helps explain the nature of indexing, the structure of the vocabulary, and the quality of the index. Indexing theories formulated by Jonker, Heilprin, Landry and Salton are described; each formulation has a different focus. Jonker, by means of the Terminological and Connective Continua, provided a basis for understanding the relationships between the size of the vocabulary, its hierarchical organization, and the specificity with which concepts can be described. Heilprin introduced the idea of a search path that leads from query to document. He also added a third dimension to Jonker's model: the three variables are diffuseness, permutivity and hierarchical connectedness. Landry made an ambitious and well-conceived attempt to build a comprehensive theory of indexing predicated on sets of documents, sets of attributes, and sets of relationships between the two, expressed in theorems and formal notation. Salton provided both a notational definition of indexing and procedures for improving the ability of index terms to discriminate between relevant and nonrelevant documents. These separate theories need to be tested experimentally and eventually combined into a unified, comprehensive theory of indexing.

13.
This paper describes a technique for automatic book indexing. The technique requires a dictionary of the terms that are to appear in the index, along with all text strings that count as instances of each term. It also requires the text to be in a form suitable for processing by a text formatter. A program searches the text for each occurrence of a term or its associated strings and creates an index entry whenever either is found. The results of an experimental application to a portion of a book text are presented, including measures of precision and recall: precision is the ratio of terms correctly assigned by the automatic process to the total number assigned, and recall is the ratio of correct terms automatically assigned to the total number of term assignments according to a human standard. The results indicate that the technique can be applied successfully, especially to texts that employ a technical vocabulary and where indexing exhaustivity is at a premium.
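A sketch of the dictionary-driven technique and the two evaluation measures defined above. It assumes the book is available as plain text keyed by page number (rather than as text-formatter input) and that matching is case-insensitive on whole words; `build_index` and `precision_recall` are illustrative names, and precision and recall are computed over (term, page) assignments.

```python
import re


def build_index(pages, dictionary):
    """Map each dictionary term to the sorted pages on which the term or
    one of its associated strings occurs (whole words, case-insensitive)."""
    index = {}
    for term, strings in dictionary.items():
        patterns = [re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE)
                    for s in [term, *strings]]
        hits = sorted(page for page, text in pages.items()
                      if any(p.search(text) for p in patterns))
        if hits:
            index[term] = hits
    return index


def precision_recall(auto, human):
    """Precision = correct / all automatic assignments;
    recall = correct / all assignments in the human standard."""
    a = {(t, p) for t, ps in auto.items() for p in ps}
    h = {(t, p) for t, ps in human.items() for p in ps}
    correct = len(a & h)
    precision = correct / len(a) if a else 0.0
    recall = correct / len(h) if h else 0.0
    return precision, recall


pages = {
    1: "Inverted files speed up retrieval.",
    2: "A posting list stores document numbers.",
    3: "Retrieval effectiveness is measured by recall.",
}
dictionary = {  # term -> other strings that count as instances of it
    "inverted file": ["inverted files"],
    "retrieval": [],
    "posting": ["postings", "posting list"],
}
human = {"inverted file": [1], "retrieval": [1, 3], "posting": [2], "recall": [3]}

auto = build_index(pages, dictionary)
print(auto)                           # {'inverted file': [1], 'retrieval': [1, 3], 'posting': [2]}
print(precision_recall(auto, human))  # (1.0, 0.8)
```

The human standard includes "recall", which is not in the dictionary, so recall drops below 1 while every automatic assignment is correct. This mirrors the observation that the approach depends on how complete the term dictionary is.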

14.
15.
Measuring the effectiveness of information retrieval (IR) systems is essential for research and development and for monitoring search quality in dynamic environments. In this study, we employ new methods for automatically ranking retrieval systems. In these methods, we merge the retrieval results of multiple systems using various data fusion algorithms, use the top-ranked documents in the merged result as the “(pseudo) relevant documents,” and employ these documents to evaluate and rank the systems. Experiments using Text REtrieval Conference (TREC) data yield statistically significant, strong correlations with human-based assessments of the same systems. We hypothesize that selecting systems that return documents different from those returned by the majority could exclude ordinary systems from data fusion and provide better discrimination among documents and systems, which could improve the effectiveness of automatic ranking. Based on this intuition, we introduce a new method for selecting the systems to be used in data fusion. For this purpose, we use the concept of bias, which measures the deviation of a system from the norm or majority, and employ the systems with higher bias in the data fusion process. This approach provides even higher correlations with the human-based results. We demonstrate that our approach outperforms previously proposed automatic ranking methods.
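The pipeline can be sketched in a few functions. Borda-count fusion is used as one example of a data fusion algorithm, precision at k against the pseudo-relevant set is an assumed evaluation measure, and bias is taken here, as an assumption, to be one minus the cosine similarity between a system's document vector and a "norm" vector of document frequencies across all systems. None of these choices is claimed to match the study's exact settings.

```python
import math
from collections import Counter


def borda_fuse(runs, depth):
    """Borda-count fusion: a document at rank r (0-based) within a run's top
    `depth` earns depth - r points; points are summed over runs."""
    scores = Counter()
    for ranking in runs.values():
        for r, doc in enumerate(ranking[:depth]):
            scores[doc] += depth - r
    return sorted(scores, key=lambda d: (-scores[d], d))


def bias(ranking, runs, depth):
    """Assumed bias measure: 1 - cosine(run vector, norm vector), where the
    norm vector counts how many runs retrieve each document in their top `depth`."""
    norm = Counter(d for rk in runs.values() for d in rk[:depth])
    own = Counter(ranking[:depth])
    if not own:
        return 0.0
    dot = sum(own[d] * norm[d] for d in own)
    n_own = math.sqrt(sum(v * v for v in own.values()))
    n_norm = math.sqrt(sum(v * v for v in norm.values()))
    return 1.0 - dot / (n_own * n_norm)


def rank_systems(runs, depth=10, k=5, n_fuse=None):
    """Fuse the `n_fuse` most biased runs (all runs if None), treat the top k
    fused documents as pseudo-relevant, and rank every run by precision at k."""
    chosen = sorted(runs, key=lambda s: -bias(runs[s], runs, depth))[:n_fuse]
    pseudo = set(borda_fuse({s: runs[s] for s in chosen}, depth)[:k])
    p_at_k = {s: len(set(rk[:k]) & pseudo) / k for s, rk in runs.items()}
    return sorted(p_at_k, key=lambda s: (-p_at_k[s], s))


runs = {
    "A": ["d1", "d2", "d3", "d4", "d5"],
    "B": ["d1", "d2", "d3", "d6", "d7"],
    "C": ["d1", "d3", "d2", "d4", "d8"],
    "D": ["d9", "d10", "d11", "d12", "d13"],
}
print(rank_systems(runs, depth=5, k=3))  # ['A', 'B', 'C', 'D']
```

Passing `n_fuse` restricts fusion to the most biased systems, mirroring the selection idea in the abstract; with a toy collection this small, the effect on the pseudo-relevant set is not meaningful, so the example fuses all runs.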

16.
An ordering system for a global information network is necessary to enable users to retrieve the particular information they are looking for. Classification has been one method of ordering. Traditional classification is based on the idea of partitioning the universe of knowledge into mutually exclusive classes, i.e. subjects. A particular topic is defined by narrower classification within a class, following the principle of the ‘genus-species’ relationship. Ranganathan's system of faceted classification merely replaced the classification of terms into subjects and sub-subjects with a classification of terms into five ambiguous categories. Taube's system of coordinate indexing gives users full freedom to combine any number of terms of their choice. To be effective for the social sciences, such a system has to overcome some difficult semantic problems. The system MANIS described here maintains traditional classification and yet allows users to combine terms of their choice, with the choice restricted to terms belonging to the traditional classification system.

17.
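Backtrack4 is the most comprehensive penetration testing platform, but the lack of a data exchange and processing mechanism makes it poorly suited to efficient testing. A corresponding data format support mechanism was designed, and on that basis a penetration testing framework (PTF) was developed. The framework automatically invokes the relevant penetration testing tools to perform information gathering, vulnerability assessment, and report generation. Experiments in a real network environment verified that PTF can efficiently carry out automated penetration testing, substantially improving the effectiveness of penetration testing with Backtrack4.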

18.
Does human intellectual indexing have a continuing role to play in the face of increasingly sophisticated automatic indexing techniques? In this two-part essay, a computer scientist and long-time TREC participant (Pérez-Carballo) and a practitioner and teacher of human cataloging and indexing (Anderson) pursue this question by reviewing the opinions and research of leading experts on both sides of this divide. We conclude that human analysis should be used on a much more selective basis, and we offer suggestions on how the two types of indexing might be allocated to best advantage. Part one of the essay critiques the comparative research, then explores the nature of human analysis of messages or texts and efforts to formulate rules that make human practice more rigorous and predictable. We find that research comparing human and automatic approaches has done little to change strongly held beliefs, in large part because many associated variables have not been isolated or controlled.
Part two focuses on current methods in automatic indexing, its gradual adoption by major indexing and abstracting services, and ways of allocating human and machine approaches. Overall, we conclude that researchers and searchers have found both approaches to indexing effective, each with particular advantages and disadvantages. However, automatic indexing has the overarching advantage of decreasing cost, as human indexing becomes ever more expensive.

19.
Direct end-user data entry and retrieval is a major factor in achieving an economical information retrieval system. To be effective, such a system would have to provide a thesaurus structure which leads novice end-users to browse subject areas before retrieval and yet provides control and coverage of terms in a domain. A faceted hierarchical thesaurus organization has been designed to accomplish this goal.

20.
