Similar literature
 20 similar documents found (search time: 15 ms)
1.
This article presents the human evaluation of ILIAD, a program for machine-aided indexing (MAI). ILIAD consists of two language engineering modules and is designed to assist expert librarians in computer-aided indexing and document analysis. Our aim is the expert evaluation of automatic multi-word term indexing. The evaluation is performed by documentary engineers, whose principal tasks are cataloging and indexing and who have a good scientific knowledge of the domain to which the indexed documents belong. We first present the ILIAD program and the two systems submitted to this evaluation, the methodology (protocol) adopted, the differences between the protocol and its implementation, and the results of these evaluations. Human evaluation is divided into three parts: first the evaluation of controlled indexing, then free indexing, and finally term variant extraction performed during controlled indexing. Finally, we analyze the relevance of this evaluation by calculating the agreement frequency and the Kappa coefficient, and propose some future developments.
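
The agreement frequency and Kappa coefficient mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which Kappa variant was used; the code below assumes Cohen's kappa for two evaluators, and the function name and label values are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(ratings_a, ratings_b):
    """Return (observed agreement, Cohen's kappa) for two evaluators.

    Assumption: two raters judge the same items with categorical labels.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Agreement frequency: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's own label distribution.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)
```

Kappa corrects the raw agreement frequency for the agreement two evaluators would reach by chance, which is why both figures are usually reported together.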

2.
LIPHIS (Linked Phrase Indexing System) is a system of computer-assisted permuted subject indexing designed, like its precursor NEPHIS, to be economical and to be as easy as possible for the indexer, for the programmer, and for the user of the index. Unlike NEPHIS, LIPHIS is designed to handle more complex networks of concept relations, and so produce better indexing of highly detailed subjects.

3.
A comparative evaluation has been carried out on the Philips “DIRECT” and the British “INSPEC” retrieval systems. DIRECT is based on automatic indexing, whereas INSPEC uses manual subject indexing. Two queries were submitted to both systems, using the same data base. The results are expressed in terms of recall and precision. Both recall and precision of INSPEC were found to be higher than those of DIRECT by 20%. It is concluded that this is mainly a result of the query formulation. The effectiveness obtained with automatic indexing of documents is equivalent to that of the manual procedure.

4.
The Defense Documentation Center (DDC), a field activity of the Defense Supply Agency, implemented an automated indexing procedure in October 1973. This Machine-Aided Indexing (MAI) System [1] had been under development since 1969. The following is a report of several comparisons designed to measure the retrieval effectiveness of MAI and manual indexing procedures under normal operational conditions. Several definitions are required in order to clarify the MAI process as it pertains to these investigations. The MAI routines scan unedited text in the form of titles and abstracts. The output of these routines is called Candidate Index Terms. These word strings are matched by computer against an internal file of manually screened and cross-referenced terms called a Natural Language Data Base (NLDB). The NLDB differs from a standard thesaurus in that there is no related term category. Word strings which match the NLDB are accepted as valid MAI output. The mismatches are manually screened for suitability, and those accepted are added to the NLDB. If the original set of Candidate Index Terms is now matched against the updated NLDB, the matched output is unedited MAI. If both the unedited matches and mismatches are further structured in accession order and sent to technical analysts for review, the output of that process is called edited MAI. The tests were designed to (a) compare unedited MAI with manual indexing, holding the indexing language and the retrieval technique constant; (b) compare edited MAI with unedited MAI, holding both the indexing and the retrieval technique constant; and (c) compare two different retrieval techniques, called simple and complex, while holding the indexing constant.
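
The matching loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the NLDB is modeled as a plain set of lowercase strings, matching is exact after lowercasing, and the `accept` callback is a hypothetical stand-in for the manual screening step. None of these details come from the DDC system itself.

```python
def match_candidates(candidates, nldb):
    """Split Candidate Index Terms into NLDB matches and mismatches."""
    matches = [t for t in candidates if t.lower() in nldb]
    mismatches = [t for t in candidates if t.lower() not in nldb]
    return matches, mismatches


def screen_and_update(mismatches, nldb, accept):
    """Add mismatches that pass screening to the NLDB.

    `accept` is a hypothetical callback standing in for the human screener.
    """
    for term in mismatches:
        if accept(term):
            nldb.add(term.lower())


def unedited_mai(candidates, nldb, accept):
    """First pass, screening of mismatches, then a rematch against the updated NLDB."""
    _, mismatches = match_candidates(candidates, nldb)
    screen_and_update(mismatches, nldb, accept)
    matches, _ = match_candidates(candidates, nldb)
    return matches
```

In this sketch the second pass is what the abstract calls unedited MAI: the same candidate terms, matched against an NLDB that has grown by the accepted mismatches.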

5.
In a dynamic retrieval system, documents must be ingested as they arrive, and be immediately findable by queries. Our purpose in this paper is to describe an index structure and processing regime that accommodates that requirement for immediate access, seeking to make the ingestion process as streamlined as possible, while at the same time seeking to make the growing index as small as possible, and seeking to make term-based querying via the index as efficient as possible. We describe a new compression operation and a novel approach to extensible lists which together facilitate that triple goal. In particular, the structure we describe provides incremental document-level indexing using as little as two bytes per posting and only a small amount more for word-level indexing; provides fast document insertion; supports immediate and continuous queryability; provides support for fast conjunctive queries and similarity score-based ranked queries; and facilitates fast conversion of the dynamic index to a “normal” static compressed inverted index structure. Measurement of our new mechanism confirms that in-memory dynamic document-level indexes for collections into the gigabyte range can be constructed at a rate of two gigabytes/minute using a typical server architecture, that multi-term conjunctive Boolean queries can be resolved in just a few milliseconds each on average even while new documents are being concurrently ingested, and that the net memory space required for all of the required data structures amounts to an average of as little as two bytes per stored posting, less than half the space required by the best previous mechanism.
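
To make the idea of an immediately queryable, compressed, growing index concrete, here is a minimal sketch. It does not reproduce the paper's new compression operation or its extensible-list scheme, which the abstract does not specify. Instead it assumes a standard technique: document gaps stored with variable-byte coding in one growable byte array per term. All class and function names are hypothetical.

```python
from collections import defaultdict


def vbyte_encode(n):
    """Encode a non-negative integer 7 bits at a time; the high bit marks the last byte."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode_all(data):
    """Decode a byte sequence produced by repeated vbyte_encode calls."""
    values, cur, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            values.append(cur | ((byte & 0x7F) << shift))
            cur, shift = 0, 0
        else:
            cur |= byte << shift
            shift += 7
    return values


class DynamicIndex:
    """Document-level inverted index that is queryable after every insertion."""

    def __init__(self):
        self.postings = defaultdict(bytearray)  # term -> vbyte-coded doc gaps
        self.last_doc = {}                      # term -> last doc id appended
        self.next_doc_id = 1

    def add_document(self, text):
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        for term in set(text.lower().split()):
            # Store the gap to the previous document containing this term.
            gap = doc_id - self.last_doc.get(term, 0)
            self.postings[term] += vbyte_encode(gap)
            self.last_doc[term] = doc_id
        return doc_id

    def doc_ids(self, term):
        ids, total = [], 0
        for gap in vbyte_decode_all(self.postings.get(term.lower(), b"")):
            total += gap
            ids.append(total)
        return ids

    def conjunctive_query(self, terms):
        """Return the ids of documents that contain every query term."""
        if not terms:
            return []
        result = set(self.doc_ids(terms[0]))
        for term in terms[1:]:
            result &= set(self.doc_ids(term))
        return sorted(result)
```

Because each posting is appended as a small gap rather than a full document number, frequent terms cost about one byte per posting in this sketch, and a document is findable as soon as `add_document` returns.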

6.
This paper describes a technique for automatic book indexing. The technique requires a dictionary of terms that are to appear in the index, along with all text strings that count as instances of the term. It also requires that the text be in a form suitable for processing by a text formatter. A program searches the text for each occurrence of a term or its associated strings and creates an entry to the index when either is found. The results of the experimental application to a portion of a book text are presented, including measures of precision and recall, with precision giving the ratio of terms correctly assigned in the automatic process to the total assigned, and recall giving the ratio of correct terms automatically assigned to the total number of term assignments according to a human standard. Results indicate that the technique can be applied successfully, especially for texts that employ a technical vocabulary and where there is a premium on indexing exhaustivity.
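
The dictionary-driven procedure and the two measures defined in this abstract can be sketched as follows. The sketch assumes plain case-insensitive substring matching on page text and represents an assignment as a (term, page) pair. The paper works on formatter-ready text, and its actual matching rules are not given here, so the function names and data layout are illustrative only.

```python
def build_index(pages, term_strings):
    """Build {index term: [page numbers]}.

    pages: {page number: text}
    term_strings: {index term: [strings that count as instances of the term]}
    """
    index = {}
    for term, strings in term_strings.items():
        hits = sorted(
            page
            for page, text in pages.items()
            if any(s.lower() in text.lower() for s in strings)
        )
        if hits:
            index[term] = hits
    return index


def precision_recall(assigned, reference):
    """Compare automatic assignments with a human standard.

    Both arguments are sets of (term, page) pairs.
    """
    correct = len(assigned & reference)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall
```

Precision falls when an associated string matches text that a human indexer would not consider an instance of the term, while recall falls when the dictionary lacks a string that the human standard would have indexed.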

7.
8.
9.
Traditional index weighting approaches for information retrieval from texts depend on the term frequency based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they have some limitations in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document, but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.

10.
Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of each other. However, this independence assumption is widely understood to be wrong, so we present a new method of incorporating term dependence into a probabilistic retrieval model by adapting a dependency structured indexing system that uses a dependency parse tree and the Chow Expansion to compensate for the weakness of the assumption. In this paper, we describe a theoretical process for applying the Chow Expansion to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on document collections in English and Korean, we demonstrate that incorporating term dependences using the Chow Expansion improves the performance of probabilistic IR systems.

11.
The paper describes a technique developed as automatic support to subject heading indexing at BIOSIS. The technique is based on the use of a formalized language for semantic representation of biological texts and subject headings: the language of Concept Primitives. The structure of the language is discussed as well as the structure of the Semantic Vocabulary, in which natural language words from biological texts are described by Concept Primitives. The Semantic Vocabulary is being constructed. Approximately 8,000 entries corresponding to high frequency significant words have been compiled, comprising at least three-quarters of the final number. Results of experiments checking the approach are given, and journal/subject heading and author/subject heading correlation data are analyzed to be used as a supporting technique.

12.
A procedure for automated indexing of pathology diagnostic reports at the National Institutes of Health is described. Diagnostic statements in medical English are encoded by computer into the Systematized Nomenclature of Pathology (SNOP). SNOP is a structured indexing language constructed by pathologists for manual indexing. It is of interest that effective automatic encoding can be based upon an existing vocabulary and code designed for manual methods. Morphosyntactic analysis, a simple syntax analysis, matching of dictionary entries consisting of several words, and synonym substitutions are techniques utilized.

13.
The profusion of online resources calls for tools and methods to help Internet users find precisely what they are looking for. The quality-controlled gateway CISMeF provides such services for health resources. However, the human cost of maintaining and updating the catalogue is increasingly high. This paper presents the automatic indexing system currently being developed by the CISMeF team, to be used as is for preliminary indexing, or after human review for the final indexing. The system architecture, which uses the INTEX platform for MeSH term extraction, is detailed. The results of a first evaluation tend to indicate that the automatic indexing strategy is relevant, as it achieves a precision comparable to that of other existing operational systems. Moreover, the system presented in this paper retrieves keyword/qualifier pairs as opposed to single terms, therefore providing significantly more precise indexing. Further development and tests will be carried out in order to improve the coverage of the dictionaries, and validate the efficiency of the system in the indexers’ everyday work.

14.
15.
Given the expanse of the Internet as a topic for research, the need for transdisciplinary research becomes evident. This paper introduces and expands on the problems of Internet research and how some of those can be resolved by pursuing transdisciplinary research. Issues introduced are the fragmentation of understanding, the disunity of research, and the public reception of that research.


17.
18.
An ordering system for a global information network is necessary in order to enable the user to retrieve the particular information he is looking for. Classification has been one of the methods of ordering. The principle of traditional classification has been based on the idea of partitioning the universe of knowledge into mutually exclusive classes, i.e. subjects. A particular topic is defined by narrower classification within a class following the principle of the ‘genus-species’ relationship. Ranganathan's system of faceted classification has only replaced the classification of terms into subjects and sub-subjects by classification of terms into five ambiguous categories. Taube's system of coordinate indexing gives full freedom to the user to combine any number of terms of his choice. To be effective for social sciences such a system has to overcome some difficult problems of semantics. The system MANIS described here maintains the traditional classification and yet allows the user to combine terms of his choice, where the choice is restricted to the terms belonging to the system of traditional classification.
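
The combination of coordinate indexing with a restricted vocabulary can be illustrated with a small sketch. It is not the MANIS implementation, which the abstract does not detail. It assumes that each classification term maps to a set of document identifiers and that a query is the intersection of the chosen terms' sets; all names are hypothetical.

```python
def coordinate_search(postings, classification_terms, query_terms):
    """Intersect the document sets of the chosen terms.

    postings: {term: set of document ids}
    classification_terms: set of terms allowed by the classification scheme
    query_terms: terms freely combined by the user
    """
    unknown = [t for t in query_terms if t not in classification_terms]
    if unknown:
        # The user's choice is restricted to terms of the classification.
        raise ValueError(f"terms outside the classification: {unknown}")
    result = None
    for term in query_terms:
        docs = postings.get(term, set())
        result = set(docs) if result is None else result & docs
    return result if result is not None else set()
```

Rejecting terms that are not in the classification is what separates this from free coordinate indexing: the user still combines terms at will, but only terms whose meaning the classification has already fixed.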

19.
20.
