Similar literature
 Found 20 similar documents; search took 31 ms
1.
Technology transfer, research and development and engineering projects frequently require in-depth literature reviews. These reviews are carried out using computerized, bibliographic data bases. The review and/or searching process involves keywords selected from data base thesauri. The search strategy is formulated to provide both breadth and depth of coverage and yields both relevant and nonrelevant citations. Experience indicates that about 10–20% of the citations are relevant. As a consequence, significant amounts of time are required to eliminate the nonrelevant citations. This paper describes statistically based, lexical association methods which can be employed to determine citation relevance. In particular, the searcher selects relevant terms from citation-derived indexes and this information along with lexical statistics is used to determine citation relevance. Preliminary results are encouraging with the techniques providing an effective concentration of relevant citations.

2.
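Lü Meixiang, Information Science (情报科学), 2012, (8): 1160-1166
Thesauri are the most important knowledge organization tools in the library and information retrieval fields. The Chinese Classified Thesaurus (《中国分类主题词表》) is a traditional thesaurus whose updating and maintenance have always been done by hand, which limits its use in digital libraries and networked information environments. This paper presents a statistics-based method that extracts keywords from metadata titles and locates them in the thesaurus. It has three broad steps: extracting keywords from titles; determining the specificity of the extracted keywords; and locating the highly specific technical terms in the thesaurus. Experiments on the Chinese Classified Thesaurus and on computer science and technology metadata provided by the Shanghai Library show that the method is feasible. The method can be applied to automatic indexing or cataloguing; it is practical and has broad application prospects.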

3.
Term classifications and thesauri can be used for many purposes in automatic information retrieval. Normally a thesaurus is generated manually by subject experts; alternatively, the associations between the terms can be obtained automatically by using the occurrence characteristics of the terms across the documents of a collection. A third possibility consists in taking into account user relevance assessments of certain documents with respect to certain queries in order to build term classes designed to retrieve the relevant documents and simultaneously to reject the nonrelevant documents. This last strategy, known as pseudoclassification, produces a user-dependent term classification. A number of pseudoclassification studies are summarized in the present report, and conclusions are reached concerning the effectiveness and feasibility of constructing term classifications based on human relevance assessments.

4.
Decisions in thesaurus construction and use   Total citations: 1, self-citations: 0, citations by others: 1
A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in “breadcrumb navigation”.

5.
Direct end-user data entry and retrieval is a major factor in achieving an economical information retrieval system. To be effective, such a system would have to provide a thesaurus structure which leads novice end-users to browse subject areas before retrieval and yet provides control and coverage of terms in a domain. A faceted hierarchical thesaurus organization has been designed to accomplish this goal.

6.
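Mi Jia, Journal of Modern Information (现代情报), 2009, 29(1): 38-41
This paper gives a comprehensive discussion of converting thesauri into ontologies and proposes a concept-based thesaurus conversion method that yields an RDF/OWL description of the thesaurus.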

7.
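Based on reconstructed alphabetical lists and their term-family lists, four library and information science thesauri (ASIS, LISA, LISTA, and LIBLIT) were evaluated quantitatively from three perspectives: vocabulary size, entry form, and concept control. Taking the evaluation indicators together, the LISTA thesaurus showed the best overall performance of the four.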

8.
This paper describes a study using the citation analysis technique to select journals that would be used in the livestock industry. The study determines the principal journals to which a livestock library should subscribe, thus obtaining the highest possible utility of materials. By using a data base of 114 journals for a period of four years (1980–1983), citation data were applied to the Bradford bibliograph and Bradford-Zipf distribution to determine the ranking of journals in the industry and the “core journals.” It was discovered that the Journal of Animal Science is the most cited journal, with 889 citations, and the core journals were 18 in number, having 11,070 citations representing 32.3% of the total citations.
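The core-journal idea in this abstract can be illustrated with a minimal sketch. It assumes the common reading of Bradford's law, in which journals ranked by citation count are split into zones that each hold roughly the same share of citations; the abstract does not state the exact procedure used in the study. The function name `bradford_zones` and the journal counts in the usage note are hypothetical. The last line only back-calculates the total implied by the reported figures (11,070 citations at 32.3%), which is approximate because the percentage is rounded.

```python
def bradford_zones(citations_by_journal, n_zones=3):
    """Split journals, ranked by citation count, into zones holding
    roughly equal shares of the total citations (hypothetical helper)."""
    ranked = sorted(citations_by_journal.items(),
                    key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ranked)
    zones = [[] for _ in range(n_zones)]
    if total == 0:
        return zones
    running = 0
    for journal, count in ranked:
        # The zone is chosen from the cumulative share *before* this journal.
        zone = min(int(running * n_zones / total), n_zones - 1)
        zones[zone].append(journal)
        running += count
    return zones


# Total citation count implied by the abstract's figures (approximate).
implied_total = round(11070 / 0.323)
```

With made-up counts such as `{"A": 60, "B": 30, "C": 20, "D": 10, "E": 10, "F": 10, "G": 5, "H": 5}`, the first zone holds one journal, the second two, and the third five, which is the widening pattern Bradford's law describes.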

9.
Knowledge organization (KO) and bibliometrics have traditionally been seen as separate subfields of library and information science, but bibliometric techniques make it possible to identify candidate terms for thesauri and to organize knowledge by relating scientific papers and authors to each other and thereby indicating kinds of relatedness and semantic distance. It is therefore important to view bibliometric techniques as a family of approaches to KO in order to illustrate their relative strengths and weaknesses. The subfield of bibliometrics concerned with citation analysis forms a distinct approach to KO which is characterized by its social, historical and dynamic nature, its close dependence on scholarly literature and its explicit kind of literary warrant. The two main methods, co-citation analysis and bibliographic coupling, represent different things and thus neither can be considered superior for all purposes. The main difference between traditional knowledge organization systems (KOSs) and maps based on citation analysis is that the first group represents intellectual KOSs, whereas the second represents social KOSs. For this reason bibliometric maps cannot be expected ever to be fully equivalent to scholarly taxonomies, but they are, along with other forms of KOSs, valuable tools for assisting users to orient themselves to the information ecology. Like other KOSs, citation-based maps cannot be neutral but will always be based on researchers’ decisions, which tend to favor certain interests and views at the expense of others.

10.
When consumers search for health information, a major obstacle is their unfamiliarity with the medical terminology. Even though medical thesauri such as the Medical Subject Headings (MeSH) and related tools (e.g., the MeSH Browser) were created to help consumers find medical term definitions, the lack of direct and explicit integration of these help tools into a health retrieval system prevented them from effectively achieving their objectives. To explore this issue, we conducted an empirical study with two systems: one is a simple interface system supporting query-based searching; the other is an augmented system with two new components supporting MeSH term searching and MeSH tree browsing. A total of 45 subjects were recruited to participate in the study. The results indicated that the augmented system is more effective than the simple system in terms of improving user-perceived topic familiarity and question–answer performance, even though we did not find that users spent more time on the augmented system. The two new MeSH help components played a critical role in participants’ health information retrieval and were found to allow them to develop new search strategies. The findings of the study enhanced our understanding of consumers’ search behaviors and shed light on the design of future health information retrieval systems.

11.
12.
Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic and extra-linguistic knowledge. Since the mid-1980s, research has tended to pay more attention to context, giving discourse analysis a more central role. The research presented in this paper aims to check whether discourse variables have an impact on modern information retrieval and classification algorithms. In order to evaluate this hypothesis, a functional framework for information analysis in an automated environment has been proposed, where the n-grams (filtering) and the k-means and Chen’s classification algorithms have been tested against sub-collections of documents based on the following discourse variables: “Genre”, “Register”, “Domain terminology”, and “Document structure”. The results obtained with the algorithms for the different sub-collections were compared to the MeSH information structure. These demonstrate that the n-grams algorithm does not appear to have a clear dependence on discourse variables; the k-means classification algorithm does, but only on domain terminology and document structure; and Chen’s algorithm has a clear dependence on all of the discourse variables. This information could be used to design better classification algorithms, where discourse variables should be taken into account. Other minor conclusions drawn from these results are also presented.

13.
A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002–2006. The objective is to use this clustering to validate and, if possible, to improve existing journal-based subject-classification schemes. Cross-citation links are determined on an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle; textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the roughly 8300 journals meeting the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the “intellectual” reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows labelling the clusters using cognitive characteristics, while the citation component allows visualising the cross-citation graph and determining representative journals suggested by the PageRank algorithm. Finally, the analysis of journal ‘migration’ allows the improvement of existing classification schemes on the basis of the concordance between fields and clusters.

14.
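Lei Xiao, Chang Chun, Liu Wei, Information Science (情报科学), 2021, 39(1): 135-141
[Purpose/Significance] To keep a thesaurus's term coverage complete, new terms that have appeared in a domain but are not yet included must be added promptly. Combining the temporal and document-frequency features of candidate words, and exploring the distribution of new terms from a time-series perspective to guide their selection, is a question worth studying. [Method/Process] The paper studies the time series of candidate words' document frequencies. The series are preprocessed by frequency normalization and length equalization, and the k-means clustering algorithm is then applied to cluster candidate words by their time-series trends. The aim is to explore how the trends of terms and non-terms vary and to summarize the trend characteristics a new term should satisfy. [Result/Conclusion] The clustering study finds that new terms are generally on a growth trend. In the empirical study, candidate words in a growth state were selected; expert judgment showed that the method effectively picks out new terms that can be added to the thesaurus, with relatively high accuracy. [Innovation/Limitation] The innovation is that new-term selection considers both temporal change and document frequency. Because of limits on the scale of data processing, the empirical study counted only the frequency data of paper keywords.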
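The selection step described in this abstract (keep candidate words whose normalized frequency series is growing) can be sketched as follows. This is a simplified stand-in, not the authors' method: it replaces the k-means clustering with a least-squares slope test on each min-max normalized series. The function names, the `min_slope` threshold, and the sample series are all hypothetical.

```python
def normalize(series):
    """Min-max scale a frequency series to [0, 1] so that terms with
    different overall volumes become comparable."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def slope(series):
    """Least-squares slope of the series against its index 0..n-1."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den if den else 0.0


def growing_terms(freq_by_term, min_slope=0.1):
    """Return candidate terms whose normalized series trends upward.
    min_slope is an assumed threshold, not a value from the paper."""
    return [term for term, series in freq_by_term.items()
            if slope(normalize(series)) >= min_slope]
```

For example, with yearly counts `{"alpha": [1, 2, 4, 7, 9], "beta": [9, 7, 4, 2, 1], "gamma": [5, 5, 5, 5, 5]}`, only `"alpha"` passes the growth test.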

15.
This article reveals different patterns of scholarly communication in the XML research field on the Web and in print journals in terms of author visibility, and challenges the common practice of exclusively using the ISI’s databases to obtain citation counts as scientific performance indicators. Results from this study demonstrate both the importance and the feasibility of the use of multiple citation data sources in citation analysis studies of scholarly communication, and provide evidence for a developing “two tier” scholarly communication system.

16.
The primary goal of this study was to carry out an ego-centric citation and reference analysis of the works of the mathematician and computer scientist, Michael O. Rabin. Until recently only a single citation database was available for such research – the ISI Citation Indexes. In this study we utilized and compared three major sources that provide citation data: the Web of Science, Google Scholar and Citeseer. Most cited works, citation identity, citation image makers and coauthors were identified. The citation image makers acquired through these sources differ considerably. Advantages and shortcomings of each of the tools are discussed in the context of computer science. A major issue in computer science is multiple manifestations of a work, i.e., its publication in several venues (technical reports, proceedings, journals, collections). The implications of multiple manifestations for citation analysis are discussed.

17.
The rate of citation duplication was examined in three databases: MEDLINE, BIOSIS, and LIFE SCIENCES COLLECTION. Duplicate citations were found to be more pertinent than unique citations. The duplicate citations came from a highly compact literature, while those from a single database were very widely scattered. The pertinent duplicated citations were more likely to be retrieved in searches that had more terms overall, had a higher percentage of thesaurus terms, and had terms which appeared in both title and abstract. These results suggest that the rate of duplication of citations in multidatabase searches may be used to rank output according to probable pertinence.
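The ranking suggested in the last sentence of this abstract can be sketched in a few lines. The sketch assumes that each database returns a list of citation identifiers that can be matched exactly across databases. The function name `rank_by_duplication` and the identifiers `c1` to `c4` are hypothetical, and real multidatabase output would first need record matching, which is not shown.

```python
from collections import Counter


def rank_by_duplication(results_by_db):
    """Order citations by how many databases retrieved them (most first);
    ties are broken alphabetically by identifier."""
    counts = Counter()
    for citations in results_by_db.values():
        # Use a set so a citation repeated within one database counts once.
        counts.update(set(citations))
    return sorted(counts, key=lambda c: (-counts[c], c))


ranked = rank_by_duplication({
    "MEDLINE": ["c1", "c2", "c3"],
    "BIOSIS": ["c2", "c3"],
    "LIFE SCIENCES COLLECTION": ["c3", "c4"],
})
```

Here `c3`, retrieved by all three databases, ranks first, followed by `c2`, then the two citations found in only one database.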

18.
In this paper, we propose a new algorithm, which incorporates the relationships of concept-based thesauri into document categorization using the k-NN classifier (k-NN). k-NN is one of the most popular document categorization methods because it shows relatively good performance in spite of its simplicity. However, its precision degrades significantly when ambiguity arises, i.e., when there exists more than one candidate category to which a document can be assigned. To remedy this drawback, we employ concept-based thesauri in the categorization. Employing the thesaurus entails structuring categories into hierarchies, since their structure needs to conform to that of the thesaurus for capturing relationships between categories. By referencing various relationships in the thesaurus corresponding to the structured categories, k-NN can be markedly improved, removing the ambiguity. In this paper, we first perform the document categorization by using k-NN and then employ the relationships to reduce the ambiguity. Experimental results show that this method improves the precision of k-NN by up to 13.86% without compromising its recall.

19.
This paper reviews some aspects of the relationship between the large and growing fields of machine learning (ML) and information retrieval (IR). Learning programs are described along several dimensions. One dimension refers to the degree of dependence of an ML + IR program on users, thesauri, or documents. This paper emphasizes the role of the thesaurus in ML + IR work. ML + IR programs are also classified in a dimension that extends from knowledge-sparse learning at one end to knowledge-rich learning at the other. Knowledge-sparse learning depends largely on user yes-no feedback or on word frequencies across documents to guide adjustments in the IR system. Knowledge-rich learning depends on more complex sources of feedback, such as the structure within a document or thesaurus, to direct changes in the knowledge bases on which an intelligent IR system depends. New advances in computer hardware make the knowledge-sparse learning programs that depend on word occurrences in documents more practical. Advances in artificial intelligence bode well for knowledge-rich learning.

20.