Similar Documents
20 similar documents found (search time: 31 ms)
1.
2.
3.
Documents in computer-readable form can be used to provide information about other documents, i.e., those they cite. Doing this efficiently requires procedures for computer recognition of citing statements, which is not easy, especially for multi-sentence citing statements. The computer recognition procedures developed are accurate to the following extent: 73% of the words in statements selected by the procedures as citing statements are correctly attributable to the corresponding cited documents. The retrieval effectiveness of computer-recognized citing statements was tested as follows. First, for eight retrieval requests in inorganic chemistry, average recall from searching Chemical Abstracts Service index terms and Chemical Abstracts abstract text words was 50%. Words from citing statements referring to the papers to be retrieved were then added to the index terms and abstract words as additional access points, and the searches were repeated. Average recall increased to 70%. Only words from citing statements published within a year of the cited papers were used. Using citing-statement words alone (published within a year), without index or abstract terms, average recall was 40%. When just the words of the cited papers' titles were added to those citing-statement words, average recall increased to 50%.
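The recall gain reported above comes from treating citing-statement words as extra access points for the cited paper. The toy sketch below assumes a conjunctive search (every query term must appear among a document's access points) and uses invented documents and words; it illustrates the mechanism only, not the study's data or search procedure. Recall here is the fraction of relevant documents that are retrieved.

```python
from __future__ import annotations


def search(index: dict[str, set[str]], terms: set[str]) -> set[str]:
    """Return IDs of documents whose access points contain every query term."""
    return {doc for doc, words in index.items() if terms <= words}


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


# Hypothetical access points drawn from index terms and abstract words only.
base_index = {
    "d1": {"zirconium", "fluoride", "complex"},
    "d2": {"zirconium", "halide"},
    "d3": {"hafnium", "fluoride"},
}

# Hypothetical words taken from later statements that cite each document.
citing_words = {"d2": {"fluoride", "complex"}}

# Augmented index: original access points plus citing-statement words.
augmented_index = {
    doc: words | citing_words.get(doc, set()) for doc, words in base_index.items()
}

query = {"zirconium", "fluoride"}
relevant = {"d1", "d2"}

print(recall(search(base_index, query), relevant))       # 0.5
print(recall(search(augmented_index, query), relevant))  # 1.0
```

Document d2 is missed until words from a statement citing it supply the term "fluoride", which is the effect the abstract measures at scale.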

4.
A comparative evaluation has been carried out on the Philips “DIRECT” and the British “INSPEC” retrieval systems. DIRECT is based on automatic indexing, whereas INSPEC uses manual subject indexing. Two queries were submitted to both systems using the same data base. The results are expressed in terms of recall and precision. INSPEC's recall and precision were both found to be 20% higher than DIRECT's. It is concluded that this difference is mainly a result of query formulation, and that the effectiveness obtained with automatic indexing of documents is equivalent to that of the manual procedure.
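The two measures used in this comparison can be computed per query from the retrieved set and the set of relevant documents. A minimal sketch follows; the result sets are hypothetical and are not taken from the DIRECT/INSPEC study.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result sets for one query run against the same data base.
relevant = {"r1", "r2", "r3", "r4"}
manual_run = {"r1", "r2", "r3", "x1"}     # e.g. a manually indexed system
automatic_run = {"r1", "r2", "x1", "x2"}  # e.g. an automatically indexed system

print(precision_recall(manual_run, relevant))     # (0.75, 0.75)
print(precision_recall(automatic_run, relevant))  # (0.5, 0.5)
```

Because both systems searched the same data base, the relevant set is shared, so differences in the two measures reflect indexing and query formulation rather than collection content.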

5.
MEDLINE is presented as a prototype for online bibliographic search systems. The creation of the data base, the indexing language, and the file organization are reviewed. For file access, the search logic is illustrated with a sample MEDLINE search. NLM's development of a document delivery system to complement its bibliographic retrieval system is discussed.
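Online bibliographic systems of this kind typically answer Boolean queries against an inverted file that maps each index term to the records it was assigned to. The sketch below assumes that model; the records and terms are invented, and the function is a generic illustration, not MEDLINE's actual command language or file structure.

```python
from __future__ import annotations

# Hypothetical citation records: record ID -> assigned index terms.
records = {
    1: {"Hypertension", "Diuretics", "Adult"},
    2: {"Hypertension", "Beta-Blockers"},
    3: {"Diabetes Mellitus", "Diuretics"},
    4: {"Hypertension", "Diuretics", "Child"},
}


def build_inverted_file(recs: dict[int, set[str]]) -> dict[str, set[int]]:
    """Map each index term to the set of record IDs that carry it."""
    inverted: dict[str, set[int]] = {}
    for rec_id, terms in recs.items():
        for term in terms:
            inverted.setdefault(term, set()).add(rec_id)
    return inverted


def boolean_search(inverted: dict[str, set[int]], all_of: list[str],
                   none_of: tuple[str, ...] = ()) -> set[int]:
    """AND together the postings of `all_of`, then subtract postings of `none_of`."""
    result = set.intersection(*(inverted.get(t, set()) for t in all_of))
    for term in none_of:
        result -= inverted.get(term, set())
    return result


inv = build_inverted_file(records)
# (Hypertension AND Diuretics) NOT Child
print(sorted(boolean_search(inv, ["Hypertension", "Diuretics"], ("Child",))))  # [1]
```

An OR of two terms would be the union of their postings, e.g. `inv["Beta-Blockers"] | inv["Diuretics"]`.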

6.
7.
The vector space model (VSM) is a textual representation method that is widely used in document classification. However, it remains costly in terms of space. One attempt to alleviate the space problem is to use dimensionality reduction techniques; however, such techniques have deficiencies, such as losing some important information. In this paper, we propose a novel text classification method that uses neither VSM nor dimensionality reduction techniques. The proposed method is space-efficient and utilizes a first-order Markov model for hierarchical Arabic text classification. For each category and sub-category, a Markov chain model is prepared based on sequences of neighboring characters. The prepared models are then used to score documents for classification. For evaluation, we used a hierarchical Arabic text collection that contains 11,191 documents belonging to eight topics distributed across three levels. The experimental results show that the Markov chain-based method significantly outperforms a baseline system that employs latent semantic indexing (LSI): the proposed method improves the F1-measure by 3.47%. The novelty of this work lies in decomposing words into sequences of characters, which was found to be a promising approach in terms of both space and accuracy. To the best of our knowledge, this is the first attempt at hierarchical Arabic text classification with such a relatively large data collection.
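A first-order character Markov chain stores only counts of adjacent character pairs per category, which is where the space savings come from. The sketch below trains one chain per category and assigns a document to the category with the highest log-likelihood. Add-one smoothing, the shared alphabet size, and the toy English training strings are assumptions for illustration; the abstract does not specify the paper's smoothing or scoring details. In a hierarchy, the same scoring could be applied level by level (top-level topic first, then its sub-categories), which is likewise an assumed strategy.

```python
from __future__ import annotations

import math
from collections import Counter

Model = tuple[Counter, Counter]  # (pair counts, previous-character counts)


def bigram_counts(texts: list[str]) -> Model:
    """Count character transitions (prev -> curr) and occurrences of each prev."""
    pairs: Counter = Counter()
    prevs: Counter = Counter()
    for text in texts:
        for prev, curr in zip(text, text[1:]):
            pairs[(prev, curr)] += 1
            prevs[prev] += 1
    return pairs, prevs


def train(data: dict[str, list[str]]) -> tuple[dict[str, Model], int]:
    """Build one Markov chain per category plus a shared vocabulary size."""
    models = {label: bigram_counts(texts) for label, texts in data.items()}
    alphabet = {ch for texts in data.values() for t in texts for ch in t}
    return models, len(alphabet) + 1  # +1 reserves mass for unseen characters


def log_likelihood(text: str, model: Model, vocab: int, alpha: float = 1.0) -> float:
    """Sum of log P(curr | prev) under add-alpha smoothing."""
    pairs, prevs = model
    return sum(
        math.log((pairs[(p, c)] + alpha) / (prevs[p] + alpha * vocab))
        for p, c in zip(text, text[1:])
    )


def classify(text: str, models: dict[str, Model], vocab: int) -> str:
    """Pick the category whose chain gives the text the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(text, models[label], vocab))


models, vocab = train({"sport": ["ball ball ball"], "finance": ["cash cash cash"]})
print(classify("ball", models, vocab))  # sport
```

Because the model works on characters rather than words, the same code applies unchanged to Arabic text.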

8.
Online data bases might be more valuable to users if their structures more closely matched those of the disciplines they represent. To explore this concept, a structure for the field of tropical medicine was derived from the interrelationships of the signs and symptoms of 37 tropical diseases. A similar structure was derived from the interrelationships of the sign and symptom index terms applied to articles on these tropical diseases in the MEDLINE data base. The poor correlation between the two structures led to the suggestion that rigorous indexing of articles with sign and symptom index terms or check tags would enhance the usefulness of the data base. Similar studies could be envisioned for other disciplines and data bases.

9.
10.
Traditional approaches to information retrieval, based on automatically or manually constructed keywords, are inappropriate for certain desirable tasks in an intelligent information system. Three examples of such tasks are obtaining simple answers to direct questions, summarizing an event sequence that may span multiple documents, and reporting recent developments in an ongoing event sequence. In this paper, the SCISOR system is described. SCISOR illustrates the potential for increased recall and precision of stored information through in-context understanding of articles in its domain, corporate takeovers. A constrained form of marker passing is used to answer natural-language queries against the knowledge base. Among other desirable characteristics, this retrieval method focuses the search on likely candidates and tolerates incomplete or incorrect input indices very well.
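The abstract names constrained marker passing without describing it. As a rough illustration only, the sketch below assumes that markers spread breadth-first from each query concept through a semantic network for at most `max_depth` links, and that candidate answers are nodes reached by markers from every query concept, ranked by total distance. The graph, node names, and depth limit are hypothetical; this is not SCISOR's actual algorithm.

```python
from __future__ import annotations

from collections import deque


def pass_markers(graph: dict[str, list[str]], origin: str, max_depth: int) -> dict[str, int]:
    """Spread a marker from `origin` breadth-first; return node -> link distance."""
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if reached[node] == max_depth:
            continue  # the depth limit is what keeps propagation constrained
        for nxt in graph.get(node, []):
            if nxt not in reached:
                reached[nxt] = reached[node] + 1
                queue.append(nxt)
    return reached


def intersect_markers(graph: dict[str, list[str]], origins: list[str],
                      max_depth: int) -> list[str]:
    """Nodes marked from every origin, shortest total distance first."""
    marks = [pass_markers(graph, o, max_depth) for o in origins]
    common = set(marks[0]).intersection(*marks[1:])
    return sorted(common, key=lambda n: (sum(m[n] for m in marks), n))


# Hypothetical takeover knowledge base as an undirected adjacency list.
graph = {
    "ACME": ["takeover-1"],
    "Widget Co": ["takeover-1", "takeover-2"],
    "takeover-1": ["ACME", "Widget Co", "offer-$40"],
    "takeover-2": ["Widget Co", "Gizmo Inc"],
    "Gizmo Inc": ["takeover-2"],
    "offer-$40": ["takeover-1"],
}

print(intersect_markers(graph, ["ACME", "Widget Co"], max_depth=1))  # ['takeover-1']
```

Limiting the depth keeps the search focused on nearby, likely candidates, which matches the property the abstract highlights.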

11.
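Zhu Weiwei (朱伟伟), Journal of Modern Information (《现代情报》), 2011, 31(8): 109-114, 129
To systematically understand the current state and trends of institutional repository research in China, this study uses bibliometric and comparative analysis methods. Based on CSSCI data for source documents and cited documents on institutional repositories, it compiles statistics and analyzes them from three perspectives (source documents, references, and citations received), covering publication output, an overview of references, reference languages, reference types, authors, journals, citations received, and the cited works themselves. Through close reading of these documents, keyword analysis, and comparison with related research in China and abroad, it proposes four recommendations addressing problems in domestic institutional repository research.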

12.
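Zhou Yufang (周玉芳), Journal of Modern Information (《现代情报》), 2012, 32(6): 25-28, 32
Using bibliometric methods and keyword co-occurrence analysis, this study statistically analyzes research papers on sci-tech novelty search published in core journals indexed by the China Academic Journals Full-text Database, by publication date, author, high-frequency keywords, and research content. It examines the status, development, hot topics, and trends of sci-tech novelty search research over the past 21 years.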
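The two counts this kind of study relies on, keyword frequency (how many papers use a keyword) and keyword co-occurrence (how many papers use a pair of keywords together), can be sketched as follows. The keyword lists are invented, and counting each keyword once per paper is an assumption about the method.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def keyword_stats(papers: list[list[str]]) -> tuple[Counter, Counter]:
    """Return (per-keyword paper counts, per-pair co-occurrence counts)."""
    freq = Counter(kw for kws in papers for kw in set(kws))
    # Sorting each paper's keywords makes (a, b) and (b, a) the same pair.
    pairs = Counter(
        pair for kws in papers for pair in combinations(sorted(set(kws)), 2)
    )
    return freq, pairs


# Hypothetical keyword lists, one per paper.
papers = [
    ["novelty search", "patent", "database"],
    ["novelty search", "database"],
    ["novelty search", "quality control"],
]

freq, pairs = keyword_stats(papers)
print(freq.most_common(2))                     # [('novelty search', 3), ('database', 2)]
print(pairs[("database", "novelty search")])  # 2
```

The pair counts form the weighted edges of a keyword co-occurrence network, from which hot topics are usually read off as densely connected, high-frequency keywords.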

13.
This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions of the Hungarian language. It describes evaluations of two general stemming strategies for this language and demonstrates that a light stemming approach can be quite effective. Based on searches of the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better mean average precision (MAP). When this approach is compared with an IR scheme without stemming or one based only on a light stemmer, the differences are statistically significant. Among probabilistic, vector-space, and language models, the Okapi model yields the best retrieval effectiveness; its MAP is about 35% better than that of the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure to both queries and documents significantly improves IR performance (+10%) compared with word-based indexing strategies.
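The Okapi model is commonly implemented as BM25. The sketch below assumes the usual defaults k1 = 1.2 and b = 0.75 and the log(1 + ...) IDF variant, which stays positive; the abstract does not state the paper's parameters. The `decompound` helper and its one-entry lexicon are hypothetical simplifications of automatic decompounding: they show why splitting the Hungarian compound számítógép ("computer") lets a query for its component gép ("machine") match.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical decompounding lexicon: compound -> components.
LEXICON = {"számítógép": ["számító", "gép"]}


def decompound(tokens: list[str], lexicon: dict[str, list[str]]) -> list[str]:
    """Keep each token and append its components if it is a known compound."""
    out: list[str] = []
    for tok in tokens:
        out.append(tok)
        out.extend(lexicon.get(tok, []))
    return out


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["új", "számítógép"], ["régi", "autó"]]  # "new computer", "old car"
print(bm25_scores(["gép"], docs))                # [0.0, 0.0]
split_docs = [decompound(d, LEXICON) for d in docs]
print(bm25_scores(["gép"], split_docs)[0] > 0)   # True
```

Keeping the full compound alongside its components is one design option; it preserves exact-match hits on the compound while adding component matches.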

14.
The Web is revolutionizing the entire scholarly communication process and changing the way researchers exchange information. In this paper, we analyze two views of information production and use in computer-related research: a citation analysis of PDF and PostScript publications on the Web using autonomous citation indexing (ACI), and a parallel citation analysis of the journal literature indexed by the Institute for Scientific Information (ISI) in SCISEARCH. Our goal is to establish a baseline profile of computer science “literature” as it appears in published journals and as it appears on the publicly available Web. From this starting point, we hope to identify additional research areas dealing with information dissemination and citation practices in computer science, and with the utility of autonomous citation indexing on the Web as an adjunct to commercial indexing.
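A core step in autonomous citation indexing is grouping reference strings that point to the same work despite formatting differences, then inverting them into a cited-work to citing-documents index. The sketch below uses a crude normalization (lowercase alphanumeric tokens) as an assumed heuristic; real ACI systems use more robust matching, and the file names and references are invented.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalize(citation: str) -> str:
    """Crude matching key: lowercase alphanumeric tokens joined by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", citation.lower()))


def build_citation_index(citations: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each normalized cited work to the set of documents citing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for citing_doc, cited_text in citations:
        index[normalize(cited_text)].add(citing_doc)
    return dict(index)


def most_cited(index: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Cited works ordered by number of citing documents, then alphabetically."""
    counts = ((work, len(citers)) for work, citers in index.items())
    return sorted(counts, key=lambda wc: (-wc[1], wc[0]))


# Hypothetical (citing document, raw reference string) pairs extracted from Web papers.
citations = [
    ("paperA.pdf", "Smith, J. Fast Indexing. 1997."),
    ("paperB.ps", "smith j fast indexing 1997"),
    ("paperB.ps", "Doe, K. Web Search, 1998"),
]

index = build_citation_index(citations)
print(most_cited(index))  # [('smith j fast indexing 1997', 2), ('doe k web search 1998', 1)]
```

Citation counts from such an index can then be compared with counts from a journal-based index such as SCISEARCH.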

15.
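Review and Prospects of Automatic Indexing Technology   Total citations: 4, self-citations: 0, citations by others: 4
Zhang Jing (张静), Journal of Modern Information (《现代情报》), 2009, 29(4): 221-225
Against the current background of widespread full-text retrieval, this article discusses the importance of automatic indexing. It groups the automatic indexing techniques developed over nearly fifty years, by their theoretical basis, into statistical methods, linguistic methods, artificial intelligence methods, and hybrid methods, and describes the characteristics, strengths, and weaknesses of each class. Finally, it summarizes the shortcomings of existing automatic indexing techniques and offers an outlook on their future development.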
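To make the "statistical methods" class concrete, the sketch below selects index terms by tf-idf weight, one classic statistical approach (not necessarily the one the article emphasizes). Terms that occur in every document, such as "the" here, get zero weight and are dropped; ties are broken alphabetically. The documents are invented and already tokenized.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_index_terms(docs: list[list[str]], top_k: int = 3) -> list[list[str]]:
    """Return the top_k highest-weighted index terms for each tokenized document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    result = []
    for d in docs:
        tf = Counter(d)
        # Weight = relative term frequency * log(N / document frequency).
        weights = {t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted((t for t in weights if weights[t] > 0),
                        key=lambda t: (-weights[t], t))
        result.append(ranked[:top_k])
    return result


docs = [
    ["indexing", "automatic", "indexing", "the", "system"],
    ["retrieval", "the", "system", "query"],
    ["the", "indexing", "thesaurus"],
]

print(tfidf_index_terms(docs, top_k=2))
# [['automatic', 'indexing'], ['query', 'retrieval'], ['thesaurus', 'indexing']]
```

Linguistic and AI-based methods would instead add steps such as part-of-speech filtering or concept mapping before or after this weighting.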

16.
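A Bibliometric Analysis of Sci-Tech Novelty Search Research in China   Total citations: 1, self-citations: 0, citations by others: 1
Using bibliometric methods, this study compiles statistics on and analyzes research papers on sci-tech novelty search in China indexed from 2002 to 2010 in Chongqing VIP's Chinese Sci-Tech Periodical Full-text Database (《中文科技期刊全文数据库》), and discusses the current state of research in terms of five aspects: publication year, journal, author, institution, and topic.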

17.
Difficulties with the distribution of scientific and technical information (STI) in the People's Republic of Bulgaria are described. The rapid growth of STI data bases has produced files of varying quality and completeness. Their usefulness is reduced by difficulty in obtaining copies of the documents they list, by a lack of information (notably nonbibliographic data bases) aimed at industrial production rather than at scientific specialists, and by the absence of abstracts. STI as a commercial product is expensive for a developing country, which thus becomes information-dependent, and participation in information systems may require duplicate entry of documents. The language barrier also reduces the use of STI. In spite of these problems, CISTI is working to speed up and expand its users' access to scientific and technical information.

18.
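The Application of Grey Literature in the Health Sciences   Total citations: 2, self-citations: 0, citations by others: 2
Yan Zonglin (阎宗林), Journal of Intelligence (《情报杂志》), 1992, 11(1): 62-66
Grey literature, i.e., special document types, mainly comprises reports, conference literature, and theses and dissertations. A statistical analysis of citations to grey literature in six health-science journals and two databases shows that grey literature is an important information source for health researchers. In the journals, reports of various kinds are the main type of grey literature cited; in the databases, the proportion of citations to grey literature is low, and theses are cited at a higher rate than technical reports. Effective development and use of grey literature therefore deserves more attention.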

19.
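Xia Lixin (夏立新), Zhuang Qingqing (庄青青), Chen Zhuoqun (陈卓群), Information Science (《情报科学》), 2007, 25(9): 1378-1383
The semantic markup and structured nature of XML documents make retrieval easier to implement and can improve retrieval precision. This paper uses a binary sort tree (binary search tree) to build index files for XML documents, presents the data structures and algorithms for building the index, and analyzes the advantages of the binary-sort-tree index for XML documents in data updates, retrieval speed, and precision.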
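A minimal sketch of a binary-search-tree index over XML follows, assuming that the key is the text of one child element (here `title`) and that the stored value is an XPath-like location of the record. The element names, key choice, and path format are assumptions; the paper's actual data structures are not given in the abstract.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    key: str
    paths: list[str] = field(default_factory=list)
    left: Node | None = None
    right: Node | None = None


class BSTIndex:
    """Unbalanced binary search tree mapping key text -> record locations."""

    def __init__(self) -> None:
        self.root: Node | None = None

    def insert(self, key: str, path: str) -> None:
        if self.root is None:
            self.root = Node(key, [path])
            return
        node = self.root
        while True:
            if key == node.key:
                node.paths.append(path)  # duplicate key: add another location
                return
            branch = "left" if key < node.key else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, Node(key, [path]))
                return
            node = child

    def search(self, key: str) -> list[str]:
        node = self.root
        while node is not None:
            if key == node.key:
                return node.paths
            node = node.left if key < node.key else node.right
        return []


def index_xml(xml_text: str, record_tag: str, key_tag: str) -> BSTIndex:
    """Index each <record_tag> element by the text of its <key_tag> child."""
    root = ET.fromstring(xml_text)
    idx = BSTIndex()
    for i, record in enumerate(root.findall(record_tag), start=1):
        key_elem = record.find(key_tag)
        if key_elem is not None and key_elem.text:
            idx.insert(key_elem.text.strip(), f"/{root.tag}/{record_tag}[{i}]")
    return idx


xml_text = """<library>
  <book><title>XML Retrieval</title></book>
  <book><title>Binary Trees</title></book>
  <book><title>Indexing</title></book>
</library>"""

idx = index_xml(xml_text, "book", "title")
print(idx.search("Binary Trees"))  # ['/library/book[2]']
```

Updates only touch one root-to-leaf path, which is the efficiency argument for tree indexes. A plain BST degrades to linear search if keys arrive in sorted order, so a balanced variant (AVL or red-black) is the usual safeguard.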

20.
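[Purpose/Significance] To explore whether a paper's citation count is related to its content, i.e., the way it combines concepts. [Method/Process] Taking immunology in the WoS database as the field, subject terms were extracted from three sets of papers with high, medium, and low citation frequencies, and the concentration and dispersion of each set's subject-term frequency distribution were analyzed. A subject-term co-occurrence network was built for each set, and its topological properties were analyzed to identify similarities and differences in how the three sets combine concepts and to gauge the relationship between atypical combinations and novelty. [Results/Conclusions] (1) Document sets with different citation frequencies differ considerably in the distribution of topic types and in the dispersion of subject terms. (2) The subject-term co-occurrence networks of the highly and moderately cited sets exhibit small-world properties, whereas that of the lowly cited set does not. (3) The co-occurrence network of the highly cited set is relatively dense, and its proportion of atypical subject-term combinations is higher than in the other two sets; the network of the lowly cited set is relatively sparse. A paper's citation count is related to the popularity of its topics, how closely its topics are connected, and the way its topics are combined.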
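Small-world status is usually judged by comparing a network's average clustering coefficient C and average shortest-path length L with those of a random graph of the same size and mean degree k. The sketch below uses one common criterion, sigma = (C / C_rand) / (L / L_rand) with the Erdős–Rényi approximations C_rand ≈ k / n and L_rand ≈ ln n / ln k; the paper may have used a different test. It assumes an undirected, unweighted co-occurrence network with mean degree above 1, and averages path lengths over reachable pairs only.

```python
from __future__ import annotations

import math
from collections import deque
from itertools import combinations

Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from co-occurring term pairs."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def avg_clustering(g: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += 2 * links / (k * (k - 1))
    return total / len(g)


def avg_path_length(g: Graph) -> float:
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_sigma(g: Graph) -> float:
    """sigma > 1 suggests small-world structure (requires mean degree > 1)."""
    n = len(g)
    k = sum(len(nbrs) for nbrs in g.values()) / n
    c_rand, l_rand = k / n, math.log(n) / math.log(k)
    return (avg_clustering(g) / c_rand) / (avg_path_length(g) / l_rand)


triangle = build_graph([("a", "b"), ("b", "c"), ("a", "c")])
print(avg_clustering(triangle), avg_path_length(triangle))  # 1.0 1.0
```

For real term networks, a sigma well above 1 together with a short average path length is the usual reading of "small-world"; a sparse network like the low-citation set's would tend toward low clustering.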
