Similar Literature
Found 20 similar documents; search time: 46 ms
1.
Information Retrieval from Documents: A Survey (Total citations: 4; self-citations: 0; citations by others: 4)
Given the phenomenal growth in the variety and quantity of data available to users through electronic media, there is a great demand for efficient and effective ways to organize and search through all this information. Besides speech, our principal means of communication is through visual media, and in particular, through documents. In this paper, we provide an update on Doermann's comprehensive survey (1998) of research results in the broad area of document-based information retrieval. The scope of this survey is also somewhat broader, and there is a greater emphasis on relating document image analysis methods to conventional IR methods. Documents are available in a wide variety of formats. Technical papers are often available as ASCII files of clean, correct text. Other documents may only be available as hardcopies. These documents have to be scanned and stored as images so that they may be processed by a computer. The textual content of these documents may also be extracted and recognized using OCR methods. Our survey covers the broad spectrum of methods that are required to handle different formats like text and images. The core of the paper focuses on methods that manipulate document images directly, and perform various information processing tasks such as retrieval, categorization, and summarization, without attempting to completely recognize the textual content of the document. We start, however, with a brief overview of traditional IR techniques that operate on clean text. We also discuss research dealing with text that is generated by running OCR on document images. Finally, we also briefly touch on the related problem of content-based image retrieval.

2.
Access to health care machine-readable data files (MRDF) is becoming increasingly important to students and researchers in the health care field who use the data in secondary analysis. Health sciences libraries must play a role in providing such access, and this role should consist primarily in providing users with information about the identity and contents of available MRDF and about how they may be obtained. Libraries should therefore collect extensive materials containing information about the MRDF that may be of interest to their users. Many such materials are available in print, and their quality may be expected to improve as newly developed methods and procedures for constructing bibliographic citations, abstracts, and catalog entries for MRDF are put into practice. Also, it is now feasible to incorporate data file abstracts into existing online bibliographic databases.

3.
Given the recent trend in bibliometrics and information science to use increasingly complex statistical methods, it is necessary to have powerful toolboxes to work with data from Web of Science (Thomson Reuters). We developed such a toolbox with four specific commands for the statistical software package Stata. These commands refer to (1) the import of downloads from Web of Science to Stata, (2) the preprocessing of address information from authors of publications in the downloaded set, (3) the geocoding of address information, and (4) the calculation of the minimum and maximum distance between several co-authors of a single paper. An advantage of developing commands for an established and comprehensive statistical software package (like Stata) is that a large number of further commands are available for the analysis of bibliometric data. We will describe some of these useful commands as well.
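
Step (4) in the abstract above, the minimum and maximum distance between co-authors, can be illustrated with a minimal sketch. This is not the Stata toolbox the abstract describes: it is written in Python, it assumes the addresses have already been geocoded to (latitude, longitude) pairs in degrees, and it assumes great-circle (haversine) distance on a spherical Earth. The names `haversine_km` and `min_max_distance_km` are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical-Earth assumption


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_max_distance_km(coords):
    """Smallest and largest pairwise distance among co-author locations."""
    distances = [haversine_km(a, b) for a, b in combinations(coords, 2)]
    if not distances:
        raise ValueError("need at least two co-author locations")
    return min(distances), max(distances)
```

Calling `min_max_distance_km` with one coordinate pair per co-author of a paper returns the two distances as a tuple in kilometres.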

4.
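Under the open-lending model of a university with multiple library branches, the holdings status of physical documents changes dynamically, which leads to inaccurate holdings information in the OPAC. This article describes the relationship between the holdings status of physical documents and OPAC holdings information, analyzes the factors through which physical documents affect OPAC holdings information, and proposes improvements in three areas: reader education, librarians' professional competence, and module development for the Huiwen (汇文) management system.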

5.
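The purpose of government information disclosure is use. Based on a survey of how university users, a distinct group with relatively strong knowledge structures, awareness of political participation, information needs, and information skills, use government information, this article analyzes in depth the problems in current government information use, explains their root causes, and offers recommendations: changing attitudes, broadening channels, strengthening services for disadvantaged groups, and building easy-to-use platforms.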

6.
Although literature on rural libraries is abundant, there is a severe shortage of literature on the information needs of rural populations. This article presents an analysis of 33 studies on rural information needs identified from LISA–PLUS and the findings of a study of the information needs of the population of a cluster of three Malaysian villages with no library service. A total of 108 individuals from approximately 300 households were interviewed during February 1996. All the respondents are literate and show a strong interest in reading. Their top five information needs relate to: (1) Religious information; (2) Family bonding; (3) Current affairs; (4) Health information; and (5) Education. The top five purposes for seeking information were: (1) Fulfilment of the need to know; (2) Solving problems; (3) Self-development; (4) Establishing a better family; and (5) Work purposes. The top five sources of information were: (1) TV/Radio; (2) Friends/neighbours; (3) Printed materials; (4) Relatives from the city; and (5) the School (library). Should a library service be made available, 93.054% would be interested in using it. The results emphasize that the needs of the rural population must be carefully investigated when planning rural library services.

7.
This article reports a librarian's collection development efforts to support her institution's first independent doctoral degree (Ed.D., Educational Leadership) via a citation analysis comparing information usage by education doctorate dissertation authors from six peer institutions nationally. This analysis is part of a long-term examination of library collection use among California State University, Long Beach (CSULB) doctoral students. Key findings include the relatively young age of the information resources in educational leadership, which resource formats were cited, which serial titles were cited the most, and where they are available electronically. The ultimate aim is the creation of an essential collection in the subject discipline.

8.
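Handling massive volumes of information in the age of Internet communication is both a blessing and a burden for educators, researchers, and science and technology professionals. On the one hand, the range of scientific and technical literature now available is richer than what traditional libraries could offer; on the other hand, filtering out the noise that grows with the volume of information, so as to balance efficiency and quality of access, remains a problem to be solved. Library and information service institutions apply library and information science methods of systematic organization to organize and manage these massive resources in an orderly way. Taking into account the characteristics of educators, researchers, and science and technology professionals, they use scientific and technical information analysis to integrate resources as efficiently as possible, with full consideration of China's national conditions, and they combine this with technical means to improve the efficiency and quality with which scientific and technical information resources are used. For Chinese educators, researchers, and science and technology professionals seeking to make the fullest use of scientific and technical literature in their research and work, this is one of the best options.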

9.
10.
As the People's Republic of China plays an increasingly important role in international politics and trade, countries with economic interests there find they need to know more about this nation. Access to primary information sources, including official statistics from China, however, is very limited, as little exploration has been done into this closely controlled repository of information. This study explores major current statistical sources in China through examining (a) the statistical system in the Chinese government, (b) the mechanism of statistical data collection, and (c) what statistical information is currently available in both print and electronic format and at what level. It shows that a wealth of statistical information does exist in China, that it is systematically compiled, and that it is available, although not conveniently, to the public through various channels. This study can serve the need for China's data from the academic and business communities, contribute to a better understanding of China's statistical system, and serve as a collection tool for academic, public, and corporate libraries as well.

11.
12.
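The purpose of government information disclosure is use. Based on a survey of how university users, a distinct group with relatively strong knowledge structures, awareness of political participation, information needs, and information skills, use government information, this article analyzes in depth the problems in current government information use, explains their root causes, and offers recommendations: changing attitudes, broadening channels, strengthening services for disadvantaged groups, and building easy-to-use platforms.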

13.
14.
Providing better service by automating "business processes" is an exciting prospect for improving the government. Yet, there has not been the same level of effort at making it easier for the public to obtain information about what its government is doing. This article focuses on the constraints and opportunities in making database information available to the public. The database technology is chosen because it is a central repository of public information. New federal law requires the use of information technology (IT) to make access to public information easier. But the new law has also subtly shifted the burden of proof to the citizen in showing why certain information should be made available. If a "statutory fix" to this problem is not available in the short run, we urge agencies to provide increased access to database information because of the continual development of technology and its effect on citizen expectations.

15.
The article discusses the Slavic Cataloging Manual (SCM), available on the World Wide Web since 1994. The SCM contains a great deal of valuable information for all aspects of cataloging materials in Slavic and East European languages and in the non-Slavic languages of the former Soviet Union, as well as equally valuable information for cataloging materials about the area. The manual offers especially detailed guidance on heading construction and subject analysis for the dissolved unions of the former USSR, Czechoslovakia, and Yugoslavia. This is a valuable resource for anyone involved in Slavic cataloging, both the experienced cataloger and the novice.

16.
Searching for information pervades a wide spectrum of human activity, including learning and problem solving. With recent changes in the amount of information available and the variety of means of retrieval, there is even more need to understand why some searchers are more successful than others. This study was undertaken to advance the understanding of expertise in seeking information on the Web by identifying strategies and attributes that will increase the chance of a successful search on the Web. The strategies were as follows: evaluation, navigation, affect, metacognition, cognition, and prior knowledge. The attributes included age, sex, years of experience, computer knowledge, and info-seeking knowledge. Success was defined as finding a target topic within 30 minutes. Participants were from three groups. Novices were 10 undergraduate pre-service teachers, intermediates were 9 final-year master of library and information studies students, and experts were 10 highly experienced professional librarians working in a variety of settings. Participants' verbal protocols were transcribed verbatim into a text file and coded. These codes, along with Internet temporary files, a background questionnaire, and a post-task interview, were the sources of the data. Since the variable of interest was the time to finding the topic, in addition to ANOVA and Pearson correlation, survival analysis was used to explore the data. The most significant differences in patterns of search between novices and experts were found in the cognitive, metacognitive, and prior knowledge strategies. Survival analysis revealed specific actions associated with success in Web searching: (1) using clear criteria to evaluate sites, (2) not excessively navigating, (3) reflecting on strategies and monitoring progress, (4) having background knowledge about information seeking, and (5) approaching the search with a positive attitude.
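
The abstract above does not say which survival-analysis method the study used. As one common possibility, the sketch below computes a Kaplan-Meier estimate of the share of participants still searching at each time point. It assumes that participants who had not found the topic when the 30-minute limit expired are treated as right-censored at 30 minutes; the function name `kaplan_meier` and the example data in the usage note are hypothetical, not taken from the study.

```python
def kaplan_meier(times, found):
    """Kaplan-Meier curve for time-to-success data.

    times: minutes at which each participant found the topic or stopped.
    found: True if the topic was found at that time, False if censored.
    Returns [(t, S(t))] at each time with at least one success, where S(t)
    is the estimated probability of still searching after time t.
    """
    data = sorted(zip(times, found))
    at_risk = len(data)   # participants still searching just before time t
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0        # successes at exactly time t
        leaving = 0       # everyone who leaves the risk set at time t
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

For example, `kaplan_meier([5, 10, 10, 30, 30], [True, True, False, False, False])` yields one point at 5 minutes and one at 10 minutes; the two participants who reached the 30-minute limit contribute to the risk set but not to the events.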

17.
The Drug Information Portal is a free Web resource from the National Library of Medicine (NLM) that provides a user-friendly gateway to current information for more than 15,000 drugs. The site guides users to related resources of NLM, the National Institutes of Health (NIH), and other government agencies. Current drug-related information regarding consumer health, clinical trials, AIDS, MeSH pharmacological actions, MEDLINE/PubMed biomedical literature, and physical properties and structure is easily retrieved by searching on a drug name. A varied selection of focused topics in medicine and drugs is also available from displayed subject headings. This column provides background information about the Drug Information Portal, as well as search basics.

18.
19.
Background: The approach of evidence-based medicine (EBM), providing a paradigm to validate information sources and a process for critiquing their value, is an important platform for guiding practice. Researchers have explored the application and value of information sources in clinical practice with regard to a range of health professions; however, naturopathic practice has been overlooked. Objectives: An exploratory study of naturopaths' perspectives on the application and value of information sources has been undertaken. Methods: Semi-structured interviews with 12 naturopaths in current clinical practice, concerning the information sources used in clinical practice and their perceptions of these sources. Results: Thematic analysis identified differences in the application of the variety of information sources used, depending upon the perceived validity. Internet databases were viewed as highly valid. Textbooks, formal education and interpersonal interactions were judged based upon a variety of factors, whilst validation of general internet sites and manufacturers' information was required prior to use. Conclusions: The findings of this study will provide preliminary aid to those responsible for supporting naturopaths' information use and access. In particular, it may assist publishers, medical librarians and professional associations in developing strategies to expand the clinically useful information sources available to naturopaths.

20.
In carrying out prior art searching, the European Patent Office (EPO) is obliged to take into account all information available in the public domain up to the date of filing of a patent application. In order to do this effectively, it is necessary to have rapid, selective, and comprehensive access to all information relevant to a topic being searched. In recent years, the method of searching in the EPO has been moving from a predominantly paper-based approach to one relying on handling information in electronic format. For such a change to be possible, it is necessary to have electronic access to all relevant information as either primary or secondary data. The means of searching electronic data is the Epoque system. The progress toward development of a uniform approach to the handling of patent and non-patent data in both character-coded and facsimile format is described.
