20 similar documents found (search time: 31 ms)
Learning Algorithms for Keyphrase Extraction   Cited by: 20 (self-citations: 0; by others: 20)
Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these keywords are often phrases of two or more words, we prefer to call them keyphrases. There is a wide variety of tasks for which keyphrases are useful, as we discuss in this paper. We approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. We evaluate the performance of nine different configurations of C4.5. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for automatically extracting keyphrases from text. The experimental results support the claim that a custom-designed algorithm (GenEx), incorporating specialized procedural domain knowledge, can generate better keyphrases than a general-purpose algorithm (C4.5). Subjective human evaluation of the keyphrases generated by GenEx suggests that about 80% of the keyphrases are acceptable to human readers. This level of performance should be satisfactory for a wide variety of applications.

In this paper, we present a framework that can process a user query for retrieval of information from documents of different properties across multiple domains, with specific application to patent laws and regulations. The framework has three basic components. The first component is ontology mapping and generation: keywords entered by users are mapped to a subset of relevant keywords by looking them up in an ontology database. The second component is the joint and cross search in various document domains; in our case, they are patents and scientific publications. The last component modifies the search results by applying user feedback statistics. Feedback results are saved as metadata for future use. A case example is given to demonstrate how results from multiple domain searches can be combined using ontology and cross referencing. We use an example of well-known biotechnology patents on erythropoietin (EPO) and give a detailed analysis of each document domain with this keyword. Relationships between the domains are demonstrated. A user feedback mechanism is also discussed in this paper. The ability to take user feedback into the framework is important, because domain knowledge from expert or experienced users could be a very good complement to the proposed system. Both direct and indirect user feedback are discussed.

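To make effective use of metadata to improve the operability, portability, extensibility, maintainability, and data consistency of e-government official-document systems, this paper first briefly discusses the definition of metadata and its general role in such systems. Then, taking the development of the official-document system within the "Liaoning Province Dalian Municipal Party Committee office system software project" as an example, it examines the application and implementation mechanisms of metadata in e-government official-document systems from several angles, including data storage, data exchange, and data presentation.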

This article summarizes published documents on metadata provided by Google for books scanned as part of the Google Book Search (GBS) project and provides suggestions for improvement. The faulty, misleading, and confusing metadata in current Google records can pose potentially serious problems for users of GBS. Google admits that it took data, which proved to be inaccurate, from many sources and is attempting to correct errors. Some argue that metadata is not needed with keyword searching; but optical character recognition (OCR) errors, synonym control, and materials in foreign languages make reliable metadata a requirement for academic researchers. The authors recommend that users should be able to submit error reports to Google to correct faulty metadata.

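Current Development and Future Directions of Document Delivery Services in China in the New Era   Cited by: 4 (self-citations: 0; by others: 4)
China's document delivery services, which began in the 1920s, are now thriving. In the new era, however, problems have gradually emerged: unbalanced development, duplicated resource building, a lack of coordination and evaluation, and technical obstacles that are hard to overcome. To break through these bottlenecks, which constrain the effective progress of document delivery in China, the service shows the following trends: non-mediated delivery aimed directly at users is the development trend; establishing regional-to-national document delivery systems is the best choice; opening delivery channels for special and distinctive literature is the direction of effort; and industrialized document delivery is the inevitable destination.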

陈喆 《图书情报工作》2008,52(11):103-105
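With the rapid development of information technology and the resource environment, library document services must consider how to make full use of their professional strengths to meet users' diverse needs and, by exploring new service forms and models, strive to become an important link in an open, society-wide information service chain. Drawing on recent practical cases at the Shanghai Library, the paper summarizes and reflects on convergence trends in document services across resources, librarians, content, channels, and users, and on three innovative service models: personalized network integration, data mining of document knowledge, and document solutions embedded in users' information systems.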

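The CALIS Phase III e得 portal gives university libraries a convenient route for document delivery and gives users a convenient route for obtaining documents. The paper builds a regional document delivery "sharing domain" on the CALIS Phase III e得 portal, integrating the document delivery systems of the region's university libraries so that users can obtain documents conveniently. As an application example, it analyzes several models of document delivery service and shows that seamless integration of the regional university libraries' document delivery systems on the e得 portal is feasible. Finally, it analyzes the main problems facing the construction of a regional document delivery sharing domain and proposes corresponding solutions.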

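A Study of Management Metadata Schemes for Office Automation Systems in China   Cited by: 1 (self-citations: 0; by others: 1)
Through a comparative analysis of two electronic records management metadata specifications circulated for comment by the State Archives Administration (国家档案局), and an analysis of existing office-document metadata and data element standards in China's e-government, the paper identifies flaws in the design of China's electronic document management metadata and electronic office-document management metadata, and the consequences of those flaws. On this basis, it offers four recommendations for the management metadata scheme that China's office automation systems must build: develop a general management metadata standard for office automation systems; standardize the metadata formats of office automation systems; clarify the relationship and differences between electronic document management metadata and electronic records management metadata; and ensure that office automation metadata schemes are practicable. The paper contains 8 figures and 5 tables.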

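1 The Exponential Law of Literature Growth and the Total Volume of Literature. According to statistics, the total number of newspapers in mainland China had reached 1,755 by 1992, an increase of 1,569 over 1978 [1]. Taking the 186 titles of 1978 as the starting point, it is easy to see that the total number of newspapers roughly doubled every 5 years on average, and that a new newspaper appeared every 3 days on average. But the newspapers before 1978...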
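
The arithmetic in the excerpt above can be checked with a short Python sketch. It uses only the figures quoted there (186 titles in 1978, 1,755 in 1992, a 14-year span). The purely exponential model N(t) = N0 * 2^(t/T) is an assumption made here for illustration; the excerpt itself only speaks of the total "roughly doubling".

```python
import math

# Figures quoted in the excerpt.
papers_1978 = 186
papers_1992 = 1755
years = 1992 - 1978  # 14 years

# Newspapers added over the period and the average gap between new titles.
added = papers_1992 - papers_1978
days_per_new_paper = years * 365 / added

# Assumption (not from the excerpt): pure exponential growth,
# N(t) = N0 * 2 ** (t / T), solved for the doubling time T.
doubling_time = years * math.log(2) / math.log(papers_1992 / papers_1978)

print(f"added: {added}")
print(f"days per new title: {days_per_new_paper:.1f}")
print(f"doubling time in years: {doubling_time:.1f}")
```

Under that assumption the doubling time comes out a little over 4 years, which is consistent with the excerpt's "roughly every 5 years", and the average gap between new titles is a little over 3 days.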

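The paper argues that neither records life-cycle theory nor records value theory can provide a theoretical basis for the open-access services for current records offered by archives. Interpreting the opening of current records through the view that archives are formed first (档案形成在前说), however, readily resolves many problems. It further argues that archives differ essentially from records in form, informational content, and function; they are two entirely different kinds of things. It follows that opening current records is not the same as opening archives: current records should be open as a matter of principle, whereas archives should have a relative closure period.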

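This paper summarizes the experience and lessons of building document resource guarantee systems at "985" and "211" universities. Based on the practice of building such a system for a university's "新诗所" (institute of new poetry), it proposes that a document resource guarantee system for a disadvantaged discipline should be led by subject librarians and adopt a resource guarantee model of "subject librarians + print holdings + Internet resource navigation + large-scale document resource sharing platforms", while doing as much as possible to improve "librarian-reader" and "reader-reader" communication channels.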

赵屹 《北京档案》2015,(1):19-22
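The paper studies and explains electronic records management metadata from four aspects: the definition of metadata, the role of metadata, the relationship of metadata to electronic records and contextual information, and the use of metadata in electronic records management. It aims to combine scholarly depth, informativeness, and readability, helping archivists understand metadata and then apply it in practice in accordance with metadata standards.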

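A Review of Research on the Concepts and Methods of Textual Authentication in Bamboo and Silk Manuscript Studies   Cited by: 1 (self-citations: 0; by others: 1)
Traditional philology viewed the formation and transmission of ancient books as static and unchanging, and developed a set of textual authentication methods on that basis. Excavated bamboo and silk manuscripts, however, provide a true picture of the form and transmission of ancient books, prompting scholars to rethink traditional concepts and methods of textual authentication. The article reviews new research on these concepts and methods in bamboo and silk manuscript studies and offers a suggestion for the development of philology (文献学).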

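Core Documents in 《七略》   Cited by: 3 (self-citations: 0; by others: 3)
Introducing the new concept of core documents, the paper discusses the hierarchical nature of the core documents in 《七略》 and the cultural values they reflect. It argues that combining concrete core documents with abstract class names sustains a dialectic between the perceptual and the rational in classificatory thinking. Finally, it analyzes the historical basis of this classificatory thinking.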

Document length is widely recognized as an important factor for adjusting retrieval systems. Many models tend to favor the retrieval of either short or long documents and, thus, a length-based correction needs to be applied to avoid any length bias. In Language Modeling for Information Retrieval, smoothing methods are applied to move probability mass from document terms to unseen words, which is often dependent upon document length. In this article, we perform an in-depth study of this behavior, characterized by the document length retrieval trends, of three popular smoothing methods across a number of factors, and its impact on the length of documents retrieved and retrieval performance. First, we theoretically analyze the Jelinek–Mercer, Dirichlet prior and two-stage smoothing strategies and, then, conduct an empirical analysis. In our analysis we show how Dirichlet prior smoothing caters for document length more appropriately than Jelinek–Mercer smoothing, which leads to its superior retrieval performance. In a follow-up analysis, we posit that length-based priors can be used to offset any bias in the length retrieval trends stemming from the retrieval formula derived by the smoothing technique. We show that the performance of Jelinek–Mercer smoothing can be significantly improved by using such a prior, which provides a natural and simple alternative to decouple the query and document modeling roles of smoothing. With the analysis of retrieval behavior conducted in this article, it is possible to understand why Dirichlet prior smoothing performs better than Jelinek–Mercer smoothing, and why the performance of the Jelinek–Mercer method is improved by including a length-based prior.
Leif Azzopardi
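
To make the role of document length concrete, the three smoothing strategies named in the abstract can be sketched in Python using their standard textbook forms. This is an illustrative sketch, not the article's code: the function names, the toy documents, the default values of `lam` and `mu`, and the use of the collection model as the second-stage background model in `two_stage` are all assumptions made here.

```python
from collections import Counter

def jelinek_mercer(word, doc_counts, doc_len, coll_prob, lam=0.5):
    """p(w|d) = (1 - lam) * c(w,d)/|d| + lam * p(w|C); lam is fixed for all documents."""
    p_ml = doc_counts[word] / doc_len if doc_len else 0.0
    return (1 - lam) * p_ml + lam * coll_prob.get(word, 0.0)

def dirichlet_prior(word, doc_counts, doc_len, coll_prob, mu=2000.0):
    """p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu); smoothing shrinks as |d| grows."""
    return (doc_counts[word] + mu * coll_prob.get(word, 0.0)) / (doc_len + mu)

def two_stage(word, doc_counts, doc_len, coll_prob, lam=0.5, mu=2000.0):
    """Dirichlet smoothing followed by Jelinek-Mercer interpolation.

    Assumption: the second-stage background model is approximated by the
    collection model p(w|C).
    """
    first = dirichlet_prior(word, doc_counts, doc_len, coll_prob, mu)
    return (1 - lam) * first + lam * coll_prob.get(word, 0.0)

def effective_lambda(doc_len, mu=2000.0):
    """Interpolation weight that Dirichlet smoothing implies for a document of this length."""
    return mu / (doc_len + mu)

# Hypothetical toy collection: one very short and one longer document.
docs = {
    "short": "metadata search".split(),
    "long": ("metadata search engine index " * 50).split(),
}
collection = Counter(w for tokens in docs.values() for w in tokens)
total = sum(collection.values())
coll_prob = {w: c / total for w, c in collection.items()}

for name, tokens in docs.items():
    counts, n = Counter(tokens), len(tokens)
    print(name, n, effective_lambda(n),
          dirichlet_prior("metadata", counts, n, coll_prob))
```

Written this way, Dirichlet prior smoothing is Jelinek–Mercer smoothing with a per-document weight `mu / (|d| + mu)`: longer documents are smoothed less, whereas plain Jelinek–Mercer applies the same `lam` regardless of length.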

韩宁  杨鸣放 《图书馆建设》2012,(3):47-48,51
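In 《文献主题标引规则》 (the Rules for Subject Indexing of Documents), a document's subject factors generally consist of the main-subject factor, general factor, spatial factor, temporal factor, and document-type factor. The document-type factor refers to the concepts in a document's subject that express its form of compilation, form of writing, depth of content, intended use, and the like. Although the document-type factor is only an auxiliary criterion in indexing, it is an important reference for readers choosing and using documents. When indexing document subjects, the indexer should select an appropriate expression of document type according to the document's depth of content, intended readership, form of compilation, form of writing, intended use, and other specifics.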

Expanding Document Delivery to Optimize the Allocation of Document Resources   Cited by: 7 (self-citations: 1; by others: 7)
鄢珞青  李云华 《图书馆杂志》2006,25(8):30-32,45
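As book and journal prices keep rising, libraries have had to reallocate their acquisition budgets and sharply cut the number of titles and copies they buy, causing holdings to shrink abruptly. Meanwhile, the scale and volume of document delivery services have grown rapidly, and many libraries have begun to use part of the funds for purchasing original documents to pay document delivery fees in order to meet readers' needs. This paper discusses how to expand document delivery and optimize the allocation of document resources, and offers recommendations.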

The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.
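
The three entity groups and four kinds of relation described in the abstract can be pictured with a minimal Python sketch. It is a hypothetical illustration only: the class name `SearchLogGraph`, the session format (an ordered list of `(query, clicked_docs)` pairs), and the frequency-based `suggest` method are assumptions made here, not the paper's data structure or algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass, field

def _counter_map():
    # Nested mapping: source node -> target node -> transition count.
    return defaultdict(lambda: defaultdict(int))

@dataclass
class SearchLogGraph:
    issued: dict = field(default_factory=lambda: defaultdict(set))    # user -> queries issued
    selected: dict = field(default_factory=lambda: defaultdict(set))  # user -> documents selected
    clicks: dict = field(default_factory=_counter_map)                # query -> document -> count
    refinements: dict = field(default_factory=_counter_map)           # query -> next query -> count

    def add_session(self, user, events):
        """Record one session; events is an ordered list of (query, clicked_docs) pairs."""
        prev = None
        for query, clicked_docs in events:
            self.issued[user].add(query)
            for doc in clicked_docs:
                self.selected[user].add(doc)
                self.clicks[query][doc] += 1
            # A changed query within a session is treated as a refinement.
            if prev is not None and prev != query:
                self.refinements[prev][query] += 1
            prev = query

    def suggest(self, query, k=3):
        """Most frequent follow-up queries (ties broken alphabetically)."""
        followers = self.refinements.get(query, {})
        ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
        return [q for q, _ in ranked[:k]]

# Hypothetical log entries.
graph = SearchLogGraph()
graph.add_session("u1", [("epo", []), ("epo patent", ["d1"])])
graph.add_session("u2", [("epo", ["d2"]), ("epo patent", ["d1"]), ("epo patent claims", [])])
print(graph.suggest("epo"))
```

Keeping the user, query, and document relations in separate maps means each Web mining task can read the relation it needs, for example `refinements` for query suggestion or `clicks` for grouping pages by the queries that lead to them.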

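A Survey of Document Recommender Systems   Cited by: 1 (self-citations: 0; by others: 1)
Document recommender systems help users discover personalized information in massive document collections and have become an important component of document retrieval systems. Research on document recommendation techniques has developed from the combined evolution of research in information retrieval, bibliometrics, and e-commerce recommender systems. The paper first discusses general personalized recommendation techniques, then systematically analyzes and summarizes the research results achieved in document recommendation. Because evaluation measures and methods are an important part of recommender systems, it also presents commonly used evaluation measures for document recommender systems. Finally, it gives an overall assessment of the current state of research on document recommender systems and points out future directions.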

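Libraries can measure the utilization of local literature resources through literature surveys, website visit statistics, website link recommendations, and information exchange with users. Effective ways to raise utilization mainly include integrating local literature resources, encouraging users to take part in building them, and publicizing and promoting them.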
