In this paper, we investigate two new reflectance and illumination decomposition models based on a nonlocal partial differential equation (PDE) applied to text images. Taking into consideration the higher regularity of the illumination compared to the reflectance, we propose a nonlocal PDE that handles the repetitive structures and textures characteristic of text images much better than classical local PDEs. The aim of this approach is to use the repetitive features of the reflectance to extract it efficiently from the non-uniform illumination. The idea is motivated by extending the range of application of nonlocal operators to this problem. Numerical experiments on both grayscale and color text images show the performance and strength of the proposed nonlocal PDE.

An Analysis of the Scale Efficiency of Research Institutions Using DEA   Total citations: 2, self-citations: 0, citations by others: 2
孟溦  黄敏  刘文斌 《科研管理》2006,27(4):20-25,19
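Profit maximization is the goal of production-oriented and commercial enterprises. For non-profit organizations such as public service departments and research institutes, profit maximization is not the goal, yet scale efficiency still matters greatly because it bears on resource allocation and the efficiency of resource use. Data envelopment analysis (DEA) is a nonparametric quantitative method grounded in economic concepts; it provides information on how the returns to scale of decision-making units change, and can therefore be used to analyze non-profit public departments and research institutions. We modify the standard DEA model to overcome some of its limitations and, in practical application, combine it with the evaluators' assessment strategy and value orientation. As a case study, we analyze and discuss the scale efficiency of 34 research groups at a research institution.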

Document Co-citation Analysis and the Drawing of Maps of Science   Total citations: 14, self-citations: 4, citations by others: 14
刘林青 《科学学研究》2005,23(2):155-159
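Scientometrics is working to produce "maps of science" that give researchers direction, and co-citation analysis is one of the key techniques for doing so. Taking the field of strategic management research as an example, this paper explains the basic procedure for compiling a "map of science" using co-citation analysis, in particular document co-citation analysis.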

A concept of an end-user query language with facilities for expressing relationships between objects kept in a database is presented. The idea of nesting these facilities in a typical document system query language is shown. Special kinds of referring terms are designed. Examples of usage of the new facilities are attached.

In this paper, we propose a new algorithm that incorporates the relationships of concept-based thesauri into document categorization using the k-NN classifier (k-NN). k-NN is one of the most popular document categorization methods because it shows relatively good performance in spite of its simplicity. However, its precision degrades significantly when ambiguity arises, i.e., when there exists more than one candidate category to which a document can be assigned. To remedy this drawback, we employ concept-based thesauri in the categorization. Employing the thesaurus entails structuring categories into hierarchies, since their structure needs to conform to that of the thesaurus in order to capture relationships between categories. By referencing various relationships in the thesaurus corresponding to the structured categories, k-NN can be markedly improved, removing the ambiguity. We first perform the document categorization using k-NN and then employ the relationships to reduce the ambiguity. Experimental results show that this method improves the precision of k-NN by up to 13.86% without compromising its recall.

In the context of social media, users usually post relevant information corresponding to the contents of events mentioned in a Web document. This information possesses two important values in that (i) it reflects the content of an event and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model that captures the nature of the relationships between document sentences and post information (comments or tweets) in sharing hidden topics, and uses the relevant post information to summarize Web documents. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts based on their importance to the topics. The sentence-user-post relation is formulated in a shared topic matrix, which represents their mutual reinforcement support. Our proposed matrix co-factorization algorithm computes the score of each document sentence and user post and extracts the top-ranked document sentences and comments (or tweets) as a summary. We apply the model to the task of summarization on three social context summarization datasets in two languages, English and Vietnamese, and also on DUC 2004 (a standard corpus for the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves ROUGE scores competitive with state-of-the-art methods.

A method of automatic document classification was developed as part of a larger research project in materials selection. Documents classed as QA by the Library of Congress classification system were clustered at six thresholds by keyword using the single link technique. The automatically generated clusters were then compared to the Library of Congress subclasses to which the documents had been assigned by human classifiers. Finally, a partial classified hierarchy was formed from the individual document clusters within a single threshold. Implications of the utility of grouping documents for on-line searching are discussed.
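Single-link clustering at a fixed threshold can be sketched in a few lines of Python. This is only an illustration of the general technique, not the study's procedure: the Jaccard overlap of keyword sets is an assumed similarity measure (the abstract does not say which one was used), and the names `jaccard` and `single_link_clusters` are hypothetical. At a given threshold, single-link clusters are exactly the connected components of the graph that links every pair of documents whose similarity reaches the threshold.

```python
from itertools import combinations


def jaccard(a, b):
    """Keyword-set overlap in [0, 1]; an assumed similarity measure for this sketch."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link_clusters(keywords_by_doc, threshold):
    """Return clusters as connected components of the graph that links
    every pair of documents with similarity >= threshold."""
    parent = {doc: doc for doc in keywords_by_doc}

    def find(doc):
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for a, b in combinations(keywords_by_doc, 2):
        if jaccard(keywords_by_doc[a], keywords_by_doc[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for doc in keywords_by_doc:
        clusters.setdefault(find(doc), set()).add(doc)
    return sorted(clusters.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    docs = {
        "d1": {"matrix", "algebra", "vector"},
        "d2": {"matrix", "algebra", "eigenvalue"},
        "d3": {"algebra", "eigenvalue", "spectrum"},
        "d4": {"probability", "markov"},
    }
    for t in (0.5, 0.6):
        print(t, single_link_clusters(docs, t))
```

In the toy data, d1 and d3 overlap only weakly but still land in one cluster at threshold 0.5 because each is linked to d2; this chaining is characteristic of single link. Rerunning the function at several thresholds gives one partition per threshold, which is the kind of input a hierarchy could be assembled from.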

In a typical inverted-file full-text document retrieval system, the user submits queries consisting of strings of characters combined by various operators. The strings are looked up in a text-dictionary which lists, for each string, all the places in the database at which it occurs. It is desirable to allow the user to include in a query truncated terms such as X*, *X, *X*, or X*Y, where X and Y are specified strings and * is a variable-length don't-care character, that is, * represents an arbitrary, possibly empty, string. Processing these terms involves finding the set of all words in the dictionary that match these patterns. How to do this efficiently is a long-standing open problem in this domain. In this paper we present a uniform and efficient approach for processing all such query terms. The approach, based on a "permuted dictionary" and a corresponding set of access routines, requires essentially one disk access to obtain from the dictionary all the strings represented by a truncated term, with negligible computing time. It is thus well suited for on-line applications. Implementation is simple, and storage overhead is low: it can be made almost negligible by using some specially adapted compression techniques described in the paper. The basic approach is easily adaptable to slight variants, such as fixed (or bounded) length don't-care characters, or more complex pattern matching templates.
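A minimal in-memory sketch of the idea, in Python. It assumes the well-known permuterm construction (every rotation of each word plus an end marker, kept in sorted order) as a stand-in for the paper's permuted dictionary; the abstract does not describe the actual layout, access routines, or compression. The names `build_permuted_dictionary`, `prefix_scan`, and `lookup`, the `$` end marker, and the use of `*` as the don't-care symbol are all assumptions of this sketch. The point it illustrates is that each of the four truncated forms reduces to a single prefix search over one sorted list.

```python
from bisect import bisect_left

END = "$"  # end-of-word marker; assumed never to occur inside a dictionary word


def build_permuted_dictionary(words):
    """Return a sorted list of (rotation, word) pairs, one per rotation of word + END."""
    entries = []
    for word in set(words):
        marked = word + END
        for i in range(len(marked)):
            entries.append((marked[i:] + marked[:i], word))
    entries.sort()
    return entries


def prefix_scan(entries, prefix):
    """Collect the words whose rotations start with prefix (one contiguous sorted run)."""
    start = bisect_left(entries, (prefix,))  # first entry with rotation >= prefix
    found = set()
    for rotation, word in entries[start:]:
        if not rotation.startswith(prefix):
            break
        found.add(word)
    return found


def lookup(entries, pattern):
    """Match a truncated term of the form X*, *X, *X* or X*Y."""
    stars = pattern.count("*")
    if stars == 1:
        head, tail = pattern.split("*")
        # X*Y is rotated to the prefix Y$X; X* and *X are the cases with
        # an empty Y or an empty X respectively.
        return prefix_scan(entries, tail + END + head)
    if stars == 2 and pattern.startswith("*") and pattern.endswith("*"):
        # *X* matches any word that has some rotation starting with X.
        return prefix_scan(entries, pattern[1:-1])
    raise ValueError("unsupported pattern: " + pattern)


if __name__ == "__main__":
    vocabulary = ["retrieval", "retrieve", "index", "indexing", "reindex", "text"]
    entries = build_permuted_dictionary(vocabulary)
    for term in ("retriev*", "*index", "*index*", "re*x"):
        print(term, sorted(lookup(entries, term)))
```

Because all rotations that share a prefix sit next to each other in the sorted list, a disk-based version could plausibly serve a truncated term from one contiguous region, which is in the spirit of the single-access claim above. The price is one entry per character of each word, which is why compression of the permuted list matters.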

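The Efficiency Principle and Models for Science and Technology Evaluation   Total citations: 3, self-citations: 0, citations by others: 3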
盛承发 《科研管理》2003,24(1):40-43
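To address the widespread and serious problems caused by neglecting cost factors in science and technology evaluation in China, this paper proposes introducing the principle of economic efficiency, sets out the inverse relationship between performance and expenditure, and constructs five mathematical models. The performance of an institution or individual = the output score of that institution or individual / expenditure; relative performance = performance / the average performance of competitors in the same field; composite performance = (relative performance × relative output scale) - 2. This model can be used to compare performance across different fields.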

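[Objective] To solve the problem of automatically typesetting images in documents with mixed single- and double-column layouts during InDesign XML typesetting. [Methods] A JavaScript script was written that implements automatic InDesign XML typesetting by reading tags in sequence, extracting their content, and applying styles. [Results] Applying the self-written JavaScript script in InDesign XML typesetting enables automatic typesetting of single-column figures in mixed single- and double-column documents. [Conclusions] The self-written JavaScript program can automatically typeset mixed single- and double-column documents consisting mainly of text and single-column figures in InDesign and export them to PDF, streamlining the typesetting workflow.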

We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between original query terms, and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship between query terms are used in document ranking: the short-distance collocation relationship between query terms, and the long-distance relationship, determined by the collocation of query terms with other words. The methods are evaluated on TREC corpora, and show improvements over baseline systems.

An imperfect document selection system is represented by analogy with a system in which symbols are selected and transmitted through a noisy channel. Provided that transmission reception uncertainties and not meaning are considered, it is suggested that one of Shannon's equations is applicable, and a single figure measure of system efficiency, Ht, is proposed. Values obtained using this new yardstick are compared with Recall/Precision values obtained for a typical system. Further research is required to test whether system "improvements" resulting in higher values of Ht are perceived as such by users.

Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent presence of noise in such a representation degrades the performance of most of these approaches. In this paper we investigate an unsupervised dimensionality reduction technique for document clustering. This technique is based upon the assumption that terms co-occurring in the same context with the same frequencies are semantically related. On the basis of this assumption we first find term clusters using a classification version of the EM algorithm. Documents are then represented in the space of these term clusters and a multinomial mixture model (MM) is used to build document clusters. We empirically show on four document collections, Reuters-21578, Reuters RCV2-French, 20Newsgroups and WebKB, that this new text representation noticeably increases the performance of the MM model. By relating the proposed approach to the Probabilistic Latent Semantic Analysis (PLSA) model we further propose an extension of the latter in which an extra latent variable allows the model to co-cluster documents and terms simultaneously. We show on these four datasets that the proposed extended version of the PLSA model produces statistically significant improvements with respect to two clustering measures over all variants of the original PLSA and the MM models.

This paper describes an applied document filtering system embedded in an operational watch center that monitors disease outbreaks worldwide. At the initial time of this writing, the system effectively supported monitoring of 23 geographic regions by filtering documents in several thousand daily news sources in 11 different languages. The paper describes the filtering algorithm and the statistical procedures for estimating Precision and Recall in an operational environment, summarizes operational performance data, and suggests lessons learned for other applications of document filtering technology. Overall, these results are interpreted as supporting the general utility of document filtering and information retrieval technology, and recommendations are offered for future applications of this technology.

The retrieval effectiveness of the underlying document search component of an expert search engine can have an important impact on the effectiveness of the generated expert search results. In this large-scale study, we perform novel experiments in the context of the document search and expert search tasks of the TREC Enterprise track, to measure the influence that the performance of the document ranking has on the ranking of candidate experts. In particular, our experiments show that while the expert search system performance is related to the relevance of the retrieved documents, surprisingly, it is not always the case that increasing document search effectiveness causes an increase in expert search performance. Moreover, we simulate document rankings designed with expert search performance in mind and, through a failure analysis, show why even a perfect document ranking may not result in a perfect ranking of candidate experts.

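To manage an enterprise's electronic graphic documents systematically and to improve their utilization and work efficiency, this paper proposes an approach to building an enterprise electronic graphic document management system at low cost, analyzes the environment, steps, and functions involved in building the system, and reports its successful application in an enterprise.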

The paper discusses the notion of steps in indexing and reveals that the document-centered approach to indexing is prevalent. It argues that this approach is problematic because it blocks out context-dependent factors in the indexing process. A domain-centered approach to indexing is presented as an alternative, and the paper discusses how this approach includes a broader range of analyses and how it requires a new set of actions: analysis of the domain, users, and indexers. The paper concludes that the two-step procedure to indexing is insufficient to explain the indexing process and suggests that the domain-centered approach offers a guide for indexers that can help them manage the complexity of indexing.

文敏 《中国科技期刊研究》2016,27(11):1151-1155
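[Objective] To study the distribution of retracted papers in Chinese-language literature databases in order to identify the characteristics of papers retracted by domestic journals. [Methods] The CNKI, VIP, and Wanfang databases were searched, and the retracted papers were analyzed in terms of retraction date, retraction lag, reason for retraction, and citations received. [Results] A total of 211 papers were retracted, 128 of them in medicine and health. The main reasons for retraction were obtaining others' research results by improper means, publishing others' research results without authorization, plagiarism, and duplicate publication, which together accounted for 68.2% (144/211); the ranking of retraction reasons for medical and health papers and for non-medical papers was consistent with that for all disciplines. The mean retraction lag was 13.6 months. 144 retracted papers could still be retrieved, and 63 of them were cited at least once after retraction. [Conclusions] Retraction involves a long lag; retracted papers are neither deleted nor flagged by the databases and can still be cited. Journal editorial offices and databases should pay attention to the follow-up handling of retracted papers.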

As digital libraries grow to global scale, the provision of interactive access to content in many languages will become increasingly important. In systems that support query-based searching, the presence of multilingual content will affect both the search technology itself and the user interface components that support query formulation, document selection and query refinement. This article describes the interactions among these components and presents a practical way of evaluating the adequacy of the selection interface. A categorization-based model for the user's selection process is presented and an experimental methodology suitable for obtaining process-centered results in this context is developed. The methodology is applied to assess the adequacy of a selection interface in which multiple candidate translations for a term can be simultaneously presented. The results indicate that the modeled selection process is somewhat less effective when users are presented with multi-translation glosses from Japanese to English rather than materials generated originally in English, but that users with access to the gloss translations substantially outperform a Naive Bayes classification algorithm.
