Similar literature
 Found 20 similar documents; search took 46 ms
1.
The number of patent documents is rising rapidly worldwide, creating a need for automatic categorization systems to replace time-consuming, labor-intensive manual categorization. Accurate patent classification is crucial when searching for relevant existing patents in a given field, which makes patent categorization an important and useful task. Because patent documents are structured documents whose characteristics distinguish them from general documents, these traits should be considered in the categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their structure: patents are organized into claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. To retrieve similar documents from a training document set, specific components that denote so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because those components are identified by various user-defined tags, all of the components are first clustered into several semantic elements. These semantically clustered structural components are the basic features for patent categorization. We achieve a 74% improvement in categorization performance over a baseline system that does not use the structural information of the patent.
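As a rough illustration of comparing semantic elements rather than whole texts, the sketch below runs k-NN over per-element bag-of-words cosine similarities. It is not the authors' system: the element names, the weights, the toy documents, and the function names `element_similarity` and `knn_categorize` are all assumptions made for this example, and the tag-clustering step from the abstract is taken as already done.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term counts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def element_similarity(doc_a: dict, doc_b: dict, weights: dict) -> float:
    """Weighted sum of cosine similarities, one per semantic element."""
    return sum(
        w * cosine(Counter(doc_a.get(e, "").split()),
                   Counter(doc_b.get(e, "").split()))
        for e, w in weights.items()
    )

def knn_categorize(query: dict, training: list, weights: dict, k: int = 3) -> str:
    """Return the majority label among the k most similar training documents."""
    ranked = sorted(training,
                    key=lambda item: element_similarity(query, item[0], weights),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical element weights and toy training set (illustration only).
WEIGHTS = {"claim": 0.6, "purpose": 0.4}
TRAINING = [
    ({"claim": "lithium battery electrode", "purpose": "increase battery capacity"}, "battery"),
    ({"claim": "battery separator film", "purpose": "prevent battery short circuit"}, "battery"),
    ({"claim": "image sensor pixel array", "purpose": "reduce image noise"}, "imaging"),
    ({"claim": "camera lens image stabilizer", "purpose": "reduce image blur"}, "imaging"),
]
QUERY = {"claim": "battery electrode coating", "purpose": "increase capacity"}
```

Calling `knn_categorize(QUERY, TRAINING, WEIGHTS, k=3)` votes over the three nearest training patents. A whole-text baseline would correspond to a single element holding the concatenated text.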

2.
宋立荣  彭洁 《情报杂志》2012,31(2):12-18
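As the construction of government information resource sharing advances, information quality (IQ) problems have become increasingly prominent and are now a major constraint on this work in China. How to describe, measure, and manage the IQ of shared information resources comprehensively, accurately, and scientifically is an urgent problem for deepening government information resource sharing. To this end, based on an understanding of how the U.S. federal government's Information Quality Act (IQA) was drafted and implemented, this paper introduces the basic content of the IQA, beginning with its background, legislative history, and provisions. Finally, drawing on this understanding of the U.S. IQA, it offers several policy recommendations for government information resource sharing in China.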

3.
In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme that integrates the local and global distribution of terms as well as document frequency, document positions, and term length. The weighting scheme allows a larger portion of the retrieved documents to be set arbitrarily as relevance feedback, and removes the concern that very few relevant documents appear among the top retrieved documents. It also helps to improve the performance of maximal marginal relevance (MMR) in document reranking. The method was evaluated by MAP (mean average precision), a recall-oriented measure. Significance tests showed that our method achieves significant improvement over standard baselines and consistently outperforms related methods.
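For readers unfamiliar with the evaluation measure named above, here is a minimal sketch of the standard MAP computation. The function names and the toy rankings are assumptions for illustration, not the paper's evaluation code, and each ranking is assumed to contain no duplicate document ids.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Dividing by the number of relevant documents penalizes relevant
    # documents that were never retrieved.
    return total / len(relevant)

def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical rankings for two queries.
RUNS = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),                    # AP = 1/2
]
```

A reranking method raises MAP when it moves relevant documents to earlier ranks, since each relevant document contributes the precision at its own position.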

4.
5.
Traditional information retrieval techniques that rely primarily on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because they use different terminology to describe the same concepts. Semantic search techniques aim to address these limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while considering semantic information alone might not improve retrieval performance over keyword-based search, it enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. Building indices that store and provide access to semantic information during the retrieval process is therefore important. While the process for building and querying keyword-based indices is quite well understood, incorporating semantic information within search indices is still an open challenge. Existing work has proposed building either one unified index encompassing both textual and semantic information or separate yet integrated indices for each information type, but these approaches face limitations such as increased query processing time. In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types, and documents within the same embedding space to facilitate the development of a unified search index consisting of these four information types. We perform experiments on standard and widely used document collections, including Clueweb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives.
Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) retrieval efficiency increases compared to a standard inverted index, reducing index size and query processing time, and (b) while retrieval efficiency, which is the main objective of an efficient indexing mechanism, improves with our proposed method, retrieval effectiveness remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
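The idea of a posting list whose key need not literally occur in the indexed document can be sketched as follows. This is a simplified illustration and not the paper's indexing strategy: the two-dimensional vectors, the `top_k` cutoff, the score-summing query step, and the names `build_index` and `search` are all assumptions made for the example.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_index(key_vectors, doc_vectors, top_k=2):
    """For each key (term, entity, or type), keep the top_k nearest documents.

    A document enters a posting list because its embedding is close to the
    key's embedding, not because the key appears in its text.
    """
    index = {}
    for key, kv in key_vectors.items():
        scored = sorted(((cosine(kv, dv), doc_id) for doc_id, dv in doc_vectors.items()),
                        reverse=True)
        index[key] = [(doc_id, score) for score, doc_id in scored[:top_k]]
    return index

def search(index, query_keys):
    """Rank documents by the sum of their posting-list scores over the query keys."""
    scores = {}
    for key in query_keys:
        for doc_id, score in index.get(key, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + score
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical embeddings sharing one space (illustration only).
KEY_VECTORS = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "fruit": [0.0, 1.0]}
DOC_VECTORS = {"doc_vehicles": [0.95, 0.05], "doc_apples": [0.05, 0.95], "doc_mixed": [0.6, 0.6]}
INDEX = build_index(KEY_VECTORS, DOC_VECTORS, top_k=2)
```

In this toy setup the key `automobile` points to `doc_vehicles` regardless of which words that document contains, which is the vocabulary-mismatch case the abstract describes. Capping each posting list at `top_k` entries is one simple way such an index could stay small.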

6.
Research on Classified Retrieval of Network Information (total citations: 4; self-citations: 0; citations by others: 4)
This paper studies the classified retrieval of network information from the perspective of information management theory. After a brief introduction to search engines, it analyzes the characteristics of network documents and their classification systems. It points out problems in network document classification and makes suggestions such as constructing a catalog-based classification search engine system.

7.
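Merging query results is an important step in distributed information retrieval. Based on the degree of document overlap among the selected collections and on whether the collections are homogeneous or heterogeneous, this paper analyzes result-merging strategies in three cases: the selected collections have no or little document overlap; the selected collections are homogeneous; and the selected collections are heterogeneous with partially overlapping documents. It points out that further research on result-merging strategies will help advance distributed retrieval technology.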

8.
This paper presents a new approach, based on the theory of fuzzy sets, to the problem of evaluating information system effectiveness. On the basis of this theory, the concepts of relevance and pertinence, which are the basic concepts used in determining indices of information system effectiveness, are defined. We assume that, in evaluating the effectiveness of information systems, two problems should be considered separately: evaluating the quality of the transformation of the contents of documents and information requests into their search patterns, and evaluating the quality of the process of profile control of the information system's document set. Accordingly, parameters are defined for evaluating the quality of the transformation with regard to a given information request, as well as parameters for evaluating the quality of this process with regard to the whole set of information requests under examination. Parameters for evaluating the quality of the profile control process are also defined. The proposed parameters take into account that evaluations of both the relevance and the pertinence of documents are continuous in character.

9.
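Research on Acquisition Techniques for Electronic Documents (total citations: 4; self-citations: 0; citations by others: 4)
黄海 《情报科学》1999,17(6):708-710
Electronic documents are the most important and most fundamental document resource for libraries in the network era, and how to acquire high-quality electronic documents is a major issue for libraries today. Based on a comprehensive analysis of the problems in the electronic document acquisition process, this paper proposes corresponding acquisition strategies and concrete operating procedures.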

10.
朱学芳  冯曦曦 《情报科学》2012,(7):1012-1015
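By studying the HTML structure and characteristics of agricultural web pages, this paper describes an experimental study of text-content-based information extraction and classification for agricultural web pages. In the experiment, the DOM structure is used to extract and preprocess information from agricultural web pages; text category attributes are computed automatically from the text content to obtain feature words; and new documents are classified automatically by summarizing the features of sample documents. The experimental results show that the proposed information extraction has relatively low time complexity and high precision, and that it improves classification accuracy.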

11.
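To address the current problems with original-literature digital information resources at military academies, namely their many types, varied forms, difficulty of collection, and inconsistent management standards, this paper proposes countermeasures for the digitization of these resources in terms of organization, personnel, and standards, so that the original digital information resources scattered among instructors and researchers can ultimately be made open access, increasing their academic value and better serving research at military academies.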

12.
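A Brief Discussion of the Informatization of University Archives Management (total citations: 12; self-citations: 0; citations by others: 12)
王艳玲 《现代情报》2007,27(6):63-64,67
Informatization of university archives is an important part of building the "digital campus." The goal of university archival work is to gradually digitize archival resources, deliver information services over the network, and standardize the management of electronic documents and electronic archives, so as to serve teaching, research, and administration and to provide high-quality, timely archival information for the construction and development of universities.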

13.
刘艳平 《情报杂志》1991,10(3):45-49
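By analyzing the current state of agricultural literature resources and their use in Shaanxi Province, this paper identifies the low utilization rate of the literature and the obstacles currently affecting its development and use, and makes the following recommendations for developing and using literature resources: (1) strengthen the information awareness of leaders and researchers; (2) establish a provincial agricultural information resource center and form a three-level information network at the provincial, prefecture (city), and county levels to share agricultural literature resources across the province; (3) develop literature on research achievements to speed up their application; (4) strengthen agricultural information work; (5) build specialized agricultural literature databases with local characteristics; (6) strengthen training for existing information personnel.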

14.
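Research on a Whole-Process Performance Evaluation System for Information Systems (total citations: 1; self-citations: 0; citations by others: 1)
郝晓玲  肖薇薇 《情报科学》2006,24(8):1223-1227
Building on a literature review, this paper analyzes problems in the existing field of information system evaluation, namely that evaluation is function-driven, static, one-sided, and after-the-fact. To address these problems, it proposes a whole-process performance evaluation system for information systems based on the balanced scorecard, discusses the characteristics of this evaluation system, and uses a practical case to examine how the indicator system is applied.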

15.
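A Review of Research on Novelty Detection in Text Content (total citations: 2; self-citations: 0; citations by others: 2)
How to provide users with timely, useful, novel information is a research problem in urgent need of a solution. This paper surveys research methods for text content novelty detection, analyzing the origins of the research, the text representation methods applied to it, similarity comparison methods, and the novelty detection process, in order to give a fairly comprehensive picture of research on text content novelty detection.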

16.
樊志伟 《情报杂志》2012,31(5):150-154
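China is in a period of rapid development of rural informatization, and the degree to which farmers' information needs are met is the decisive criterion for judging that development. Based on a survey and statistical analysis of the literature published in China and abroad over the past decade on Chinese farmers' information needs, this paper reviews research on surveys of the state of those needs, their characteristics, and their influencing factors from two angles, research status and research hotspots, and offers an outlook on the development trends of research in this area.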

17.
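Official document formats took shape gradually in the course of drafting and issuing official documents, and they reflect the characteristics and authority of state administrative documents. This paper analyzes "common ailments" in official document format across the three parts of a document, the header, the body, and the imprint section, so that the format can convey the issuing body's intent more accurately.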

18.
张铭志 《情报探索》2020,(3):111-116
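[Purpose/Significance] This paper aims to provide new directions for research on intelligence sharing. [Method/Process] Papers on intelligence sharing in China indexed in the CNKI database were retrieved; Bibexcel, a tool for statistical analysis of bibliographic information, was used to build matrices; and the social network analysis software Ucinet and NetDraw were used to draw network maps. Subnetworks, density, centrality, and cliques were analyzed for keywords and paper sources. The analysis reveals social network relationships among keywords, co-authorship, and paper sources in the domestic intelligence-sharing field, and summarizes the development trends and characteristics of intelligence-sharing research in China. [Result/Conclusion] The development path for intelligence-sharing research in China includes strengthening academic exchange to form a more systematic body of research; studying intelligence sharing from different angles; and deepening research on intelligence sharing between military and civilian domains. The paper proposes that sharing between intelligence in the military domain and intelligence in the social sciences is a new direction for future research.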

19.
郜正亚  刘晓荣 《情报杂志》2012,31(2):62-66,51
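As informatization accelerates, the tension between information services and information needs has become increasingly prominent, and information science has developed rapidly. Based on a comparative analysis of the 2011 National Social Science Fund of China project guidelines and the 2011 project review results, this paper explores current hotspots in Chinese information science research in four areas: theory, methods, professional practice, and technology. It also examines hotspots in integrated library and information science research and makes a measured forecast of future research trends, with the aim of providing a practical approach and means for related research.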

20.
Ontologies are frequently used in information retrieval; their main applications are query expansion, semantic indexing of documents, and the organization of search results. Ontologies provide lexical items, allow conceptual normalization, and provide different types of relations. However, how to optimize an ontology for information retrieval tasks is still unclear. In this paper, we use an ontology query model to analyze the usefulness of ontologies in effectively performing document searches. Moreover, we propose an algorithm to refine ontologies for information retrieval tasks, with preliminary positive results.
