
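Automatic Classification of Chinese Patent Documents
Cite this article: Chen Zhixiong, Zeng Hui. Automatic Classification of Chinese Patent Documents [J]. Journal of Jiaying University, 2010, 28(2): 24-29.
Authors: Chen Zhixiong, Zeng Hui
Affiliation: School of Electronic and Information Engineering, Jiaying University, Meizhou, Guangdong 514015
Funding: Soft Science Research Program of the Guangdong Provincial Intellectual Property Office; Meizhou City Scientific Research Project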
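Abstract: An automatic classification system for Chinese patent documents is implemented using the KNN algorithm. Because the patent corpus is very large and classification is correspondingly slow, exemplar pruning is used to delete redundant training samples, which improves the efficiency of the classifier. Pruning, however, lets noisy documents accumulate in the training set and degrades KNN classification performance. To counter this, the system uses information gain to select feature terms from the patent documents, which weakens the effect of noisy documents on KNN classification. Experiments show that exemplar pruning combined with feature-term selection based on information gain effectively reduces the size of the training set and improves KNN classification accuracy.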

Keywords: patent documents; KNN classification; exemplar pruning; information gain
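
The feature-term selection named in the abstract can be illustrated with a minimal sketch. It assumes each document is a set of already-segmented tokens and scores a term by the information gain of its presence or absence, IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t). The function names, the toy documents, and the IPC-style labels are illustrative assumptions, not taken from the paper, whose exact weighting and thresholds are not given in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of a term, based on its presence or absence per document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Conditional entropy H(C | term present/absent); empty splits contribute nothing.
    conditional = sum(len(part) / total * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties keep alphabetical order)."""
    vocabulary = sorted(set().union(*docs))
    return sorted(vocabulary,
                  key=lambda t: information_gain(docs, labels, t),
                  reverse=True)[:k]


# Hypothetical toy corpus: token sets with illustrative class labels.
docs = [
    {"battery", "electrode", "method"},
    {"battery", "cell", "method"},
    {"gear", "shaft", "method"},
    {"gear", "bearing", "method"},
]
labels = ["H01M", "H01M", "F16H", "F16H"]

top_terms = select_terms(docs, labels, 2)
```

In this toy corpus "battery" and "gear" separate the two classes perfectly, while "method" appears in every document and carries no information, so a selection step of this kind would discard it.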

Chinese Patent Text Automatic Categorization System
CHEN Zhi-xiong, ZENG Hui. Chinese Patent Text Automatic Categorization System[J]. Journal of Jiaying University, 2010, 28(2): 24-29.
Authors:CHEN Zhi-xiong  ZENG Hui
Institution: School of Electronic and Information Engineering, Jiaying University, Meizhou 514015, China
Abstract: A KNN-based automatic classification system for Chinese patent texts is implemented. To address the inefficient categorization caused by the huge number of patent texts, redundant exemplars are pruned to improve the efficiency of the classifier. Pruning, however, lets noisy exemplars accumulate and degrades KNN classification performance; to counter this, information gain is used to select the features of patent texts, which weakens the impact of the noisy exemplars. Experimental results show that pruning exemplars effectively reduces the size of the training set, and that feature selection based on information gain improves KNN classification accuracy.
Keywords:patent  KNN  pruning exemplars  information gain
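
The abstract does not state which pruning rule the system uses. As a stand-in, the sketch below applies Hart's condensed nearest neighbour rule, a standard way of discarding redundant exemplars, followed by a KNN majority vote over cosine similarity. All function names, the toy term-count vectors, and the IPC-style labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(train, query, k):
    """Majority vote among the k exemplars most similar to the query."""
    nearest = sorted(train, key=lambda ex: cosine(ex[0], query),
                     reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def condense(train):
    """Hart's condensed nearest neighbour (a stand-in for the paper's pruning).

    Keep only the exemplars that the current store misclassifies under 1-NN,
    and repeat passes over the training set until nothing more is added.
    """
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for ex in train:
            if ex not in store and knn_predict(store, ex[0], 1) != ex[1]:
                store.append(ex)
                changed = True
    return store


# Hypothetical toy training set: (term counts, class label).
train = [
    ({"battery": 2, "electrode": 1}, "H01M"),
    ({"battery": 1, "cell": 1}, "H01M"),
    ({"battery": 1, "electrode": 2}, "H01M"),
    ({"gear": 2, "shaft": 1}, "F16H"),
    ({"gear": 1, "bearing": 1}, "F16H"),
]

pruned = condense(train)
prediction = knn_predict(pruned, {"battery": 1, "cell": 2}, 1)
```

On this toy data the condensed store keeps one exemplar per class and still classifies the remaining training documents correctly. On a real corpus, condensing of this kind tends to retain borderline and mislabeled exemplars, which is consistent with the noise accumulation the abstract describes and the reason it pairs pruning with feature selection.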
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).