
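Chinese Text Classification Based on Word Frequency
Cite this article: Yao Xingshan. Chinese Text Classification Based on Word Frequency [J]. 现代情报, 2009, 29(2): 179-181.
Author: Yao Xingshan (姚兴山)
Affiliation: Department of Information Management, Nanjing University, Nanjing, Jiangsu 210093
Abstract: This paper describes the design and implementation of a Chinese text classification system, with a detailed introduction to its system structure, feature extraction, training algorithm, and classification algorithm. Word-frequency statistics are applied to text classification. The paper also proposes a Chinese text classification method based on the statistical properties of single-character and two-character words in Chinese: without a word list, single-character and two-character word lists are built by counting and then used to classify texts, with good results.

Keywords: word frequency statistics; feature selection; Chinese text classification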

Chinese Text Classification Based on Word Frequency Statistics
Author: Yao Xingshan
Institution: Department of Information Management, Nanjing University, Nanjing 210093, China
Abstract: This paper describes the design and implementation of a Chinese text classification system and introduces its system architecture, feature selection, training algorithm, and classification algorithm. Methods based on word frequency statistics are applied to Chinese text classification. The paper also introduces a new Chinese text classification method based on the statistical properties of single-character and two-character words. In the absence of a word list, single-character and two-character word lists are constructed statistically and used to classify texts, with good results.
Keywords: word frequency statistics; feature selection; Chinese text classification
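
The abstract describes the dictionary-free approach only in outline, so the following is a minimal Python sketch of the general idea, not the author's implementation. It counts single characters and adjacent two-character strings per class to build the word lists. The cosine-similarity classifier, the `top_k` cutoff, the function names, and the toy training data are all assumptions made for illustration; the abstract does not say which training or classification algorithm the paper uses.

```python
import math
import re
from collections import Counter

# Runs of CJK unified ideographs; punctuation and Latin text break a run.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_terms(text):
    """Return every single character and adjacent two-character string."""
    terms = []
    for run in CJK_RUN.findall(text):
        terms.extend(run)  # single-character terms
        terms.extend(run[i:i + 2] for i in range(len(run) - 1))  # two-character terms
    return terms


def build_profiles(labeled_texts, top_k=1000):
    """Build one frequency table per class, keeping the top_k most frequent terms."""
    counts = {}
    for label, text in labeled_texts:
        counts.setdefault(label, Counter()).update(extract_terms(text))
    return {label: dict(c.most_common(top_k)) for label, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency mappings."""
    dot = sum(v * b.get(term, 0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, profiles):
    """Assign the class whose profile is most similar to the text (assumed rule)."""
    doc = Counter(extract_terms(text))
    return max(profiles, key=lambda label: cosine(doc, profiles[label]))


# Toy training data, invented for this sketch.
train = [
    ("sports", "足球比赛进球,球队赢得比赛"),
    ("finance", "股票市场上涨,银行利率下降"),
]
profiles = build_profiles(train)
print(classify("球队比赛", profiles))
```

Because the terms come straight from character counts, no segmentation dictionary is needed. A real system would train on a much larger corpus and would likely filter out very common characters that carry little class information.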
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).