A Semantics-based Chinese Text Classification Algorithm (一种基于语义的中文文本分类算法)

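Cite this article:	Zhao Hui, Liu Huailiang, Fan Yunjie, Zuo Xiaofei. A Semantics-based Chinese Text Classification Algorithm [J]. Information Studies: Theory & Application (情报理论与实践), 2012, 35(3): 115-118.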

Authors:	Zhao Hui, Liu Huailiang, Fan Yunjie, Zuo Xiaofei

Affiliation:	School of Economics and Management, Xidian University, Xi'an 710071, Shaanxi

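Abstract:	To address the loss of semantic information in the vector space model, a semantic dictionary (HowNet) is applied to the text classification process to improve classification accuracy. To handle polysemy in Chinese text, an improved method for computing word semantic similarity is proposed: word sense disambiguation selects the sense of each word before similarity is computed. Words whose similarity exceeds a threshold are clustered, which reduces the dimensionality of the text feature vector. A semantics-based text classification algorithm is then given and analyzed experimentally. The results show that the algorithm effectively improves Chinese text classification.
Keywords:	text classification; semantics; vector space; vector space model; semantic similarity; algorithm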
A Semantics-based Chinese Text Classification Algorithm

Authors:	Zhao Hui et al.

Abstract:	To address the loss of semantic information in the Vector Space Model (VSM), the semantic dictionary HowNet is applied to text classification to improve its accuracy. To handle polysemy in Chinese text, this paper proposes an improved word semantic similarity measure that selects a sense for each word through word sense disambiguation before computing similarity. Words whose similarity exceeds a threshold are then clustered, which reduces the dimensionality of the text feature vector. A semantics-based text classification algorithm is presented and evaluated experimentally. The results show that the algorithm improves the performance of Chinese text classification.

Keywords:	text classification; semantic; vector space; VSM; semantic similarity; algorithm
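
The dimension-reduction step described in the abstract (cluster words whose similarity exceeds a threshold, then count terms per cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `TOY_SIM` table is made-up data standing in for a HowNet-based similarity measure, the greedy single-pass clustering is one simple choice among many, and the names `similarity`, `cluster_words`, and `reduce_vector` are hypothetical. Word sense disambiguation is omitted entirely.

```python
from collections import Counter

# Made-up pairwise similarities; a stand-in for a HowNet-based measure.
TOY_SIM = {
    frozenset(("computer", "pc")): 0.9,
    frozenset(("computer", "laptop")): 0.8,
    frozenset(("pc", "laptop")): 0.85,
    frozenset(("bank", "finance")): 0.75,
}


def similarity(a, b):
    """Return the toy similarity of two words (1.0 if identical, 0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def cluster_words(vocab, sim, threshold):
    """Greedy single-pass clustering: a word joins the first cluster whose
    representative (its first member) is more similar than the threshold;
    otherwise it starts a new cluster."""
    clusters = []
    for word in vocab:
        for cluster in clusters:
            if sim(word, cluster[0]) > threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def reduce_vector(tokens, clusters):
    """Map a tokenized document to term counts over clusters, so that each
    cluster (rather than each word) is one dimension of the vector."""
    index = {w: i for i, cluster in enumerate(clusters) for w in cluster}
    counts = Counter(index[t] for t in tokens if t in index)
    return [counts.get(i, 0) for i in range(len(clusters))]


vocab = ["computer", "pc", "laptop", "bank", "finance", "river"]
clusters = cluster_words(vocab, similarity, threshold=0.7)
doc = ["pc", "laptop", "bank", "river", "pc"]
vector = reduce_vector(doc, clusters)
print(clusters)
print(vector)
```

In this toy run the six-word vocabulary collapses to three dimensions. The resulting cluster-level vectors could then be passed to any standard classifier; the abstract does not say which one the paper uses.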
This article is indexed in CNKI, Wanfang Data, and other databases.