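Text Categorization Implemented in C#*
Cite this article:Liu Hua. Text Categorization Implemented in C#*[J]. New Technology of Library and Information Service (现代图书情报技术), 2007, 2(3): 43-45
Author:Liu Hua
Affiliation:College of Chinese Language and Culture/Center for Overseas Huayu Research, Jinan University, Guangzhou 510610
Funding:National Language Resources Monitoring Project, Ministry of Education
Abstract:A text categorization system based on the vector space model and naive Bayes is designed and implemented; the system uses a hierarchical, multi-label classification strategy. Four subsystem modules are described in detail: word segmentation and statistics, computation of the final classifier values, hierarchical subcategory correction, and multi-category judgment. With classification based on the vector space model, the micro-averaged scores for first-level categories and hierarchical subcategories are 89.7% and 77.8%, respectively; with naive Bayes they are 67.6% and 66.5%.

Keywords:Vector space model; Naive Bayes
Received:2007-01-29
Revised:2007-01-27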

A Text Categorization System with C#
Liu Hua. A Text Categorization System with C#[J]. New Technology of Library and Information Service, 2007, 2(3): 43-45
Authors:Liu Hua
Affiliation:College of Chinese Language and Culture/Center for Overseas Huayu Research, Jinan University, Guangzhou 510610, China
Abstract:This paper designs and implements a hierarchical, multi-label text categorization system based on the Vector Space Model (VSM) and Naive Bayes (NB). It describes four modules in detail: word segmentation and frequency statistics, computation of the classifier values between each category and a document, correction of the parent-category result using the subcategory results, and judgment of whether a document belongs to multiple categories and carries multiple labels. Classification based on the Vector Space Model achieves a MicroF1 of 89.7% for parent categories and 77.8% for subcategories; classification based on Naive Bayes achieves a MicroF1 of 67.6% for parent categories and 66.5% for subcategories.
Keywords:Text categorization; Vector space model; Naive Bayes
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.