
An improved TF-IDF approach for text classification
Authors: Zhang Yun-tao (张云涛), Gong Ling (龚玲), Wang Yong-cheng (王永成)

Keywords (Chinese record): text processing, text analysis, TF-IDF, automation, term grading, occurrence frequency

Zhang Yun-tao, Gong Ling, Wang Yong-cheng. An improved TF-IDF approach for text classification[J]. Journal of Zhejiang University Science, 2005, 6(1): 49-55.
Institution: (1) Network & Information Center, Shanghai Jiaotong University, Shanghai 200030, China; (2) School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, China
Abstract: This paper presents an improved term frequency/inverse document frequency (TF-IDF) approach that uses confidence, support and characteristic words to enhance the recall and precision of text classification. Synonyms defined by a lexicon are also processed in the improved approach. We discuss and analyze in detail the relationship among confidence, recall and precision. Experiments in the science and technology domain gave promising results: the new TF-IDF approach improves both the precision and the recall of text classification compared with the conventional TF-IDF approach.
Keywords: Term frequency/inverse document frequency (TF-IDF), Text classification, Confidence, Support, Characteristic words
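
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of the conventional TF-IDF weighting that the paper takes as its baseline, together with support and confidence computed in the usual association-rule sense for a rule "term implies category". It is not the paper's improved method: the record does not give the authors' formulas, so the function names `tf_idf` and `support_confidence`, the toy documents and labels, and the textbook definitions used here are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Conventional TF-IDF: (term count / doc length) * log(N / document frequency).

    `docs` is a list of token lists. This is the textbook baseline,
    not the improved weighting described in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def support_confidence(docs, labels, term, category):
    """Association-rule style measures for the rule `term -> category`.

    support    = fraction of all documents that contain `term` and carry `category`
    confidence = fraction of documents containing `term` that carry `category`
    (Assumed standard definitions; the paper may define them differently.)
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    both = sum(1 for lab in with_term if lab == category)
    support = both / len(docs)
    confidence = both / len(with_term) if with_term else 0.0
    return support, confidence


# Hypothetical toy corpus, invented purely for illustration.
docs = [
    ["neural", "network", "training"],
    ["network", "router", "protocol"],
    ["protein", "folding", "network"],
    ["router", "protocol", "packet"],
]
labels = ["ai", "networking", "biology", "networking"]

weights = tf_idf(docs)
sup, conf = support_confidence(docs, labels, "router", "networking")
```

In this toy corpus "network" appears in three of the four documents under three different labels, so it gets a low TF-IDF weight and a low confidence for any single category, whereas "router" appears only in "networking" documents. That kind of contrast is presumably what a confidence-based notion of characteristic words is meant to capture, though the exact criterion is the paper's and is not reproduced here.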
This article is indexed in VIP (维普), Wanfang Data (万方数据), SpringerLink, and other databases.