Research on Automatic Subject Extracting from Web Pages' Chinese Text (Web页面中文文本主题的自动提取研究)


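Cite this article:	Han Kesong, Wang Yongcheng, Teng Wei. Research on Automatic Subject Extracting from Web Pages' Chinese Text [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2001, 20(2): 217-223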

Authors:	Han Kesong (韩客松), Wang Yongcheng (王永成), Teng Wei (滕伟)

Affiliation:	Shanghai Jiao Tong University

Funding:	Supported by the National 863 Program (contract no. 863-306-ZD03-04-1)

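Abstract:	The volume of content on the Internet keeps growing, and the results returned by search engines are often lengthy. This paper first discusses four differences between Web page text and ordinary text, then presents a method for automatically extracting the subject of Chinese text in Web pages that relies mainly on statistics, with match-based verification as a supplement. The method helps users grasp the subject of the current page in the shortest possible time. Experiments show that the top 15 extracted strings reflect the subject with an average accuracy above 85%, while processing takes only tens to hundreds of milliseconds.
Keywords:	Web page text; subject extraction; weighting
Revised:	2000-05-30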
Research on Automatic Subject Extracting from Web Pages' Chinese Text

Han Kesong, Wang Yongcheng, Teng Wei. Research on Automatic Subject Extracting from Web Pages' Chinese Text [J]. Journal of the China Society for Scientific and Technical Information, 2001, 20(2): 217-223

Authors:	Han Kesong, Wang Yongcheng, Teng Wei

Abstract:	The amount of information on the Internet is growing quickly, and search engines often return long lists of Web sites and pages. In this paper, we first enumerate four differences between Web page text and ordinary text, then introduce a method for automatically extracting the subject of Web pages' Chinese text, based mainly on a statistical method and assisted by match-based verification. It helps users grasp the main subject of a Web page in the shortest time. Experimental results show that the top 15 strings in our output reflect the subject with a precision of more than 85%, while the processing time ranges from only tens to hundreds of milliseconds.

Keywords:	Web page text; subject extraction; weighting
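
The abstract names the ingredients of the method (statistics, weighting, a matching check, and a ranked list of the top 15 strings) but gives no formulas. The following Python sketch is only an illustration of how such a pipeline might be assembled and is not the authors' algorithm. The function names, the n-gram lengths, the `title_weight` and `min_count` parameters, and the substring filter standing in for the matching check are all assumptions made for this example.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs; punctuation and Latin text split the runs.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_strings(text, min_len=2, max_len=4):
    """Yield every substring of min_len..max_len characters within each Chinese run."""
    for run in CJK_RUN.findall(text):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                yield run[i:i + n]


def extract_subject_strings(title, body, top_k=15, title_weight=3.0, min_count=2):
    """Rank candidate strings by weighted frequency (hypothetical scoring scheme)."""
    body_counts = Counter(candidate_strings(body))
    title_counts = Counter(candidate_strings(title))

    scores = {}
    for s in set(body_counts) | set(title_counts):
        # Assumption: strings seen fewer than min_count times are noise.
        if body_counts[s] + title_counts[s] < min_count:
            continue
        # Assumption: an occurrence in the page title counts title_weight times.
        scores[s] = body_counts[s] + title_weight * title_counts[s]

    # Assumed stand-in for the matching check: drop a string when a longer
    # candidate contains it and scores at least as high.
    kept = [
        s for s in scores
        if not any(s != t and s in t and scores[t] >= scores[s] for t in scores)
    ]
    kept.sort(key=lambda s: (-scores[s], -len(s), s))
    return kept[:top_k]


if __name__ == "__main__":
    print(extract_subject_strings("搜索引擎", "搜索引擎返回结果。搜索引擎很慢。"))
    # ['搜索引擎']
```

Counting raw character n-grams avoids the need for a word segmenter, which keeps the sketch dependency-free, though a real system would likely need a dictionary or stop-string list to suppress frequent but uninformative strings.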
This article is indexed in CNKI, Wanfang Data, and other databases.
