
A Method of Extracting Information from Web

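Cite this article:	Han Lixin, Xie Li. A Method of Extracting Information from Web[J]. Journal of the China Society for Scientific and Technical Information, 2004, 23(1): 45-51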

Authors:	Han Lixin, Xie Li

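Affiliations:	College of Computer and Information Engineering, Hohai University, Nanjing 210024; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093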

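Funding:	Hohai University Science and Technology Innovation Fund (No. 2002407343); Open Project Fund of the State Key Laboratory for Novel Software Technology, Nanjing University (No. A200308)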

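Abstract:	Because much of the information on the WWW is stored in HTML pages, extracting useful information from HTML documents is a pressing problem. This paper proposes a method for extracting information from HTML documents. The method combines association rules, pattern matching, grammar rules, and clustering to extract information, and thereby addresses the poor accuracy, limited generality, and heavy manual intervention of existing extraction methods.
Keywords:	Web documents; information extraction; association rules; pattern matching; grammar rules; clustering
Revised:	2003-04-04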
A Method of Extracting Information from Web

Han Lixin, Xie Li. A Method of Extracting Information from Web[J]. Journal of the China Society for Scientific and Technical Information, 2004, 23(1): 45-51

Authors:	Han Lixin, Xie Li

Abstract:	A large amount of the information on the WWW is stored as HTML documents, so helping users extract useful information from HTML documents is increasingly important. In this paper, we propose a method for extracting information from HTML documents. The method combines association rules, pattern matching, grammars, and clustering. Compared with existing extraction methods, it is more accurate, more generally applicable, and requires less manual intervention.

Keywords:	Web documents; information extraction; association rules; pattern matching; grammars; clustering
This article is indexed in CNKI, Wanfang Data, and other databases.
