

Web Text Extraction Technology Based on Machine Learning

Cite this article:	Cheng Juan. Web text extraction technology based on machine learning[J]. Researches in Library Science, 2008(5): 21-22.

Author:	Cheng Juan

Affiliation:	Cheng Juan (Library, College of Arts and Sciences, Jianghan University)

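Abstract:	This paper studies techniques for extracting specified text, as required, from different types of HTML pages. It first analyzes the strengths and weaknesses of the current mainstream text extraction techniques and, to address the shortcomings of traditional text extraction, proposes a web text extraction technique based on machine learning. It then focuses on the implementation principles of this technique, and finally presents a case study of building a text extraction system based on it in Java.
Keywords:	text extraction; text density; machine learning; neural network; Java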
Web text extraction technology based on machine learning

Cheng Juan. Web text extraction technology based on machine learning[J]. Researches in Library Science, 2008(5): 21-22.

Authors:	Cheng Juan

Abstract:	This paper studies techniques for extracting specified text on demand from different types of HTML pages. It first analyzes the merits and flaws of the most widely used current text extraction techniques and, to address the shortcomings of traditional approaches, proposes a web text extraction technique based on machine learning. Second, it analyzes the implementation principles of the technique. Finally, it presents an example of building a text extraction system based on the technique in Java.

Keywords:	text extraction; text density; machine learning; neural networks; Java
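The abstract names text density and neural networks but does not give the paper's feature set, network structure, or code. As a rough illustration of how these ideas can fit together in Java, here is a minimal sketch that rests on several assumptions. The page is assumed to be already split into HTML fragments, one block per string. Each fragment gets three hypothetical features: log text length, log text density (visible characters per tag), and link-text ratio. A single perceptron neuron stands in for the neural network and learns to separate body text from noise. The class `TextDensityExtractor` and its methods are illustrative names, not the author's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: label HTML fragments as body text or noise from
 * text-density features with a single-neuron perceptron. The feature set and
 * training rule are illustrative assumptions, not the paper's published method.
 */
public class TextDensityExtractor {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");

    private final double[] weights = new double[3];
    private double bias = 0.0;

    /** Visible text of a fragment: tags replaced by spaces, whitespace collapsed. */
    public static String visibleText(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    /** Features: [log text length, log text density, link-text ratio]. */
    public static double[] features(String html) {
        String text = visibleText(html);
        int textLen = text.length();
        int tagCount = 0;
        Matcher tags = TAG.matcher(html);
        while (tags.find()) tagCount++;
        int linkLen = 0;
        Matcher links = ANCHOR.matcher(html);
        while (links.find()) linkLen += visibleText(links.group(1)).length();
        // Text density: visible characters per tag (+1 avoids division by zero).
        double density = textLen / (double) (tagCount + 1);
        // Navigation blocks are mostly anchor text; empty blocks count as all-link.
        double linkRatio = textLen == 0 ? 1.0 : linkLen / (double) textLen;
        return new double[] {Math.log1p(textLen), Math.log1p(density), linkRatio};
    }

    private double score(double[] x) {
        double s = bias;
        for (int i = 0; i < x.length; i++) s += weights[i] * x[i];
        return s;
    }

    /** A fragment counts as content when the neuron's weighted sum is positive. */
    public boolean isContent(String html) {
        return score(features(html)) > 0;
    }

    /** Perceptron rule: adjust weights only on misclassified fragments. */
    public void train(List<String> fragments, List<Boolean> labels, int epochs, double rate) {
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < fragments.size(); i++) {
                double[] x = features(fragments.get(i));
                int target = labels.get(i) ? 1 : -1;
                int predicted = score(x) > 0 ? 1 : -1;
                if (predicted != target) {
                    for (int j = 0; j < x.length; j++) weights[j] += rate * target * x[j];
                    bias += rate * target;
                }
            }
        }
    }

    /** Joins the visible text of every fragment classified as content. */
    public String extract(List<String> fragments) {
        List<String> kept = new ArrayList<>();
        for (String f : fragments) if (isContent(f)) kept.add(visibleText(f));
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
            "<li><a href=\"/\">Home</a></li>",
            "<p>Machine learning lets an extractor decide which blocks of a page hold the article body instead of relying on hand-written rules.</p>",
            "<div class=\"nav\"><a href=\"/news\">News</a> | <a href=\"/about\">About</a></div>",
            "<p>Text density compares the amount of visible text in a block with the number of tags that surround it.</p>");
        List<Boolean> labels = Arrays.asList(false, true, false, true);
        TextDensityExtractor extractor = new TextDensityExtractor();
        extractor.train(page, labels, 1000, 0.1);
        System.out.println(extractor.extract(page));
    }
}
```

After training on the four labeled fragments, `main` prints the two paragraph texts and drops the two link blocks. In practice the training set would come from hand-labeled pages, and the learned weights would then be applied to unseen pages. A single perceptron is used here only because it is the smallest neural unit that can learn a linear boundary over these features. A multi-layer network, richer features, or a DOM-based segmentation step could replace any part of this sketch.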
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
