

An Automatic Chinese Word Segmentation Method Based on the EM Algorithm

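Cite this article:	Li Jiafu, Zhang Yafei. An automatic Chinese word segmentation method based on the EM algorithm[J]. Journal of the China Society for Scientific and Technical Information, 2002, 21(3): 269-272.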

Authors:	李家福 (Li Jiafu), 张亚非 (Zhang Yafei)

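Affiliations:	1. Institute of Communications Engineering, PLA University of Science and Technology, Nanjing 210016; 2. Institute of Sciences, PLA University of Science and Technology, Nanjing 210016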

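Funding:	Supported by the National Natural Science Foundation of China (grant No. 69975024) and a Key Project of the National Natural Science Foundation of China (grant No. 69931040).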

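Abstract:	Automatic word segmentation is a fundamental problem in Chinese information processing. This paper first surveys the basic concepts and applications of Chinese word segmentation, together with its basic methods. It then introduces a zero-order Markov model for automatic Chinese word segmentation, built from word occurrence probabilities on the maximum-likelihood principle, examines the EM (Expectation Maximization) algorithm in detail, and analyzes the experimental results. Finally, the algorithm is summarized and discussed.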
Keywords:	word segmentation; Chinese; EM algorithm; corpus; HMM
Revised:	May 18, 2001
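To make the maximum-likelihood principle concrete, here is a minimal Python sketch of how a zero-order (unigram) model segments a sentence once word probabilities are known: among all ways of splitting the text into words w_1 … w_k, it picks the one maximizing P(W) = p(w_1) p(w_2) … p(w_k), using dynamic programming over log-probabilities. The function name `best_segmentation`, the `max_len` cap on word length, and the probabilities in `TOY_PROBS` are hypothetical illustrations, not code or values from the paper.

```python
import math

# Hypothetical toy word probabilities (illustration only, not from the paper).
TOY_PROBS = {
    "研究": 0.05, "研究生": 0.01, "生命": 0.02, "命": 0.001, "起源": 0.01,
    "研": 0.001, "究": 0.001, "生": 0.005, "起": 0.005, "源": 0.001,
}


def best_segmentation(text, word_prob, max_len=4):
    """Most probable segmentation of `text` under a unigram (zero-order Markov) model.

    word_prob maps candidate words to probabilities; words longer than
    max_len characters are never considered.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: max log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = word_prob.get(text[start:end], 0.0)
            if p > 0 and best[start] + math.log(p) > best[end]:
                best[end] = best[start] + math.log(p)
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with the given vocabulary")
    # Follow back-pointers from the end of the text to recover the words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    print(best_segmentation("研究生命起源", TOY_PROBS))  # ['研究', '生命', '起源']
```

With these toy numbers, 研究/生命/起源 scores 0.05 × 0.02 × 0.01 = 1e-5, while the competing 研究生/命/起源 scores only 0.01 × 0.001 × 0.01 = 1e-7, so the model resolves the ambiguity in favor of the first reading. Working in log space avoids underflow when many small probabilities are multiplied.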
Segmenting Chinese by EM Algorithm

Li Jiafu. Segmenting Chinese by EM Algorithm[J]. Journal of the China Society for Scientific and Technical Information, 2002, 21(3): 269-272.

Authors:	Li Jiafu, Zhang Yafei

Abstract:	Word segmentation is a basic task in Chinese information processing. In this paper we present a simple probabilistic model of Chinese text based on word occurrence probabilities, which can be viewed as a zeroth-order hidden Markov model (HMM). We then investigate how the EM algorithm can discover the words and their probabilities from a corpus of unsegmented text without using a dictionary. The paper concludes with a discussion of the algorithm.

Keywords:	word segmentation; EM algorithm; corpus; HMM
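The dictionary-free estimation step can be sketched with one common formulation of EM for this model, assumed here rather than taken from the paper: every substring of up to `max_len` characters is a candidate word, the E-step runs a forward-backward pass over all segmentations of each sentence to get each candidate's expected count, and the M-step renormalizes those counts into new probabilities. The names `candidate_words`, `forward_backward` and `em_train`, the uniform initialization, and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict


def candidate_words(corpus, max_len):
    """Every substring of length 1..max_len occurring in the corpus (assumed candidate set)."""
    cands = set()
    for s in corpus:
        for i in range(len(s)):
            for j in range(i + 1, min(i + max_len, len(s)) + 1):
                cands.add(s[i:j])
    return cands


def forward_backward(s, probs, max_len):
    """alpha[j]: total probability of all segmentations of s[:j];
    beta[i]: total probability of all segmentations of s[i:]."""
    n = len(s)
    alpha = [0.0] * (n + 1)
    beta = [0.0] * (n + 1)
    alpha[0] = 1.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            alpha[j] += alpha[i] * probs.get(s[i:j], 0.0)
    beta[n] = 1.0
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(i + max_len, n) + 1):
            beta[i] += probs.get(s[i:j], 0.0) * beta[j]
    return alpha, beta


def em_train(corpus, max_len=3, iterations=20):
    """Return (word probabilities, log-likelihood before each M-step)."""
    cands = candidate_words(corpus, max_len)
    probs = {w: 1.0 / len(cands) for w in cands}  # uniform start
    history = []
    for _ in range(iterations):
        counts = defaultdict(float)
        total_ll = 0.0
        for s in corpus:
            alpha, beta = forward_backward(s, probs, max_len)
            z = alpha[len(s)]  # likelihood of the whole sentence
            total_ll += math.log(z)
            # E-step: posterior probability that span s[i:j] is one word.
            for i in range(len(s)):
                for j in range(i + 1, min(i + max_len, len(s)) + 1):
                    w = s[i:j]
                    counts[w] += alpha[i] * probs.get(w, 0.0) * beta[j] / z
        history.append(total_ll)
        # M-step: renormalize expected counts into probabilities.
        total = sum(counts.values())
        probs = {w: c / total for w, c in counts.items()}
    return probs, history


if __name__ == "__main__":
    toy_corpus = ["研究生命起源", "生命起源", "研究生命", "起源研究"]
    learned, ll = em_train(toy_corpus, max_len=2, iterations=10)
    for w, p in sorted(learned.items(), key=lambda kv: -kv[1])[:5]:
        print(w, round(p, 4))
```

Each EM iteration cannot decrease the corpus log-likelihood, so `history` should be non-decreasing. Two caveats for real use: plain probabilities underflow on long sentences, so a practical version would rescale α and β per position or work in log space; and without a length cap, maximum-likelihood training of this model tends to favor ever longer "words" (in the limit, whole sentences), which is why `max_len` is kept small here.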
This article is indexed in CNKI, Wanfang Data, and other databases.
