The Research of Character-Position-Based Chinese Word Segmentation* (基于字位信息的中文分词方法研究)


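Cite this article:	Zhang Jinzhu, Zhang Dong, Wang Huilin. The Research of Character-Position-Based Chinese Word Segmentation*[J]. 现代图书情报技术 (New Technology of Library and Information Service), 2008, 24(5): 39-43.

Authors:	Zhang Jinzhu, Zhang Dong, Wang Huilin

Affiliation:	Institute of Scientific and Technical Information of China, Beijing 100038

Funding:	Discipline Development Project of the Institute of Scientific and Technical Information of China; National Key Technology R&D Program

Abstract:	This paper reviews the current state of automatic Chinese word segmentation, introduces and describes several different segmentation ideas and methods, and proposes a character-position-based segmentation method. The method takes the character as the smallest unit and derives the probability distribution of word formation from the probability distribution of characters, so it performs better than other methods at recognizing out-of-vocabulary words. The method is implemented with maximum-entropy machine learning, and two experiments provide a comparative analysis of the results.
Keywords:	Chinese word segmentation; character position; maximum entropy; out-of-vocabulary word recognition
Received:	2007-12-28
Revised:	2008-01-21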
The Research of Character-Position-Based Chinese Word Segmentation

Zhang Jinzhu, Zhang Dong, Wang Huilin. The Research of Character-Position-Based Chinese Word Segmentation[J]. New Technology of Library and Information Service, 2008, 24(5): 39-43.

Authors:	Zhang Jinzhu, Zhang Dong, Wang Huilin

Institution:	Institute of Scientific and Technical Information of China, Beijing 100038, China

Abstract:	This paper analyzes the current state of Chinese word segmentation and introduces several representative approaches, then proposes a character-position-based segmentation method that takes the Chinese character as the smallest unit. The method derives the probability distribution of words from the probability distribution of characters, so it performs better than other approaches in unknown word recognition. It is implemented with a machine-learning method, maximum entropy, and two experiments are used to compare and analyze the results.

Keywords:	Chinese word segmentation; Character-position; Maximum entropy; Unknown word recognition
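
The abstract does not state which position labels the authors use. As an illustration only, the sketch below assumes the common four-tag scheme (B = word-initial, M = word-middle, E = word-final, S = single-character word) and shows how a segmented sentence maps to per-character tags and back. The function names `words_to_tags` and `tags_to_words` are hypothetical. In a setup like the one described, a classifier such as a maximum-entropy model would predict one tag per character, and the decoding step would then recover the words, including words never seen in a lexicon.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Label every character with its position inside its word.

    Assumed tag set (not confirmed by the abstract): B, M, E, S.
    """
    tags: List[str] = []
    for word in words:
        if not word:            # ignore empty strings
            continue
        if len(word) == 1:
            tags.append("S")    # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from a character string and its position tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):   # a word ends at this character
            words.append(current)
            current = ""
    if current:                 # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


# Round trip on a hand-segmented phrase.
segmented = ["中文", "分词", "方法", "研究"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Because the tags are attached to characters rather than to dictionary entries, any character sequence can be grouped into a word, which is the property the abstract credits for better unknown word recognition.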
This article is indexed in CNKI, Wanfang Data, and other databases.