Algorithm of n-gram Statistics for Arbitrary n and Knowledge Acquisition Based on Statistics (大规模汉语语料库中任意n的n-gram统计算法及知识获取方法)

Cite this article:	张民, 李生, 赵铁军. 大规模汉语语料库中任意n的n-gram统计算法及知识获取方法[J]. 情报学报, 1997(1).

Authors:	张民 (Zhang Min), 李生 (Li Sheng), 赵铁军 (Zhao Tiejun)

Affiliation:	Department of Computer Science and Engineering, Harbin Institute of Technology (哈尔滨工业大学计算机科学与工程系)

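Abstract:	This paper proposes and implements an algorithm for computing character-level and word-level n-gram statistics for arbitrary n over large-scale Chinese corpora. In one run, the algorithm collects all character-level and word-level n-grams of every order up to an arbitrary n (n = 256 in this paper). It reduces the exponential space cost of conventional n-gram counting to a linear one that does not depend on the order being counted. Building on these n-gram statistics, the paper also computes the information entropy of Chinese and studies knowledge acquisition at the character and word levels. The algorithm and the research results have been applied in the machine translation system developed by the authors.
Keywords:	n-gram statistics; information entropy; knowledge acquisition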
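
The abstract states the linear-space result but does not describe how it is obtained. As an illustration only, the sketch below uses one well-known way to count n-grams of every order up to a cap with a table that is linear in corpus length: sort the corpus positions by the text that starts at each one, then read off runs of positions sharing the same prefix. Whether the paper uses this technique is an assumption; the names `sorted_starts`, `ngram_counts`, and `entropy_bits` are hypothetical, and the entropy helper is the standard Shannon formula applied to the resulting counts.

```python
import math
from functools import cmp_to_key
from itertools import groupby


def sorted_starts(text, max_n):
    """Return corpus positions ordered by the (at most max_n long) string starting there.

    Only one integer per position is kept, so the table grows linearly with
    the corpus and does not depend on max_n.
    """
    def compare(i, j):
        a, b = text[i:i + max_n], text[j:j + max_n]
        return (a > b) - (a < b)

    return sorted(range(len(text)), key=cmp_to_key(compare))


def ngram_counts(text, max_n, min_count=1):
    """Yield (n-gram, count) for every n-gram of length 1..max_n in text."""
    starts = sorted_starts(text, max_n)
    for n in range(1, max_n + 1):
        # Positions too close to the end cannot start an n-gram of length n.
        valid = (i for i in starts if i + n <= len(text))
        # In sorted order, positions sharing the same length-n prefix are adjacent.
        for gram, run in groupby(valid, key=lambda i: text[i:i + n]):
            count = sum(1 for _ in run)
            if count >= min_count:
                yield gram, count


def entropy_bits(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)


if __name__ == "__main__":
    corpus = "中国人民中国"  # toy character-level corpus, for illustration only
    table = dict(ngram_counts(corpus, max_n=2))
    print(table)
    unigram_counts = [c for g, c in table.items() if len(g) == 1]
    print(entropy_bits(unigram_counts))
```

For word-level statistics the same idea applies to a list of segmented words instead of a string of characters; the sketch is character-level only to stay short.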
Algorithm of n-gram Statistics for Arbitrary n and Knowledge Acquisition Based on Statistics

Zhang Min, Li Sheng and Zhao Tiejun. Algorithm of n-gram Statistics for Arbitrary n and Knowledge Acquisition Based on Statistics[J]. Journal of the China Society for Scientific and Technical Information, 1997(1).

Authors:	Zhang Min, Li Sheng and Zhao Tiejun

Abstract:	A new algorithm for n-gram statistics with arbitrary n at the word or phrase level is proposed and implemented in this paper. With it, the n-grams for all n at the word or phrase level can be computed at the same time. Based on these n-grams, the information entropy of Chinese and knowledge acquisition at the word or phrase level have also been studied. The algorithm and its results have been integrated into an MT system.

Keywords:	n-gram statistics; information entropy; knowledge acquisition
This article is indexed in CNKI and other databases.
